Terminology and formulae for AS stats Pearson Edexcel textbook CH3.
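Common definition of an outlier
Any value that is:
greater than Q3 + k(Q3 - Q1), or
less than Q1 - k(Q3 - Q1)
where Q3 - Q1 is the interquartile range. The value of k is not fixed (1.5 is common); use the value given in the question.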
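A minimal sketch of this rule in Python, assuming Q1 and Q3 have already been found. They are passed in rather than computed because different quartile methods can give slightly different values. The function name `is_outlier` and the default k = 1.5 are illustrative choices, not taken from the textbook.

```python
def is_outlier(x, q1, q3, k=1.5):
    """Return True if x is below Q1 - k(Q3 - Q1) or above Q3 + k(Q3 - Q1)."""
    iqr = q3 - q1                 # interquartile range, Q3 - Q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return x < lower_fence or x > upper_fence

# Example: Q1 = 20, Q3 = 30, k = 1.5 gives fences at 5 and 45.
print(is_outlier(50, 20, 30))  # True
print(is_outlier(10, 20, 30))  # False
```

A value exactly on a fence (here 5 or 45) is not counted as an outlier, because the definition uses "greater than" and "less than".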
What is ‘cleaning data’?
The process of removing anomalies from a data set. You must justify why each value is removed.
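As an illustration only, the sketch below assumes the anomalies are the outliers given by the Q1/Q3 rule above, and it returns the removed values separately so that each removal can be reported and justified. The function name `clean`, the sample data and the quartile values (supplied, not computed) are all hypothetical.

```python
def clean(data, q1, q3, k=1.5):
    """Split data into (kept, removed) using the outlier fences."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    kept, removed = [], []
    for x in data:
        if lower_fence <= x <= upper_fence:
            kept.append(x)
        else:
            removed.append(x)   # report these and justify removing them
    return kept, removed

# Hypothetical data; with Q1 = 13 and Q3 = 15 the fences are 10 and 18.
kept, removed = clean([12, 15, 14, 13, 90, 16], q1=13, q3=15)
print(kept)     # [12, 15, 14, 13, 16]
print(removed)  # [90]
```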
When should you use a histogram to represent data?
When the data is continuous and grouped.
Frequency density equation
frequency density = frequency/class width
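A short worked example of the formula. Each class is written as (lower boundary, upper boundary, frequency); the classes and frequencies are made up for illustration. Because bar height = frequency density, the area of each bar (height × width) equals its frequency.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 15, 8), (15, 30, 6)]

def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    width = upper - lower
    return frequency / width

for lower, upper, f in classes:
    print(f"{lower}-{upper}: {frequency_density(lower, upper, f)}")
# 0-10: 0.5
# 10-15: 1.6
# 15-30: 0.4
```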
How do you form a frequency polygon?
Join the middle of the top of each bar of a histogram with straight lines (for a histogram with equal class widths).
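A minimal sketch that lists the points to be joined: each point is (class midpoint, bar height). No plotting library is assumed; the points would simply be joined in order with straight lines. The class data and the name `polygon_vertices` are hypothetical.

```python
def polygon_vertices(classes):
    """Return the (midpoint, bar height) point for each class."""
    points = []
    for lower, upper, frequency in classes:
        midpoint = (lower + upper) / 2
        height = frequency / (upper - lower)  # bar height = frequency density
        points.append((midpoint, height))
    return points

# Hypothetical classes of equal width 10
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 7)]
print(polygon_vertices(classes))  # [(5.0, 0.4), (15.0, 0.9), (25.0, 0.7)]
```

With equal class widths every height is the frequency divided by the same width, so the shape of the polygon is the same whether frequency or frequency density is used on the vertical axis.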
What two things do you comment on when comparing data sets?
A measure of location and a measure of spread.
Which two pairs of measures of location and spread can you comment on when comparing data?
mean and standard deviation
OR
median and interquartile range
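A sketch that computes both pairs for one data set using Python's standard `statistics` module. It assumes the standard deviation with divisor n (`pstdev`) and uses `method="inclusive"` for the quartiles; this is only one of several quartile conventions, so the quartiles may differ slightly from a hand method. The name `location_and_spread` and the data are illustrative.

```python
import statistics

def location_and_spread(data):
    """Return both comparison pairs: (mean, sd) and (median, IQR)."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)      # standard deviation with divisor n
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"mean": mean, "sd": sd, "median": median, "iqr": q3 - q1}

result = location_and_spread([2, 4, 4, 4, 5, 5, 7, 9])
print(result)
```

For this data the mean is 5 and the standard deviation is 2, while the median is 4.5 and the interquartile range is 1.5.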
Which pair is more suitable for comparing data that contains extreme values?
Median and interquartile range, because they are not affected by extreme values.
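To see why, the sketch below changes one value in a small hypothetical data set into an extreme value. It uses the `statistics` module with the `method="inclusive"` quartile convention (an assumption; other conventions exist). The mean rises from 12 to 29.2 and the standard deviation grows, but the median (12) and interquartile range (2) do not change.

```python
import statistics

def summary(data):
    """Return (mean, standard deviation, median, IQR) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (statistics.mean(data), statistics.pstdev(data), median, q3 - q1)

without_extreme = [10, 11, 12, 13, 14]
with_extreme = [10, 11, 12, 13, 100]   # 14 replaced by an extreme value

print(summary(without_extreme))
print(summary(with_extreme))
```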
Advantages of a box plot?
It helps us to see the spread of the data easily.
The plot is clear and easy to understand.
It shows the median, the quartiles and the extreme values, so the range and interquartile range can be read off.
It makes it easy to compare two or more data sets side by side.
Disadvantages of a box plot?
The original data values are not shown.
The mean and mode cannot be identified from a box plot.
It can be easily misinterpreted.
If large outliers are present, a box plot is more likely to give a misleading picture of the data.
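A sketch of why the original data and the mean are lost: a box plot is drawn only from the five-number summary, so two different data sets can give the same box plot. The data sets are hypothetical, and the quartiles use the `statistics.quantiles` `method="inclusive"` convention (an assumption; another convention could give slightly different quartiles).

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum: the values a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [1, 3, 3, 3, 5, 5, 7, 7, 9]

print(five_number_summary(a) == five_number_summary(b))  # True
print(statistics.mean(a), statistics.mean(b))
```

Both sets give the summary (1, 3, 5, 7, 9), yet set a has mean 5 while set b has a lower mean and a mode of 3, none of which the box plot shows.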
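Which pair of measures of location and spread should you use when comparing box plots?
Median and interquartile range, since a box plot shows the median and quartiles but not the mean or standard deviation.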