What is an average?
A value that represents the centre of a set of data. Averages include the mode, median and mean.
What is the modal class?
The class interval with the highest frequency. The answer is the class interval itself, not the frequency value in the column/row next to it.
What if the median position is a decimal (e.g. 7.5)?
Find the 7th and 8th values and divide their sum by 2
If data is in a frequency table (discrete) - median
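Add a cumulative frequency (CF) column.
Work out (n + 1)/2, where n is the sum of the frequencies.
Find the first CF that is equal to or greater than this number; the value in that row is the median. If the position is a decimal such as 7.5, find the 7th and 8th values separately and average them if they are different.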
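A minimal Python sketch of the discrete-table method, assuming the values are listed in ascending order. The function name `median_from_frequency_table` is hypothetical. It builds the CF column, finds the (n + 1)/2 position, and averages the two neighbouring values when that position is a decimal.

```python
import math

def median_from_frequency_table(values, freqs):
    """Median of a discrete frequency table.

    `values` must be in ascending order; freqs[i] is how often values[i] occurs.
    """
    n = sum(freqs)
    position = (n + 1) / 2  # e.g. n = 14 gives 7.5

    # Cumulative frequency (CF) column as a running total.
    cumulative = []
    running = 0
    for f in freqs:
        running += f
        cumulative.append(running)

    def value_at(k):
        # The k-th value (counting from 1) is in the first row whose CF >= k.
        for value, cf in zip(values, cumulative):
            if cf >= k:
                return value
        raise ValueError("position is beyond the total frequency")

    # For a whole-number position both calls return the same value;
    # for 7.5 they return the 7th and 8th values, which are then averaged.
    lower = value_at(math.floor(position))
    upper = value_at(math.ceil(position))
    return (lower + upper) / 2
```

For a made-up table with values 0, 1, 2, 3 and frequencies 3, 5, 4, 2, `median_from_frequency_table([0, 1, 2, 3], [3, 5, 4, 2])` returns 1.0.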
If data is in a frequency table (grouped) - median
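Find the total frequency, n.
Find n/2.
Find the median class (the first class whose CF is equal to or greater than n/2).
Subtract the CF before the median class from n/2, e.g. 7 − 4 = 3.
Divide this by the frequency of the median class, e.g. 3/5.
Find the class width: upper boundary − lower boundary, e.g. 170 − 160 = 10.
Multiply, e.g. 3/5 × 10 = 6.
Add the lower boundary, e.g. 6 + 160 = 166 cm.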
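The interpolation steps above can be sketched in Python as follows. The function name and the three-list input format are assumptions for illustration. The example table (150–160: 4, 160–170: 5, 170–180: 5) is made up so that n/2 = 7, the CF before the median class is 4 and the median class frequency is 5, matching the worked numbers.

```python
def grouped_median(lower_bounds, upper_bounds, freqs):
    """Estimate the median of grouped data by linear interpolation."""
    n = sum(freqs)
    half = n / 2                      # position n/2
    cf_before = 0                     # CF at the lower boundary of the current class
    for lb, ub, f in zip(lower_bounds, upper_bounds, freqs):
        # The first class whose CF reaches n/2 is the median class.
        if f > 0 and cf_before + f >= half:
            width = ub - lb           # upper boundary − lower boundary
            return lb + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("the table has no data")
```

`grouped_median([150, 160, 170], [160, 170, 180], [4, 5, 5])` gives 160 + 3/5 × 10 = 166.0.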
Finding mean of frequency table (not grouped)
Make a third column for value (x) × frequency (f).
Find the sum of fx.
Find the sum of f.
Mean = Σfx / Σf
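A minimal Python sketch of the Σfx / Σf calculation; the function name is hypothetical and the values and frequencies are passed as two parallel lists.

```python
def frequency_table_mean(values, freqs):
    """Mean of an ungrouped frequency table: Σfx / Σf."""
    fx = [x * f for x, f in zip(values, freqs)]   # the third column, x × f
    return sum(fx) / sum(freqs)
```

For a made-up table with values 0, 1, 2, 3 and frequencies 3, 5, 4, 2, Σfx = 19 and Σf = 14, so the mean is 19/14 ≈ 1.36.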
Finding mean of frequency table (grouped)
Mean ≈ Σ(frequency × midpoint) / Σfrequency. This is an estimate, because the exact values within each class are not known.
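A short Python sketch of the grouped estimate, assuming each class is given by its lower and upper boundary; the function name is hypothetical. It uses the midpoint of each class as x and then applies Σfx / Σf.

```python
def grouped_mean(lower_bounds, upper_bounds, freqs):
    """Estimated mean of grouped data, using each class midpoint as x."""
    midpoints = [(lb + ub) / 2 for lb, ub in zip(lower_bounds, upper_bounds)]
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)
```

For a made-up table 150–160 (4), 160–170 (5), 170–180 (5), the midpoints are 155, 165, 175 and the estimate is 2320/14 ≈ 165.7.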
find weighted mean
A student’s final mark is made of:
Coursework = 25%
Exam = 75%
Marks:
Coursework = 60
Exam = 80
Step-by-step
Multiply each mark by its weight
Coursework: 60 × 0.25 = 15
Exam: 80 × 0.75 = 60
Add the results
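15 + 60 = 75
The weights add up to 1 (0.25 + 0.75), so dividing by the total weight leaves the answer unchanged.
weighted mean = 75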
find weighted mean for freq table
Weighted mean = Σ(mark × weight) / Σweight
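A minimal Python sketch of Σ(mark × weight) / Σweight; the function name is hypothetical. Because it divides by the total weight, the weights can be given as percentages (25, 75) or decimals (0.25, 0.75) and the coursework/exam example still gives 75.

```python
def weighted_mean(marks, weights):
    """Weighted mean: Σ(mark × weight) / Σweight."""
    if len(marks) != len(weights):
        raise ValueError("each mark needs exactly one weight")
    return sum(m * w for m, w in zip(marks, weights)) / sum(weights)
```

`weighted_mean([60, 80], [25, 75])` and `weighted_mean([60, 80], [0.25, 0.75])` both return 75.0.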
what is weighted mean?
A weighted mean is just an average where some values count more than others.
In a normal mean, everything is equally important.
In a weighted mean, some things are more important, so they get more weight.
geometric mean
Multiply the numbers together, then take the nth root, where n is how many numbers there are in total, e.g.
geometric mean of 1, 2 and 32
1 × 2 × 32 = 64
Cube root (because there are 3 values): ∛64 = 4
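A minimal Python sketch of the geometric mean (product, then nth root), assuming all values are positive and Python 3.8+ for `math.prod`. The function name is hypothetical; the standard library also provides `statistics.geometric_mean`.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    product = math.prod(values)             # e.g. 1 × 2 × 32 = 64
    return product ** (1 / len(values))     # nth root; here n = 3, so the cube root
```

`geometric_mean([1, 2, 32])` is 4 up to floating-point rounding; because 1/3 cannot be stored exactly, it may print as 3.9999999999999996.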
advantages of mean
It uses all the data
Useful for further calculations
disadvantages of mean
May not be a data value
Affected by extreme values (outliers)
advantages of median
Not affected by extreme values (outliers)
Represents the middle value
disadvantages of median
May not be a data value
Not always representative of the data.
advantages of mode
The only average that can be used for qualitative data
easy to identify
disadvantages of mode
There may be no mode, or more than one mode.
Cannot be used to calculate measures of spread.
transforming data
For large data values you may want to make the numbers smaller. This saves time, and it is easier to make mistakes with bigger numbers.
You can find the mean by subtracting the same number from all the values (and multiplying by a constant if useful), finding the mean of the new values, then reversing the changes. e.g. For values 1.04, 1.09, 1.03, 1.12 you might subtract 1 from all the values and then multiply by 100 to get the whole numbers 4, 9, 3, 12. Their mean is 28/4 = 7, so the original mean is 7 ÷ 100 + 1 = 1.07.
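The coding trick can be checked with a short Python sketch using the example values. `round()` is only used because the coded values here are meant to be whole numbers. It removes tiny floating-point errors left by the subtraction and would be wrong for data that does not code to integers.

```python
from statistics import mean

values = [1.04, 1.09, 1.03, 1.12]

# Code the data: subtract 1, then multiply by 100.
# round() removes tiny floating-point errors (valid here because the coded
# values should be whole numbers).
coded = [round((x - 1) * 100) for x in values]   # [4, 9, 3, 12]

coded_mean = mean(coded)                          # 28 / 4 = 7

# Reverse the steps in the opposite order: divide by 100, then add 1.
original_mean = coded_mean / 100 + 1              # 1.07
```

The result agrees with `mean(values)` taken directly, up to floating-point rounding.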
How Changes to your Data affect the Averages - mode
The mode only changes if the change affects which value appears most often.
It could also make the data bimodal if two values now appear the same number of times.
How Changes to your Data affect the Averages - median
If you add a value that is greater than the median, the median might increase.
If you add a value that is smaller than the median, the median might decrease.
If you remove a value that is greater than the median, the median might decrease.
If you remove a value that is smaller than the median, the median might increase.
If you add/remove one value that is greater and one that is smaller than the median, the median stays the same.
How Changes to your Data affect the Averages - mean
If you add a value that is greater than the mean, the mean increases.
If you take away a value that is less than the mean, the mean increases.
If you add a value that is less than the mean, the mean decreases.
If you take away a value that is greater than the mean, the mean decreases.
If you replace a value in your data with a greater number, the mean increases; if you replace it with a smaller number, the mean decreases.
what is IQR
IQR measures how spread out the middle 50% of the data is. The LQ is 25% of the way through the data and the UQ is 75% of the way through.
calc for IQR
IQR = UQ - LQ
calc for LQ (discrete)
¼ of the way through
¼(n + 1)th value
(n = total number of data values)
calc for UQ (discrete)
¾ of the way through
¾(n + 1)th value
(n = total number of data values)
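A minimal Python sketch of the ¼(n + 1) and ¾(n + 1) positions. The source does not say what to do when a position is not a whole number; this sketch assumes linear interpolation between the two neighbouring values, which is one common convention, so your course may round differently. The function names are hypothetical.

```python
import math

def value_at_position(sorted_data, position):
    """Value at a 1-indexed position; a fractional position is interpolated
    between its neighbours (an assumed convention)."""
    lower = math.floor(position)
    frac = position - lower
    if frac == 0:
        return sorted_data[lower - 1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + frac * (above - below)

def quartiles(data):
    """Return (LQ, UQ, IQR) using the ¼(n + 1) and ¾(n + 1) positions."""
    s = sorted(data)
    n = len(s)
    if n < 3:
        raise ValueError("need at least 3 values")
    lq = value_at_position(s, (n + 1) / 4)
    uq = value_at_position(s, 3 * (n + 1) / 4)
    return lq, uq, uq - lq
```

For the values 1 to 11, n = 11, so the LQ is the 3rd value and the UQ is the 9th: `quartiles(range(1, 12))` returns (3, 9, 6).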
what is range
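The difference between the largest and smallest values (range = largest − smallest). It is a simple measure of how spread out the data is.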
range from tables
For data from tables, the largest value is the biggest value in the first column and the smallest value is the smallest value in the first column (the first one if the values are in ascending order). Ignore any value with a frequency of 0.
IQR - grouped data
For grouped data: LQ = (¼n)th value, UQ = (¾n)th value
1. Draw your CF curve.
2. Use the formulae above to find the positions of the LQ (25%) and UQ (75%).
3. Draw horizontal lines from these positions on the CF (y) axis to the curve, then down to the x-axis. The x-axis values give you the LQ and UQ.
4. IQR = UQ – LQ
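The source reads the quartiles off a hand-drawn CF curve. As a stand-in, this Python sketch joins the class end points with straight lines (a cumulative frequency polygon) and interpolates at n/4 and 3n/4, so its answers may differ slightly from values read off a smooth curve. The function names are hypothetical.

```python
def grouped_value_at(lower_bounds, upper_bounds, freqs, position):
    """Interpolate the value at a CF position, treating the CF graph as straight
    lines between class boundaries (an approximation to the curve)."""
    cf_before = 0
    for lb, ub, f in zip(lower_bounds, upper_bounds, freqs):
        if f > 0 and cf_before + f >= position:
            return lb + (position - cf_before) / f * (ub - lb)
        cf_before += f
    raise ValueError("position is beyond the total frequency")

def grouped_iqr(lower_bounds, upper_bounds, freqs):
    """Return (LQ, UQ, IQR) for grouped data using the n/4 and 3n/4 positions."""
    n = sum(freqs)
    lq = grouped_value_at(lower_bounds, upper_bounds, freqs, n / 4)
    uq = grouped_value_at(lower_bounds, upper_bounds, freqs, 3 * n / 4)
    return lq, uq, uq - lq
```

For a made-up table 0–10 (10), 10–20 (20), 20–30 (10), n = 40, so the LQ is at position 10 and the UQ at position 30, giving (10.0, 20.0, 10.0).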
standard deviation
Standard deviation tells us how spread out the data is around the mean.
It measures the typical distance of the values from the mean (strictly, the square root of the average squared distance).
calc for SD (discrete data)
1. Calculate the mean.
2. Subtract the mean from each data value and square the answer – it might be useful to do this in a table.
3. Add up all the answers to step 2.
4. Divide by the number of values.
5. Square root.
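The five steps above translate directly into this minimal Python sketch (function name hypothetical). It divides by the number of values, as in step 4. Note that Python's `statistics.stdev` divides by n − 1 instead and would give a different answer; `statistics.pstdev` matches these steps.

```python
import math

def standard_deviation(data):
    """Standard deviation, dividing by the number of values."""
    mean = sum(data) / len(data)                 # 1. calculate the mean
    squared = [(x - mean) ** 2 for x in data]    # 2. subtract the mean, square
    total = sum(squared)                         # 3. add up
    variance = total / len(data)                 # 4. divide by the number of values
    return math.sqrt(variance)                   # 5. square root
```

For 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared deviations add up to 32, and 32/8 = 4, so the standard deviation is 2.0.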
low SD
values are close to the mean (data is clustered)
high SD
values are far from the mean (data is more spread out)
calc for SD (Frequency Table - not grouped)
1. Calculate the mean.
2. Create a new column for x − x̄. Subtract the mean from each value in the first column.
3. Square each answer to step 2 – create new column.
4. Multiply each answer in step 3 by frequency – create new column.
5. Add up the answers to step 4 (the total of the last column).
6. Divide answer to step 5 by total of frequency column.
7. Square root.
(x = each individual data value, e.g. a height)
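A minimal Python sketch of the seven frequency-table steps; the function name is hypothetical. For a grouped table, the same function can be used by passing the class midpoints as the values.

```python
import math

def frequency_table_sd(values, freqs):
    """Standard deviation from a frequency table (dividing by Σf)."""
    total_f = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total_f   # step 1
    total = 0
    for x, f in zip(values, freqs):
        deviation = x - mean          # step 2: x − x̄
        squared = deviation ** 2      # step 3: square it
        total += squared * f          # steps 4 and 5: multiply by f, add up
    return math.sqrt(total / total_f) # steps 6 and 7: divide by Σf, square root
```

The made-up table with values 2, 4, 5, 7, 9 and frequencies 1, 3, 2, 1, 1 describes the same data as 2, 4, 4, 4, 5, 5, 7, 9, so it also gives 2.0.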
calc for SD (Frequency Table - grouped)
For grouped frequency tables, follow the same steps as for a frequency table but use the midpoint of each class for x. You may need to add an extra column to your table for the midpoints before carrying out the steps above.
boxplots
Box plots divide the data into four sections that each contain approximately 25% of the data. They show important features of the data and give a summary of its spread and skew.
The total length of the box plot represents the range. The length of the box represents the IQR, which contains the middle 50% of the data.
Box Plots include 5 pieces of information about the data
1. Minimum Value – the lowest score, shown at the far left of the diagram
2. Lower Quartile (LQ) – 25% of data is below this
3. Median – the middle of the data; 50% of the data is above/below this value
4. Upper Quartile (UQ) – 25% of data is above this value/75% of data is below it.
5. Maximum Value – The highest score, shown at the far right of the diagram
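A short Python sketch of the five numbers a box plot needs, assuming Python 3.8+. It uses the standard library's `statistics.quantiles`, whose default `'exclusive'` method places the quartiles at the (n + 1) positions with interpolation; other courses may use a different quartile rule. The function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """(minimum, LQ, median, UQ, maximum) for drawing a box plot."""
    lq, median, uq = statistics.quantiles(data, n=4)   # default method='exclusive'
    return min(data), lq, median, uq, max(data)
```

For the values 1 to 11 this returns (1, 3.0, 6.0, 9.0, 11); the range is then 11 − 1 = 10 and the IQR is 9 − 3 = 6.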
outliers
Including outliers may misrepresent your data, but leaving them out could falsify it. Because outliers distort the data, you need to identify them.
how to find outliers
Outliers are more than 1.5 × IQR above the UQ or below the LQ.
Outliers are values > UQ + (1.5 × IQR) or < LQ − (1.5 × IQR)
1. Work out IQR
2. Find 1.5 × IQR
3. Subtract this value from the LQ and add it to the UQ.
4. These values are the limits for your box plot. Any values in your data outside this range are outliers, and the whiskers end at the lowest and highest values that are not outliers.
5. Mark outliers with an X on your box plot.
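A minimal Python sketch of steps 1 to 4, assuming the LQ and UQ have already been worked out by your course's method; the function name is hypothetical. In the example, the data set is made up, and its LQ (12) and UQ (20) come from the ¼(n + 1) and ¾(n + 1) positions with n = 7.

```python
def iqr_outliers(data, lq, uq):
    """Values more than 1.5 × IQR below the LQ or above the UQ."""
    iqr = uq - lq                     # 1. work out the IQR
    step = 1.5 * iqr                  # 2. find 1.5 × IQR
    lower_limit = lq - step           # 3. subtract from the LQ ...
    upper_limit = uq + step           #    ... and add to the UQ
    # 4. anything outside the limits is an outlier
    return [x for x in data if x < lower_limit or x > upper_limit]
```

For 4, 12, 14, 15, 17, 20, 35 with LQ = 12 and UQ = 20, the IQR is 8, the limits are 0 and 32, and only 35 is an outlier.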
Outliers can also be found using the mean and standard deviation: they are values more than 3 SD away from the mean.
Outliers = values outside x̄ ± 3 SD
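A minimal Python sketch of the x̄ ± 3 SD rule, using the divide-by-n standard deviation from the SD steps; the function name is hypothetical.

```python
import math

def sd_outliers(data):
    """Values outside mean ± 3 standard deviations (SD divides by n)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    low, high = mean - 3 * sd, mean + 3 * sd
    return [x for x in data if x < low or x > high]
```

For nineteen 10s and one 50, the mean is 12 and the SD is about 8.7, so the upper limit is about 38.2 and 50 is flagged. With 10 or fewer values, no value can be more than 3 SD (divide-by-n) from the mean, so this rule only finds outliers in larger data sets.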
Interpreting box plots
Use the median as the measure of average: a higher median means higher values on average (e.g. taller, bigger).
Use the range or IQR as the measure of spread.
Compare the skewness of both box plots.
skew
Describes the shape of the distribution and how the data is spread out. If the data is skewed, the values are more spread out on one side of the median than on the other.
positive skew
the data above the median is more spread out
mean > median > mode
symmetrical (no skew)
the data is evenly spread out above and below the median
mean = median = mode
negative skew
the data below the median is more spread out
mean < median < mode
Skewness using the Formula
Formula: Skewness = 3(mean − median) / standard deviation
Positive value = positive skew. The larger the value, the stronger the positive skew.
Negative value = negative skew. The more negative the value, the stronger the negative skew.
Value of 0 = No Skew/Symmetrical.
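A minimal Python sketch of the skewness formula; the function name is hypothetical. It uses `statistics.pstdev` (divide by n) to match the SD steps earlier, and it raises a `ZeroDivisionError` if every value is the same, because the standard deviation is then 0.

```python
import statistics

def skewness(data):
    """3(mean − median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)      # divide-by-n standard deviation
    return 3 * (mean - median) / sd
```

For 1, 2, 3, 4, 10 the mean (4) is above the median (3), so the result is positive (about 0.95); for 1, 2, 3, 4, 5 it is 0.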
comparing data sets
Compare using a measure of average (mean/median/mode) and spread (range/IQR/SD) or skewness.
Always quote the actual values and state clearly which data set's value is larger or smaller.
Always interpret in context: link back to the scenario in the question and the labels on the axes.
Comparing Averages
The mean/median/mode for data set A is larger than the mean/median/mode for data set B, so on average data set A is more … than data set B.
Comparing spread
Range/IQR/SD for data set A is larger than that of data set B so the ‘results’ of data set A are more spread out/less consistent than those of data set B.
Data set A has a smaller range/IQR/SD than data set B, which means the ‘results’ for data set A are more consistent.
Remember: a lower SD means values are closer to the mean and therefore more similar to each other.
Comparing Skew
Box plot for data set A is positively skewed, so the majority of ‘results’ were low, with a few higher ‘results’.
Box plot for data set A is negatively skewed, so the majority of ‘results’ were high, with a few lower ‘results’.