Statistics - summarising data

49 Terms

1

What is an average?

A value that represents the centre of a set of data. The three averages covered here are the mode, median and mean.

2

What is the modal class?

The class interval with the highest frequency. The modal class is the interval itself (the entry in the class column/row), not the frequency value next to it.

3

What if the median position is a decimal (e.g. 7.5)?

Find the 7th and 8th values and divide their sum by 2

4

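If data is in a frequency table (discrete) - median

  1. Add a cumulative frequency (CF) column.

  2. Find the median position: (sum of frequencies + 1) / 2.

  3. Find the first CF that is equal to or greater than that position. The data value in that row is the median. If the position ends in .5, do this for the whole-number positions either side and average the two values.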
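The cumulative frequency method can be sketched in Python. This is only an illustration: the function name and the shoe-size table are made up for the example, and the table is assumed to be sorted by value.

```python
def median_from_frequency_table(values, freqs):
    """Median of discrete data given as (value, frequency) rows, sorted by value."""
    n = sum(freqs)
    position = (n + 1) / 2  # ends in .5 when n is even

    def value_at(k):
        # Walk the running cumulative frequency until it reaches position k.
        cf = 0
        for v, f in zip(values, freqs):
            cf += f
            if cf >= k:
                return v
        raise ValueError("position is beyond the end of the data")

    lower = int(position)  # e.g. 7 when the position is 7.5
    upper = lower if position == lower else lower + 1
    # For a whole-number position both lookups return the same value.
    return (value_at(lower) + value_at(upper)) / 2


# Hypothetical table: shoe sizes 4, 5, 6, 7 with frequencies 2, 5, 4, 3 (n = 14).
# The position is 7.5, the 7th value is 5 and the 8th value is 6.
print(median_from_frequency_table([4, 5, 6, 7], [2, 5, 4, 3]))  # 5.5
```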

5

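If data is in a frequency table (grouped) - median

  1. Find the total frequency, n.

  2. Find the median position, n/2.

  3. Find the median class: the first class whose CF is equal to or greater than n/2.

  4. Subtract the CF of the class before the median class from the median position, e.g. 7th - 4th = 3.

  5. Divide the answer by the frequency of the median class, e.g. 3/5.

  6. Find the class width: higher boundary (HB) - lower boundary (LB), e.g. 170 - 160 = 10.

  7. Multiply the fraction by the class width, e.g. 3/5 × 10 = 6.

  8. Add the lower boundary of the median class, e.g. 6 + 160 = 166 cm.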
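The interpolation steps above can be checked with a short Python sketch. The function and parameter names are hypothetical, and the total frequency of 14 is an assumption chosen so that n/2 gives the 7th value used in the example.

```python
def grouped_median(lower_bound, upper_bound, class_freq, cf_before, n):
    """Estimate the median by linear interpolation inside the median class."""
    position = n / 2                      # median position
    into_class = position - cf_before     # how far into the median class
    width = upper_bound - lower_bound     # class width (HB - LB)
    # Share of the class width, added on to the lower boundary.
    return lower_bound + into_class * width / class_freq


# Median class 160-170 with frequency 5, CF of 4 before it, assumed n = 14.
print(grouped_median(160, 170, class_freq=5, cf_before=4, n=14))  # 166.0
```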

6

Finding mean of frequency table (not grouped)

  1. Make a third column for value (x) × frequency (f).

  2. Find the sum of fx.

  3. Find the sum of f.

  4. mean = sum of fx / sum of f
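As a rough illustration of the four steps, here is a minimal Python sketch. The function name and the table of values are invented for the example.

```python
def mean_from_frequency_table(values, freqs):
    """Mean of data given as a value column (x) and a frequency column (f)."""
    sum_fx = sum(x * f for x, f in zip(values, freqs))  # total of the fx column
    sum_f = sum(freqs)                                  # total frequency
    return sum_fx / sum_f


# Hypothetical table: number of pets 0, 1, 2, 3 with frequencies 3, 5, 8, 4.
# sum of fx = 0 + 5 + 16 + 12 = 33 and sum of f = 20.
print(mean_from_frequency_table([0, 1, 2, 3], [3, 5, 8, 4]))
```

For this table the mean works out as 33 ÷ 20 = 1.65.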

7

Finding mean of frequency table (grouped)

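estimated mean = sum of (frequency × midpoint) / sum of frequencies

This is only an estimate because the midpoint stands in for the actual values in each class.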
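A minimal Python sketch of the grouped version follows. The class boundaries and frequencies are made-up example data, and the function name is hypothetical.

```python
def estimated_grouped_mean(classes, freqs):
    """Estimate the mean of grouped data using class midpoints."""
    midpoints = [(low + high) / 2 for low, high in classes]
    total = sum(f * m for f, m in zip(freqs, midpoints))
    return total / sum(freqs)


# Hypothetical heights (cm): 150-160, 160-170, 170-180 with frequencies 4, 5, 1.
# Midpoints are 155, 165, 175, so the total is 620 + 825 + 175 = 1620.
classes = [(150, 160), (160, 170), (170, 180)]
print(estimated_grouped_mean(classes, [4, 5, 1]))  # 162.0
```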

8

find weighted mean

A student’s final mark is made of:

  • Coursework = 25%

  • Exam = 75%

Marks:

  • Coursework = 60

  • Exam = 80

Step-by-step

  1. Multiply each mark by its weight

    • Coursework: 60 × 0.25 = 15

    • Exam: 80 × 0.75 = 60

  2. Add the results

    • 15 + 60 = 75

  3. weighted mean = 75
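The worked example above can be reproduced with a short Python sketch. The function name is hypothetical, and the weights are written as whole-number percentages (25 and 75) so the total is divided by the sum of the weights.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (value x weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Coursework 60 (weight 25%) and exam 80 (weight 75%).
# 60 x 25 + 80 x 75 = 7500, and 7500 / 100 = 75.
print(weighted_mean([60, 80], [25, 75]))  # 75.0
```

Dividing by the sum of the weights means the same function works whether or not the weights add up to 100.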

9

find weighted mean for freq table

weighted mean = sum of (mark × weight) / sum of weights

10

what is weighted mean?

A weighted mean is just an average where some values count more than others.

In a normal mean, everything is equally important.

In a weighted mean, some things are more important, so they get more weight.

11

geometric mean

Multiply the numbers together, then take the nth root, where n is how many numbers there are.

e.g. geometric mean of 1, 2 and 32:

1 × 2 × 32 = 64

cube root (because there are 3 values) of 64 = 4
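The same calculation can be sketched in Python. The function name is made up for the example, and because the root is taken with floating-point arithmetic the printed answer may be very slightly off 4.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    product = math.prod(values)          # multiply the numbers together
    return product ** (1 / len(values))  # take the nth root


# 1 x 2 x 32 = 64, and the cube root of 64 is 4 (up to rounding error).
print(geometric_mean([1, 2, 32]))
```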

12

advantages of mean

  1. it uses all the data

  2. Useful for further calculations

13

disadvantages of mean

  1. May not be a data value

  2. Affected by extreme values (outliers)

14

advantages of median

  1. Not affected by extreme values (outliers)

  2. Represents the middle value

15

disadvantages of median

  1. May not be a data value

  2. Not always representative of the data.

16

advantages of mode

  1. The only average that can be used for qualitative data

  2. easy to identify

17

disadvantages of mode

  1. There may not be a mode or there may be more than one mode.

  2. Cannot be used to calculate measures of spread.

18

transforming data

For large data values, you may want to make the numbers smaller first. This saves time, and it is easier to make mistakes with bigger numbers.

You can find the mean by subtracting the same number from all the values, working out the mean of the smaller numbers, then reversing the change at the end. e.g. for the values 1.04, 1.09, 1.03 and 1.12, subtract 1 from each value and multiply by 100 to get the whole numbers 4, 9, 3 and 12. Their mean is 7, so the mean of the original values is 7 ÷ 100 + 1 = 1.07.
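A rough Python sketch of this coding trick, using the values from the example. The function name and the `shift` and `scale` parameters are hypothetical, and rounding the coded values assumes they are meant to be whole numbers.

```python
def mean_by_coding(values, shift, scale):
    """Code the data as (value - shift) x scale, find the mean, then undo the coding."""
    # round() removes tiny floating-point errors, assuming whole-number coded values.
    coded = [round((v - shift) * scale) for v in values]
    coded_mean = sum(coded) / len(coded)
    # Reverse the coding: divide by the scale, then add the shift back on.
    return coded, coded_mean / scale + shift


coded, mean = mean_by_coding([1.04, 1.09, 1.03, 1.12], shift=1, scale=100)
print(coded)  # [4, 9, 3, 12]
print(mean)   # close to 1.07
```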

19

How Changes to your Data affect the Averages - mode

  1. The mode changes only if the new value changes which value appears the most.

  2. Could also make the data bimodal if there are now two values that appear the same amount.

20

How Changes to your Data affect the Averages - median

  1. If you add a value that is greater than the median, the median might increase.

  2. If you add a value that is smaller than the median, the median might decrease.

  3. If you remove a value that is greater than the median, the median might decrease.

  4. If you remove a value that is smaller than the median, the median might increase.

  5. If you add/remove one value that is greater and one that is smaller than the median, the median stays the same.

21

How Changes to your Data affect the Averages - mean

  1. If you add a value that is greater than the mean, the mean increases.

  2. If you take away a value that is less than the mean, the mean increases.

  3. If you add a value that is less than the mean, the mean decreases.

  4. If you take away a value that is greater than the mean, the mean decreases.

  5. If you replace a value in your data with a larger number, the mean increases; if you replace it with a smaller number, the mean decreases.

22

what is IQR

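The interquartile range (IQR) measures how spread out the middle 50% of the data is. It runs from the lower quartile (LQ, 25% of the way through the data) to the upper quartile (UQ, 75% of the way through the data).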

23

calc for IQR

IQR = UQ - LQ

24

calc for LQ (discrete)

The LQ is the value ¼ of the way through the ordered data.

LQ position = ¼(n + 1)

(n = total number of data values)

25

calc for UQ (discrete)

The UQ is the value ¾ of the way through the ordered data.

UQ position = ¾(n + 1)

(n = total number of data values)
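The quartile positions and the IQR can be sketched in Python as below. The function names and data are invented for the example. When a position is not a whole number, this sketch interpolates between the two neighbouring values, which is an assumption rather than something these notes specify.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-indexed position, interpolating if the position is not whole."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + fraction * (above - below)


def quartiles(data):
    """Return (LQ, UQ, IQR) using positions (n + 1)/4 and 3(n + 1)/4."""
    s = sorted(data)
    n = len(s)
    if n < 3:
        raise ValueError("need at least 3 values for these positions")
    lq = value_at_position(s, (n + 1) / 4)
    uq = value_at_position(s, 3 * (n + 1) / 4)
    return lq, uq, uq - lq


# n = 7, so the LQ is the 2nd value and the UQ is the 6th value.
print(quartiles([13, 2, 8, 4, 10, 5, 7]))  # (4, 10, 6)
```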

26

what is range

A measure of how spread out the data is: range = largest value - smallest value.

27

range from tables

For data in a frequency table, take the values from the first column (the data values), not from the frequency column. The largest value is the biggest number in the first column and the smallest value is the smallest number in the first column.

28

IQR - grouped data

For grouped data: LQ = ¼n th value, UQ = ¾n th value.

1. Draw your CF curve.

2. Use the above formulae to find the positions for LQ (25%) and UQ (75%).

3. Draw lines across from these positions on the y-axis to the curve, then down to the x-axis. The corresponding x-axis values give you your LQ and UQ values.

4. IQR = UQ – LQ

29

standard deviation

Standard deviation (SD) tells us how spread out the data is around the mean.

It measures roughly the average distance of each value from the mean.

30

calc for SD (discrete data)

1. Calculate the mean.

2. Subtract the mean from each data value and square the answer – it might be useful to do this in a table.

3. Add up all the answers to step 2.

4. Divide by the number of values.

5. Square root.
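The five steps translate directly into a short Python sketch. The function name and data are made up for the example. Because step 4 divides by the number of values, this is the population form of the standard deviation.

```python
import math


def standard_deviation(data):
    """Standard deviation following the five steps (divides by the number of values)."""
    mean = sum(data) / len(data)                     # step 1
    squared_diffs = [(x - mean) ** 2 for x in data]  # step 2
    total = sum(squared_diffs)                       # step 3
    return math.sqrt(total / len(data))              # steps 4 and 5


# Mean is 5, the squared differences add up to 32, and 32 / 8 = 4.
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```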

31

low SD

values are close to the mean (data is clustered)

32

high SD

values are far from the mean (data is more spread out)

33

calc for SD (Frequency Table - not grouped)

1. Calculate the mean.

2. Create a new column for 𝒙 − 𝒙̅. Subtract mean from each value in the first column.

3. Square each answer to step 2 – create new column.

4. Multiply each answer in step 3 by frequency – create new column.

5. Add answers to step 4 – add the last column.

6. Divide answer to step 5 by total of frequency column.

7. Square root.

(x = each individual data value, e.g. a height)
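A minimal Python sketch of the table method is shown below, with each extra column folded into one line of code. The function name and the table are hypothetical. For a grouped table, the class midpoints could be passed in as the values.

```python
import math


def sd_from_frequency_table(values, freqs):
    """Standard deviation of data given as a value column (x) and a frequency column (f)."""
    total_f = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total_f  # step 1
    # Steps 2 to 5: (x - mean), squared, multiplied by f, then added up.
    sum_f_dev_sq = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(sum_f_dev_sq / total_f)                    # steps 6 and 7


# Hypothetical table: values 2, 4, 5, 7, 9 with frequencies 1, 3, 2, 1, 1.
# The mean is 40 / 8 = 5 and the f(x - mean)^2 column adds up to 32.
print(sd_from_frequency_table([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))  # 2.0
```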

34

calc for SD (Frequency Table - grouped)

For grouped frequency tables, follow the same steps as for an ungrouped frequency table but use the class midpoint for x. You may need to add an extra column to your table for the midpoint before carrying out those steps.

35

boxplots

Box plots divide the data into four sections that each contain approximately 25% of the data in that set. They show the important features of the data and give a summary of its spread and skew.

The total length of the box plot represents the range. The box represents the middle 50% of the data, and its length is the IQR.

36

Box Plots include 5 pieces of information about the data

1. Minimum Value – the lowest score, shown at the far left of the diagram

2. Lower Quartile (LQ) – 25% of data is below this

3. Median – marks the middle of the data; 50% of the data is above this value and 50% is below it

4. Upper Quartile (UQ) – 25% of data is above this value/75% of data is below it.

5. Maximum Value – The highest score, shown at the far right of the diagram

37

outliers

Outliers are extreme values that do not fit the pattern of the rest of the data. Including them may misrepresent your data, but leaving them out could falsify it. Because they distort the data, you need to identify them.

38

how to find outliers

Outliers are more than 1.5 × IQR above the UQ or below the LQ.

Outliers are values > UQ + (1.5 × IQR) or < LQ − (1.5 × IQR)

1. Work out IQR

2. Find 1.5 x IQR

3. Subtract this value from LQ and add to UQ.

4. These values are now your new min/max points for your box plot. Any values in your data outside of this range are outliers.

5. Mark outliers with an X on your box plot.
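Steps 1 to 4 can be sketched in Python. The function name and data set are invented for the example, and the LQ and UQ are passed in as already-known values.

```python
def iqr_outliers(data, lq, uq):
    """Return the lower boundary, upper boundary and any outliers using 1.5 x IQR."""
    iqr = uq - lq                  # step 1
    margin = 1.5 * iqr             # step 2
    lower_boundary = lq - margin   # step 3
    upper_boundary = uq + margin
    # Step 4: anything outside the two boundaries is an outlier.
    outliers = [x for x in data if x < lower_boundary or x > upper_boundary]
    return lower_boundary, upper_boundary, outliers


# Hypothetical data with LQ = 4 and UQ = 10, so IQR = 6 and 1.5 x IQR = 9.
print(iqr_outliers([2, 4, 5, 7, 8, 10, 30], lq=4, uq=10))  # (-5.0, 19.0, [30])
```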

39

Outliers can also be found using the mean and standard deviation

They are values more than 3 SD away from the mean.

Outliers = values outside x̄ ± 3 SD
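The 3 SD rule can be sketched in the same way. The function name and numbers are hypothetical, and the mean and SD are assumed to have been worked out already.

```python
def sd_outliers(data, mean, sd):
    """Values more than 3 standard deviations away from the mean."""
    low = mean - 3 * sd
    high = mean + 3 * sd
    return [x for x in data if x < low or x > high]


# With mean 50 and SD 5 the boundaries are 35 and 65.
# 35 sits exactly on the boundary, so it is not counted as an outlier.
print(sd_outliers([48, 52, 35, 66, 30], mean=50, sd=5))  # [66, 30]
```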

40

Interpreting box plots

  1. Use the median as the measure of average: a higher median means taller, bigger etc. on average

  2. range or IQR for measure of spread

  3. Compare skewness of both box plots

41

skew

Describes the shape of the distribution and tells you how the data is spread out. If the data is skewed, it means most of the values are more on one side of the median.

42

positive skew

the data above the median is more spread out

mean > median > mode

43

symmetrical (no skew)

the data is evenly spread out above and below the median

mean = median = mode

44

negative skew

the data below the median is more spread out

mean < median < mode

45

Skewness using the Formula

Formula: Skewness = 3(mean − median) / standard deviation

  1. Positive Value = positive skew. The larger the value, the larger the skew.

  2. Negative Value = Negative Skew. The more negative the value, the stronger the skew.

  3. Value of 0 = No Skew/Symmetrical.
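The formula and the three cases can be put together in a small Python sketch. The function names and the example numbers are made up for illustration.

```python
def skewness(mean, median, sd):
    """Skewness = 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def describe_skew(value):
    """Turn a skewness value into one of the three descriptions."""
    if value > 0:
        return "positive skew"
    if value < 0:
        return "negative skew"
    return "no skew (symmetrical)"


# Hypothetical summary values: mean 52, median 50, SD 4.
s = skewness(mean=52, median=50, sd=4)
print(s, describe_skew(s))  # 1.5 positive skew
```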

46

comparing data sets

Compare using a measure of average (mean/median/mode) and spread (range/IQR/SD) or skewness.

Always make reference to individual values and mention which data set is larger/smaller than the other clearly.

Always interpret in context – link back to the scenario in the question and labels on axes

47

Comparing Averages

The mean/median/mode for data set A is larger than the mean/median/mode for data set B, so on average data set A is more … than data set B.

48

Comparing spread

Range/IQR/SD for data set A is larger than that of data set B so the ‘results’ of data set A are more spread out/less consistent than those of data set B.

Data set A has a smaller range/IQR/SD than data set B, which means the ‘results’ for data set A are more consistent.

Remember lower SD means values are closer to the mean and therefore similar.

49

Comparing Skew

The box plot for data set A is positively skewed, so the majority of ‘results’ were low with a few higher ‘results’.

The box plot for data set A is negatively skewed, so the majority of ‘results’ were high with a few lower ‘results’.