HL math statistics/regression

33 Terms

1
New cards

probability sampling

  • every item in the population has a known, non-zero chance of being included in the sample (an equal chance in simple random sampling)

  • has the least bias

  • the costliest/most time-consuming

  • includes simple random sampling, systematic sampling, stratified sampling

2
New cards

non-probability sampling

  • used for small samples → not used to make inference to wider population

  • clear rationale needed since including some individuals but not others

3
New cards

biased sample

  • subgroups within the sample are not represented in proportion to the population, i.e. they are over- or under-represented

  • employing randomisation helps reduce bias and ensure fair representation

4
New cards

simple random sampling

  1. assign serial numbers 1 to n to each student

  2. use computer to randomly draw 100 numbers → sample size is 100

    note: if the rng produces a number ranging from 0 to 1, multiply by the population size n and round up to get a serial number

  • each member has equal chance at being chosen

  • disadvantage: a list of the entire population may be unavailable or subject to change
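
The two steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the population size of 1200 and the function name `simple_random_sample` are assumptions made for the example, and the seed is only there to make the draw repeatable.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw distinct serial numbers from 1..population_size, each equally likely."""
    rng = random.Random(seed)  # seed is optional, used here only for reproducibility
    # sample() draws without replacement, so no serial number is chosen twice
    return rng.sample(range(1, population_size + 1), sample_size)

# hypothetical school of 1200 students, sample size 100 as on the card
chosen = simple_random_sample(population_size=1200, sample_size=100, seed=1)
print(len(chosen))
```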

5
New cards

systematic sampling

  1. assign every student a serial number

  2. arrange by serial number

  3. every 10th student chosen after selecting a random start point

  • every nth member after a random start point is selected

  • simple

  • if the sample frame has been arranged according to some relevant characteristics, this produces a spread of values for the characteristic
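
A short Python sketch of the steps above, assuming the students are already arranged by serial number. The population size of 200 and the name `systematic_sample` are hypothetical choices for illustration.

```python
import random

def systematic_sample(population_size, step, seed=None):
    """Pick every `step`-th serial number after a random start point."""
    rng = random.Random(seed)
    start = rng.randint(1, step)  # random start within the first `step` members
    return list(range(start, population_size + 1, step))

# hypothetical list of 200 students, every 10th chosen
picked = systematic_sample(population_size=200, step=10, seed=0)
print(picked[:3])
```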

6
New cards

stratified sampling

  1. find ratio of subgroups in the population

  2. choose sample size based on ratio, ensure all subgroups are appropriately represented

  3. within each subgroup, use simple random sampling or systematic sampling (probability sampling) to choose samples

  • used when distinct subgroups are well-defined, each demanding sufficient inclusion

  • most accurately depicts broader population

  • disadvantage: higher cost and lengthier process
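
The three steps above might look like this in Python. It is a sketch under stated assumptions: the two year groups and their sizes are invented for the example, and simple rounding is used to split the sample, which in general may not add up exactly to the requested sample size.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata maps subgroup name -> list of members."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # steps 1 and 2: share of the sample proportional to the subgroup's size
        k = round(sample_size * len(members) / total)
        # step 3: simple random sampling within the subgroup
        sample[name] = rng.sample(members, k)
    return sample

# hypothetical population: 120 students in one group, 80 in another
strata = {
    "year 12": [f"Y12-{i}" for i in range(1, 121)],
    "year 13": [f"Y13-{i}" for i in range(1, 81)],
}
result = stratified_sample(strata, sample_size=20, seed=0)
print({name: len(members) for name, members in result.items()})
# → {'year 12': 12, 'year 13': 8}
```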

7
New cards

convenience sampling

  • select participants that are most readily and easily available

  • biased, not representative of wider population

8
New cards

quota sampling

  1. establish specific number of participants needed from each subgroup

  2. within each subgroup, use convenience sampling (non-probability sampling) to choose samples

  3. once required number of participants from a certain subgroup is reached, any additional participants from that subgroup are excluded
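
As a rough Python sketch of the steps above: participants are taken in the order they become available (convenience sampling) until each quota is full. The names, subgroups and quotas are made up for illustration.

```python
def quota_sample(arrivals, quotas):
    """arrivals: list of (name, subgroup) in the order people become available."""
    chosen = {group: [] for group in quotas}
    for name, group in arrivals:
        # accept only while the subgroup's quota has not been reached
        if group in chosen and len(chosen[group]) < quotas[group]:
            chosen[group].append(name)
    return chosen

# hypothetical arrivals and quotas
arrivals = [("A", "junior"), ("B", "junior"), ("C", "senior"),
            ("D", "junior"), ("E", "senior")]
result = quota_sample(arrivals, {"junior": 2, "senior": 1})
print(result)  # → {'junior': ['A', 'B'], 'senior': ['C']}
```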

9
New cards

sources of bias in sampling techniques

  • some members of population are excluded from sampling frame

  • non-response

  • bad design: questions unclear, respondents give wrong info

  • biased respondent: not telling the truth

  • unrepresentative data: wrong sampling method chosen (eg convenience sampling used to represent wider population), results skewed

10
New cards

steps for grouping data into class intervals of equal size

  1. identify largest and smallest observations

  2. find difference between largest and smallest observations

  3. decide on no. of class intervals (5-20)

  4. choose 1st class interval to include smallest observation, last to include largest

note: discrete data can also be grouped into classes, but the individual original values are lost
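
One possible way to carry out these steps in Python, assuming whole-number data. The width rule (range divided by the number of classes, rounded down, plus 1) is just one convenient choice that guarantees the last class contains the largest observation. The scores are invented.

```python
import math

def group_into_classes(data, num_classes):
    """Return (lower limit, upper limit, frequency) for equal-width classes."""
    smallest, largest = min(data), max(data)          # step 1
    data_range = largest - smallest                   # step 2
    # step 3: num_classes is chosen by the caller; the width is a whole number
    width = math.floor(data_range / num_classes) + 1
    table = []
    for i in range(num_classes):                      # step 4
        lower = smallest + i * width
        upper = lower + width - 1                     # class limits for integer data
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
    return table

scores = [12, 15, 21, 22, 22, 30, 34, 35, 41, 48]    # hypothetical data
table = group_into_classes(scores, 5)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```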

11
New cards

mid-interval value

average of lower and upper limit of the interval

(not the interval boundaries!)

12
New cards

interval length/width

difference between upper and lower interval boundaries

13
New cards

interval boundaries

range of original measurement that can possibly be in that class

14
New cards

average

central point, single score/value that represents entire set of data

15
New cards

mean

measure of central location of the data

  • takes into account every score in the distribution

  • is the most stable measure of central tendency from sample to sample

  • provides basis for many statistical comparisons

note: for grouped data, calculate mean using MID-INTERVAL VALUE of the class intervals

recall: mid-interval value is calculated using the class lower/upper limit, NOT class boundaries
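
A small Python sketch of the grouped-data mean, using the mid-interval value of each class as the note describes. The three classes and their frequencies are invented for the example.

```python
def grouped_mean(classes):
    """classes: list of (lower limit, upper limit, frequency)."""
    total_freq = sum(f for _, _, f in classes)
    # each class contributes its mid-interval value times its frequency
    weighted_sum = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return weighted_sum / total_freq

classes = [(10, 19, 2), (20, 29, 5), (30, 39, 3)]  # hypothetical frequency table
mean = grouped_mean(classes)
print(mean)  # → 25.5
```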

16
New cards

median

score/value that occupies the middle position in a distribution of scores

note: a set of ungrouped data must first be ranked in order, e.g. from smallest to largest
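
A minimal Python sketch of finding the median of ungrouped data. For an even number of values it takes the mean of the two middle values, which is the usual convention.

```python
def median(values):
    ordered = sorted(values)          # rank the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

print(median([7, 3, 9]))     # → 7
print(median([4, 1, 3, 2]))  # → 2.5
```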

17
New cards

mode

score/value that occurs most frequently in a set of data

18
New cards

modal class

from grouped frequency distribution, class with highest frequency

19
New cards

range

difference between max and min values

20
New cards

interquartile range

measures the degree of dispersion about the median: IQR = Q3 − Q1, the spread of the middle 50% of the data

21
New cards

outliers

extreme data values that skew the data

a data value is an outlier if it is more than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR
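
The outlier rule can be sketched in Python as below. Quartile conventions differ between calculators and software. This sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data (excluding the middle value when the count is odd). The data set is invented.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (one common convention)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median(lower_half), median(upper_half)

def find_outliers(values):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data
print(quartiles(data))       # → (4.5, 8.5)
print(find_outliers(data))   # → [30]
```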

22
New cards

box and whisker plot

shows min, max, Q1, Q3 and median

23
New cards

variance

spread of data about the mean, calculated by measuring average of squares of deviations of each data point from the mean

  • the larger the variance, the more spread out the data is

  • for grouped data, calculate using the mid-interval value of the class intervals

24
New cards

standard deviation

square root of the variance
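
A short Python sketch of variance and standard deviation as described on these two cards. It assumes the population form (dividing by n, the number of data points); some software divides by n − 1 instead. The data set is invented.

```python
import math

def variance(values):
    """Average of the squared deviations from the mean (dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def standard_deviation(values):
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, mean 5
print(variance(data))              # → 4.0
print(standard_deviation(data))    # → 2.0
```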

25
New cards

correlation

degree of linear association between two variables

measured using Pearson’s product-moment correlation coefficient / population correlation coefficient
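
For reference, Pearson's coefficient can be computed directly from its definition. The sketch below is one way to do so in Python. The function name `pearson_r` and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # points on a rising line → 1.0
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # points on a falling line → -1.0
```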

26
New cards

positive correlation

r>0

an increase in one variable is associated, on average, with an increase in the second variable (correlation alone does not show causation)

r=1: perfect

0.95<=r<1: very strong

0.87<=r<0.95: strong

0.5<=r<0.87: moderate

0.1<=r<0.5: weak

0<=r<0.1: no correlation

27
New cards

negative correlation

r<0

an increase in one variable is associated, on average, with a decrease in the second variable (correlation alone does not show causation)

r=-1: perfect

-1<r<=-0.95: very strong

-0.95<r<=-0.87: strong

-0.87<r<=-0.5: moderate

-0.5<r<=-0.1: weak

-0.1<r<=0: no correlation
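
The strength bands on the positive and negative correlation cards mirror each other, so they can be applied to the size of r. A small Python sketch using exactly the cut-offs listed on the cards (the function name is hypothetical):

```python
def describe_correlation(r):
    """Classify r using the bands from the cards, applied to |r|."""
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.95:
        strength = "very strong"
    elif size >= 0.87:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.9))    # → strong positive
print(describe_correlation(-0.3))   # → weak negative
print(describe_correlation(0.05))   # → no correlation
```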

28
New cards

zero correlation

r is around 0

no linear correlation

29
New cards

line of best fit

regression line of y on x, used to predict y values from given x values

passes through the mean point (mean x, mean y) → the regression line of x on y intersects the regression line of y on x at the mean point

minimises the sum of the squares of the vertical distances between the line and the data points
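
A minimal Python sketch of the least-squares regression line of y on x, written as y = slope × x + intercept. The data points are invented. The last line checks that the fitted line passes through the mean point.

```python
def regression_line(xs, ys):
    """Least-squares line of y on x: slope = Sxy / Sxx, through (mean x, mean y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # forces the line through the mean point
    return slope, intercept

xs = [1, 2, 3, 4, 5]       # hypothetical data, mean x = 3
ys = [2, 4, 5, 4, 5]       # mean y = 4
slope, intercept = regression_line(xs, ys)
print(round(slope, 3), round(intercept, 3))   # → 0.6 2.2
print(round(slope * 3 + intercept, 3))        # value at mean x → 4.0
```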

30
New cards

reliability of regression eq for prediction

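  • strength of correlation: the stronger the correlation, the more reliable the prediction

  • interpolation vs extrapolation: predicting within the range of the data (interpolation) is more reliable than predicting outside it (extrapolation)

  • outliers: can distort the regression line and make predictions less reliable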

31
New cards
32
New cards
33
New cards