probability sampling
every item in the population has equal chance of being included in sample
has the least bias
the costliest/most time-consuming
includes simple random sampling, systematic sampling, stratified sampling
non-probability sampling
used for small samples → cannot be used to make inferences about the wider population
a clear rationale is needed since some individuals are included but not others
biased sample
subgroups within the sample are not represented in proportion to the population (over- or under-representation)
employing randomisation helps reduce bias and ensure fair representation
simple random sampling
assign serial numbers 1 to n to each student
use computer to randomly draw 100 numbers → sample size is 100
note: if the rng produces numbers ranging from 0 to 1, multiply by the population size (not the sample size) and round up to get a serial number
each member has equal chance at being chosen
disadvantage: list of entire population is unavailable/subject to change
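a minimal sketch in python, assuming a hypothetical population of 2000 students; random.sample does the drawing, and the last two lines show the scaling trick from the note above:

```python
import random

# hypothetical population: serial numbers 1 to n assigned to 2000 students
population = list(range(1, 2001))

# computer randomly draws 100 distinct serial numbers -> sample size is 100
sample = random.sample(population, 100)

# if the only rng available returns values in [0, 1),
# scale by the population size n (not the sample size) and round up to a serial number
n = len(population)
serial = int(random.random() * n) + 1
```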
systematic sampling
assign every student a serial number
arrange by serial number
every 10th student chosen after selecting a random start point
every nth member after a random start point is selected
simple
if the sampling frame has been arranged according to some relevant characteristic, this produces a spread of values for that characteristic
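a minimal sketch, again assuming a hypothetical list of 2000 students arranged by serial number:

```python
import random

population = list(range(1, 2001))   # hypothetical students arranged by serial number
k = 10                              # sampling interval: every 10th student

start = random.randint(1, k)        # random start point within the first k students
sample = population[start - 1::k]   # every k-th member after the start point is selected
```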
stratified sampling
find ratio of subgroups in the population
choose sample size based on ratio, ensure all subgroups are appropriately represented
within each subgroup, use simple random sampling or systematic sampling (probability sampling) to choose samples
used when distinct subgroups are well-defined, each demanding sufficient inclusion
most accurately depicts broader population
disadvantage: higher cost and lengthier process
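a minimal sketch of proportional stratified sampling, assuming two hypothetical subgroups of 600 and 400 students and a sample size of 100:

```python
import random

strata = {
    "year 1": list(range(1, 601)),      # 600 students
    "year 2": list(range(601, 1001)),   # 400 students
}
total = sum(len(members) for members in strata.values())
sample_size = 100

sample = []
for name, members in strata.items():
    quota = round(sample_size * len(members) / total)   # keep the 60:40 population ratio
    sample += random.sample(members, quota)             # simple random sampling within each subgroup
```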
convenience sampling
select participants that are most readily and easily available
biased, not representative of wider population
quota sampling
establish specific number of participants needed from each subgroup
within each subgroup, use convenience sampling (non-probability sampling) to choose samples
once required number of participants from a certain subgroup is reached, any additional participants from that subgroup are excluded
sources of bias in sampling techniques
some members of population are excluded from sampling frame
non-response
bad design: questions unclear, respondents give wrong info
biased respondent: not telling the truth
unrepresentative data: wrong sampling method chosen (eg convenience sampling used to represent wider population), results skewed
steps for grouping data into class intervals of equal size
identify largest and smallest observations
find difference between largest and smallest observations
decide on no. of class intervals (5-20)
choose 1st class interval to include smallest observation, last to include largest
note: discrete data can also be grouped into classes, but the original data will be lost
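a minimal sketch of the grouping steps, using hypothetical marks and a chosen class width of 10:

```python
from collections import Counter

marks = [12, 47, 3, 38, 21, 29, 35, 8, 44, 17]   # hypothetical raw data

smallest, largest = min(marks), max(marks)   # step 1: 3 and 47
spread = largest - smallest                  # step 2: 44
width = 10                                   # step 3: 5 intervals of width 10 cover the spread
start = 0                                    # step 4: first interval (0-9) includes the smallest observation

freq = Counter((m - start) // width for m in marks)
for i in sorted(freq):
    lower = start + i * width
    print(f"{lower}-{lower + width - 1}: {freq[i]}")   # last interval (40-49) includes the largest
```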
mid-interval value
average of lower and upper limit of the interval
(not the interval boundaries!)
interval length/width
difference between upper and lower interval boundaries
interval boundaries
range of original measurements that could possibly fall within that class
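a worked example, assuming a hypothetical class of 10–19 for marks recorded to the nearest whole number:

```latex
\text{class } 10\text{--}19:\quad
\text{mid-interval value} = \frac{10 + 19}{2} = 14.5,\quad
\text{boundaries } 9.5 \text{ and } 19.5,\quad
\text{width} = 19.5 - 9.5 = 10
```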
average
central point, single score/value that represents entire set of data
mean
measure of central location of the data
takes into account every score in the distribution
is the most stable measure of central tendency from sample to sample
provides basis for many statistical comparisons
note: for grouped data, calculate mean using MID-INTERVAL VALUE of the class intervals
recall: mid-interval value is calculated using the class lower/upper limit, NOT class boundaries
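as a worked equation, the grouped mean weights each mid-interval value x_i by its class frequency f_i:

```latex
\bar{x} \approx \frac{\sum f_i x_i}{\sum f_i},
\qquad x_i = \frac{\text{lower limit} + \text{upper limit}}{2}
```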
median
score/value that occupies the middle position in a distribution of scores
note: set of ungrouped data must first be ranked in order from largest to smallest
mode
score/value that occurs most frequently in a set of data
modal class
from grouped frequency distribution, class with highest frequency
range
difference between max and min values
interquartile range
measures degree of dispersion about the median
outliers
extreme data values that skew the data
a data value is an outlier if it is greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR
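a worked check of the fences, assuming hypothetical quartiles Q1 = 10 and Q3 = 20:

```latex
Q_1 = 10,\ Q_3 = 20 \;\Rightarrow\; \text{IQR} = 10;\qquad
x \text{ is an outlier if } x > 20 + 1.5(10) = 35 \ \text{or}\ x < 10 - 1.5(10) = -5
```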
box and whisker plot
shows min, max, Q1, Q3 and median
variance
spread of data about the mean, calculated as the average of the squared deviations of each data point from the mean
the larger the variance, the more spread out the data is
for grouped data, calculate using mid-value of class intervals
standard deviation
square root of the variance
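the defining formulas (population form; for grouped data x_i is the mid-interval value and f_i its frequency):

```latex
\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n},
\qquad
\sigma^2_{\text{grouped}} \approx \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i},
\qquad
\sigma = \sqrt{\sigma^2}
```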
correlation
degree of linear association between two variables
measured using Pearson’s product moment correlation coefficient / population correlation coefficient
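for reference, Pearson’s product moment correlation coefficient is:

```latex
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```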
positive correlation
r>0
increase in one variable is associated with an increase, on average, in the second variable
r=1: perfect
0.95<=r<1: very strong
0.87<=r<0.95: strong
0.5<=r<0.87: moderate
0.1<=r<0.5: weak
0<=r<0.1: no correlation
negative correlation
r<0
increase in one variable is associated with a decrease, on average, in the second variable
r=-1: perfect
-1<r<=-0.95: very strong
-0.95<r<=-0.87: strong
-0.87<r<=-0.5: moderate
-0.5<r<=-0.1: weak
-0.1<r<=0: no correlation
zero correlation
r is around 0
no linear correlation
line of best fit
regression line of y on x. used to predict y values given x values
passes through mean point (mean x, mean y) → regression line of x on y will intersect with regression line of y on x at mean point
minimises the sum of the squares of the vertical distances between the line and the data points (least squares)
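in least-squares form, the regression line of y on x passes through the mean point and has gradient b:

```latex
y - \bar{y} = b\,(x - \bar{x}),
\qquad
b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
```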
reliability of regression eq for prediction
strength of correlation
interpolation vs extrapolation
outliers