probability sampling
every item in the population has equal chance of being included in sample
has the least bias
the costliest/most time-consuming
includes simple random sampling, systematic sampling, stratified sampling
non-probability sampling
used for small samples → cannot be used to make inferences about the wider population
a clear rationale is needed since some individuals are included but not others
biased sample
subgroups within the sample are not represented in proportion to the population (over- or under-representation)
employing randomisation helps reduce bias and ensure fair representation
simple random sampling
assign serial numbers 1 to n to each student
use computer to randomly draw 100 numbers → sample size is 100
note: if the rng produces numbers ranging from 0 to 1, multiply by the population size (not the sample size) and round up to get a serial number
each member has equal chance at being chosen
disadvantage: list of entire population is unavailable/subject to change
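a minimal sketch in python, assuming a hypothetical population of 2000 students; random.sample does the drawing, and the last two lines show the scaling trick from the note above:

```python
import random

# hypothetical population: serial numbers 1 to n assigned to 2000 students
population = list(range(1, 2001))

# computer randomly draws 100 distinct serial numbers -> sample size is 100
sample = random.sample(population, 100)

# if the only rng available returns values in [0, 1),
# scale by the population size n (not the sample size) and round up to a serial number
n = len(population)
serial = int(random.random() * n) + 1
```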
systematic sampling
assign every student a serial number
arrange by serial number
every 10th student chosen after selecting a random start point
every nth member after a random start point is selected
simple
if the sampling frame has been arranged according to some relevant characteristic, this produces a spread of values for that characteristic
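a minimal sketch, again assuming a hypothetical list of 2000 students arranged by serial number:

```python
import random

population = list(range(1, 2001))   # hypothetical students arranged by serial number
k = 10                              # sampling interval: every 10th student

start = random.randint(1, k)        # random start point within the first k students
sample = population[start - 1::k]   # every k-th member after the start point is selected
```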
stratified sampling
find ratio of subgroups in the population
choose sample size based on ratio, ensure all subgroups are appropriately represented
within each subgroup, use simple random sampling or systematic sampling (probability sampling) to choose samples
used when distinct subgroups are well-defined, each demanding sufficient inclusion
most accurately depicts broader population
disadvantage: higher cost and lengthier process
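a minimal sketch of proportional stratified sampling, assuming two hypothetical subgroups of 600 and 400 students and a sample size of 100:

```python
import random

strata = {
    "year 1": list(range(1, 601)),      # 600 students
    "year 2": list(range(601, 1001)),   # 400 students
}
total = sum(len(members) for members in strata.values())
sample_size = 100

sample = []
for name, members in strata.items():
    quota = round(sample_size * len(members) / total)   # keep the 60:40 population ratio
    sample += random.sample(members, quota)             # simple random sampling within each subgroup
```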
convenience sampling
select participants that are most readily and easily available
biased, not representative of wider population
quota sampling
establish specific number of participants needed from each subgroup
within each subgroup, use convenience sampling (non-probability sampling) to choose samples
once required number of participants from a certain subgroup is reached, any additional participants from that subgroup are excluded
sources of bias in sampling techniques
some members of population are excluded from sampling frame
non-response
bad design: questions unclear, respondents give wrong info
biased respondent: not telling the truth
unrepresentative data: wrong sampling method chosen (eg convenience sampling used to represent wider population), results skewed
steps for grouping data into class intervals of equal size
identify largest and smallest observations
find difference between largest and smallest observations
decide on no. of class intervals (5-20)
choose 1st class interval to include smallest observation, last to include largest
note: discrete data can also be grouped into classes, but the original data will be lost
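a minimal sketch of the grouping steps, using hypothetical marks and a chosen class width of 10:

```python
from collections import Counter

marks = [12, 47, 3, 38, 21, 29, 35, 8, 44, 17]   # hypothetical raw data

smallest, largest = min(marks), max(marks)   # step 1: 3 and 47
spread = largest - smallest                  # step 2: 44
width = 10                                   # step 3: 5 intervals of width 10 cover the spread
start = 0                                    # step 4: first interval (0-9) includes the smallest observation

freq = Counter((m - start) // width for m in marks)
for i in sorted(freq):
    lower = start + i * width
    print(f"{lower}-{lower + width - 1}: {freq[i]}")   # last interval (40-49) includes the largest
```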
mid-interval value
average of lower and upper limit of the interval
(not the interval boundaries!)
interval length/width
difference between upper and lower interval boundaries
interval boundaries
range of original measurements that could possibly fall within that class
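a worked example, assuming a hypothetical class of 10–19 for marks recorded to the nearest whole number:

```latex
\text{class } 10\text{--}19:\quad
\text{mid-interval value} = \frac{10 + 19}{2} = 14.5,\quad
\text{boundaries } 9.5 \text{ and } 19.5,\quad
\text{width} = 19.5 - 9.5 = 10
```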
average
central point, single score/value that represents entire set of data
mean
measure of central location of the data
takes into account every score in the distribution
is the most stable measure of central tendency from sample to sample
provides basis for many statistical comparisons
note: for grouped data, calculate mean using MID-INTERVAL VALUE of the class intervals
recall: mid-interval value is calculated using the class lower/upper limit, NOT class boundaries
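as a worked equation, the grouped mean weights each mid-interval value x_i by its class frequency f_i:

```latex
\bar{x} \approx \frac{\sum f_i x_i}{\sum f_i},
\qquad x_i = \frac{\text{lower limit} + \text{upper limit}}{2}
```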
median
score/value that occupies the middle position in a distribution of scores
note: set of ungrouped data must first be ranked in order from largest to smallest
mode
score/value that occurs most frequently in a set of data
modal class
from grouped frequency distribution, class with highest frequency
range
difference between max and min values
interquartile range
measures degree of dispersion about the median
outliers
extreme data values that skew the data
a data value is an outlier if it is greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR
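a worked check of the fences, assuming hypothetical quartiles Q1 = 10 and Q3 = 20:

```latex
Q_1 = 10,\ Q_3 = 20 \;\Rightarrow\; \text{IQR} = 10;\qquad
x \text{ is an outlier if } x > 20 + 1.5(10) = 35 \ \text{or}\ x < 10 - 1.5(10) = -5
```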
box and whisker plot
shows min, max, Q1, Q3 and median
variance
spread of data about the mean, calculated as the average of the squared deviations of each data point from the mean
the larger the variance, the more spread out the data is
for grouped data, calculate using mid-value of class intervals
standard deviation
square root of the variance
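the defining formulas (population form; for grouped data x_i is the mid-interval value and f_i its frequency):

```latex
\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n},
\qquad
\sigma^2_{\text{grouped}} \approx \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i},
\qquad
\sigma = \sqrt{\sigma^2}
```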
correlation
degree of linear association between two variables
measured using Pearson’s product moment correlation coefficient / population correlation coefficient
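for reference, Pearson’s product moment correlation coefficient is:

```latex
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```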
positive correlation
r>0
increase in one variable is associated with an increase, on average, in the second variable
r=1: perfect
0.95<=r<1: very strong
0.87<=r<0.95: strong
0.5<=r<0.87: moderate
0.1<=r<0.5: weak
0<=r<0.1: no correlation
negative correlation
r<0
increase in one variable is associated with a decrease, on average, in the second variable
r=-1: perfect
-1<r<=-0.95: very strong
-0.95<r<=-0.87: strong
-0.87<r<=-0.5: moderate
-0.5<r<=-0.1: weak
-0.1<r<=0: no correlation
zero correlation
r is around 0
no linear correlation
line of best fit
regression line of y on x. used to predict y values given x values
passes through mean point (mean x, mean y) → regression line of x on y will intersect with regression line of y on x at mean point
minimises the sum of the squares of the vertical distances between the line and the data points (least squares)
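in least-squares form, the regression line of y on x passes through the mean point and has gradient b:

```latex
y - \bar{y} = b\,(x - \bar{x}),
\qquad
b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
```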
reliability of regression eq for prediction
strength of correlation
interpolation vs extrapolation
outliers