population
all of the possible data
sample
part of the possible data
parameter
a numerical property of the population
statistic
a numerical property of a sample
sampling frame
a list of all the members of a population
outliers
a value less than: Q1 - 1.5 x IQR
a value more than: Q3 + 1.5 x IQR
a value less than 2 standard deviations below the mean
a value more than 2 standard deviations above the mean
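A minimal sketch of both outlier rules, assuming the quartiles, mean and standard deviation have already been worked out. The function names and example numbers are made up for illustration.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) outlier limits from the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def sd_fences(mean, sd):
    """Return the (lower, upper) outlier limits from the mean and sd."""
    return mean - 2 * sd, mean + 2 * sd


def is_outlier(value, fences):
    """A value is an outlier if it lies outside the two limits."""
    lower, upper = fences
    return value < lower or value > upper


# Hypothetical summary statistics: Q1 = 10, Q3 = 20, so IQR = 10
fences = iqr_fences(10, 20)
print(fences, is_outlier(40, fences), is_outlier(12, fences))
```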
frequency density formula
frequency / class width
simple random sample
number your sampling frame from 1 - N
use a random number generator to generate numbers between 1 - N
generate n numbers within the range 1 - N
ignore any repeats (restricted) or include repeats (unrestricted)
choose the members of the population that match the numbers
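The steps above can be sketched in Python. This is only an illustration: the function name, the `seed` argument and the example frame are assumptions, and `random.sample` draws without repeats directly rather than generating numbers and ignoring repeats.

```python
import random


def simple_random_sample(frame, n, restricted=True, seed=None):
    """Pick n members from a sampling frame numbered 1 to N."""
    rng = random.Random(seed)
    N = len(frame)
    if restricted:
        # No repeats: equivalent to ignoring any repeated numbers
        numbers = rng.sample(range(1, N + 1), n)
    else:
        # Repeats allowed: the same member may be chosen more than once
        numbers = [rng.randint(1, N) for _ in range(n)]
    # Frame position i corresponds to list index i - 1
    return [frame[i - 1] for i in numbers]


# Hypothetical sampling frame of 100 numbered members
frame = [f"member {i}" for i in range(1, 101)]
print(simple_random_sample(frame, 5, seed=1))
```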
Adv and disadv of simple random sample
Adv - unbiased, cheap, every member has an equal chance of selection
Disadv- sampling frame needed, not suitable for large population
proportional stratified sampling
divide the population into strata (groups) by a chosen characteristic
select a sample size for each stratum in the same proportion as the population
use a simple random sample to select members of each stratum
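A rough sketch of proportional stratified sampling, assuming the strata are given as a dictionary of lists. The names and the year-group data are hypothetical, and rounding the stratum sizes can make the total differ slightly from n when the proportions are not whole numbers.

```python
import random


def proportional_stratified_sample(strata, n, seed=None):
    """Sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum sample size = n x (stratum size / population size)
        size = round(n * len(members) / total)
        # Simple random sample (no repeats) within the stratum
        sample[name] = rng.sample(members, size)
    return sample


# Hypothetical population: 60 students in one year group, 40 in another
strata = {
    "year 12": [f"Y12-{i}" for i in range(60)],
    "year 13": [f"Y13-{i}" for i in range(40)],
}
chosen = proportional_stratified_sample(strata, 10, seed=0)
print({name: len(members) for name, members in chosen.items()})
```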
non proportional stratified sample
divide the population into strata by a chosen characteristic
select a sample size for each stratum as decided by the person administering the experiment
use a simple random sample to select members of each stratum
Adv and disadv of stratified sampling
Adv- representative of population, proportional representation
Disadv - population must be classified into strata
cluster sample
randomly select representative clusters within a population
select your sample from these clusters (either all members or a simple random sample from each cluster)
Adv and disadv of cluster sampling
Adv - easy to carry out, cheap
Disadv - can be biased, probability of a member being selected depends on size of cluster
systematic sample
number your sampling frame
divide population size by sample size to get k
generate one random number between 1 and k to use as starting point
your sample is selected by choosing the starting point followed by every kth person in the list after
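The procedure above, sketched in Python. The function name and example frame are assumptions, and k is rounded down when the population size is not an exact multiple of the sample size.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Choose a random start between 1 and k, then take every kth member."""
    rng = random.Random(seed)
    k = len(frame) // n           # population size / sample size
    start = rng.randint(1, k)     # random starting position, 1 to k inclusive
    positions = range(start, len(frame) + 1, k)
    # Positions are numbered from 1, list indices from 0
    return [frame[i - 1] for i in positions][:n]


# Hypothetical numbered sampling frame of 100 members, sample of 10 (k = 10)
frame = list(range(1, 101))
print(systematic_sample(frame, 10, seed=3))
```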
Adv and disadv of systematic sampling
Adv - simple & quick, suitable for large populations
Disadv - sampling frame needed, biased if the sampling frame is not in random order
Adv and disadv of judgemental sampling
Adv- only selects most suitable candidates, saves money and effort on collecting unnecessary data, representative of the population
Disadv - selection criteria are subjective, sample can be influenced by the researcher's viewpoint, dependent on skill of researcher
adv & disadv of snowball sampling
adv- allows researchers to reach populations that are difficult to access, cost effective, requires little planning
disadv- may not be representative, can lead to sampling bias, subjects share similar traits, may be difficult/ time consuming to get a large enough sample
experimental design
select a large random sample of participants (randomisation)
group participants by a particular attribute (blocking)
select a control group that receives the placebo
run an experiment with the control group and experiment group (paired comparison)
blind trial
participants don't know if they are receiving treatment or placebo
double blind trial
neither participants nor experimenter know who is receiving treatment or placebo
probability notation
P(A ∩ B) = probability that A and B both occur (intersection)
P(A ∪ B) = probability that either A or B or both occur (union)
probability laws
(addition law) P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
(multiplication law) P(A ∩ B) = P(A) x P(B | A) or P(A ∩ B) = P(B) x P(A | B)
(if A and B are mutually exclusive) P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)
(if A and B are independent) P(A ∩ B) = P(A) x P(B)
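A small check of these laws on one roll of a fair die. The events are made up for illustration: A is "even", B is "greater than 3".

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # hypothetical event: roll is even
B = {4, 5, 6}   # hypothetical event: roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Multiplication law rearranged: P(B | A) = P(A and B) / P(A)
p_b_given_a = prob(A & B) / prob(A)

# Independence test: does P(A and B) equal P(A) x P(B)?
independent = prob(A & B) == prob(A) * prob(B)

print(p_union, p_b_given_a, independent)
```

Here P(A and B) is 1/3 but P(A) x P(B) is 1/4, so these two events are not independent.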
position from a list of data or frequency table
position of median: (n+1)/2
position of Q1: (n+1)/4 (half of position of median)
position of Q3: 3(n+1)/4 (position of Q1 + position of median)
position from a grouped data table
position of median: n/2
pos of Q1: n/4
pos of Q3: 3n/4
pos of kth percentile: (k/100) x n
pos of kth decile: (k/10) x n
linear interpolation of the median from a grouped data table
median = b + ((n/2 - f) / fc) x c
b = lower class boundary of the median class
f = sum of frequencies below b
fc = frequency of median class
c = class width of median class
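A sketch of the interpolation, assuming each class is given as (lower boundary, class width, frequency) in ascending order. The function name and the grouped table are hypothetical.

```python
def interpolated_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes: list of (lower_boundary, class_width, frequency), ascending.
    """
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("no data")
    target = n / 2   # position of the median for grouped data
    f = 0            # running sum of frequencies below the current class
    for b, c, fc in classes:
        if f + fc >= target:
            # median = b + ((n/2 - f) / fc) x c
            return b + (target - f) / fc * c
        f += fc


# Hypothetical table: 0-10 (freq 5), 10-20 (freq 10), 20-30 (freq 5)
table = [(0, 10, 5), (10, 10, 10), (20, 10, 5)]
print(interpolated_median(table))  # n = 20, f = 5, fc = 10, b = 10, c = 10
```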
residuals
actual value - estimated value
more reliable if:
there is a strong correlation
estimates are within range of original data
residuals are small
spearmans rank formula
1 - (6 x sum of d squared) / (n(n squared - 1))
d = difference between the two ranks of each item, n = number of pairs
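A short sketch of Spearman's rank coefficient, 1 - 6 x (sum of d squared) / (n(n squared - 1)), assuming the data has already been ranked with no tied ranks. The function name and example ranks are illustrative.

```python
def spearman_rank(ranks_x, ranks_y):
    """Spearman's rank correlation from two lists of ranks (no ties)."""
    n = len(ranks_x)
    # d is the difference between the two ranks of each item
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks: perfect agreement, then perfect disagreement
print(spearman_rank([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(spearman_rank([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```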
probability distributions mean and variance formula
mean = sum of x * p(X=x)
variance = sum of x squared * p(X = x) - mean squared
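A worked check of these two formulas, using a fair six-sided die as an assumed example distribution. Exact fractions avoid rounding.

```python
from fractions import Fraction


def mean_and_variance(dist):
    """dist maps each value x to its probability P(X = x)."""
    mean = sum(x * p for x, p in dist.items())
    # variance = sum of x squared x P(X = x), minus mean squared
    variance = sum(x ** 2 * p for x, p in dist.items()) - mean ** 2
    return mean, variance


# Hypothetical distribution: a fair die, each face with probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(mean_and_variance(die))
```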
conditions of a binomial distribution
2 possible outcomes
fixed number of trials
fixed probability of success
trials are independent
binomial mean and variance
mean = np
var = np(1-p)
standardised normal distribution
Z ~ N (0,1)
formula: Z = (x - mean) / standard deviation
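A quick sketch of standardising a value and reading off a probability, using `statistics.NormalDist` from the Python standard library (3.8+). The numbers (mean 100, sd 15, x = 130) are made up.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardise x: Z = (x - mean) / standard deviation."""
    return (x - mean) / sd


# Hypothetical example: X ~ N(100, 15 squared), find P(X < 130)
z = z_score(130, 100, 15)
p = NormalDist().cdf(z)   # NormalDist() is the standard normal Z ~ N(0, 1)
print(z, round(p, 4))
```

Standardising first and using the standard normal gives the same probability as using the original distribution directly.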
conditions for a poisson and exponential distribution
events must occur one at a time
each event must be independent
the average rate must remain constant
poisson mean and variance
mean = var = the average rate
exponential mean and variance
mean = 1/ average rate
var = 1/ average rate squared
exponential distribution formula
p(X < x) = 1 - e^(-average rate * x)
p(X = x) = 0
p(X > x) = e^(-average rate * x)
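The exponential probabilities can be sketched as two small functions. The names and the example rate are illustrative assumptions.

```python
import math


def exp_less_than(rate, x):
    """P(X < x) = 1 - e^(-rate * x)"""
    return 1 - math.exp(-rate * x)


def exp_greater_than(rate, x):
    """P(X > x) = e^(-rate * x)"""
    return math.exp(-rate * x)


# Hypothetical example: events at an average rate of 2 per hour,
# probability of waiting less than / more than half an hour
print(exp_less_than(2, 0.5), exp_greater_than(2, 0.5))
```

Because P(X = x) = 0, the two results always add up to 1.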
uniform distribution formulas
probability density, p = 1 / (b - a)
probability = area
mean = (a + b) / 2
var = (b - a) squared / 12
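A sketch of the uniform distribution formulas, where a and b are the lower and upper limits of the distribution. The function names and the example interval are assumptions.

```python
def uniform_summary(a, b):
    """Return (density, mean, variance) for a uniform distribution on [a, b]."""
    density = 1 / (b - a)
    mean = (a + b) / 2
    variance = (b - a) ** 2 / 12
    return density, mean, variance


def uniform_probability(a, b, lo, hi):
    """P(lo < X < hi) = area of the rectangle = width x density."""
    return (hi - lo) * (1 / (b - a))


# Hypothetical example: X uniform between 2 and 8
print(uniform_summary(2, 8))
print(uniform_probability(2, 8, 3, 6))
```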
coding
adding/subtracting affects: only mean (sd is unchanged)
multiplication/division affects: mean & sd
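A quick numerical check of how coding changes the mean and standard deviation, using the standard library `statistics` module. The data set is made up.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data: mean 5, sd 2
shifted = [x + 10 for x in data]       # coding by adding 10
scaled = [x * 3 for x in data]         # coding by multiplying by 3

# Adding 10 moves the mean by 10 but leaves the spread unchanged
print(mean(data), pstdev(data))
print(mean(shifted), pstdev(shifted))

# Multiplying by 3 multiplies both the mean and the sd by 3
print(mean(scaled), pstdev(scaled))
```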
pmcc assumptions & how to calculate
assumptions:
linear relationship between the bivariate data
data is normally distributed
no significant outliers in each data set
on calculator: calc> reg> x> a+bx
linear regression
y = a + bx
to find a & b: calc> reg> x> a + bx
b represents the increase or decrease in y as x increases by 1
a represents the value of y when x is 0
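The card gets a and b from the calculator. As a cross-check, here is a sketch using the standard least squares formulas b = Sxy / Sxx and a = ybar - b x xbar, which are not stated on the card. The function name and data are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # change in y when x increases by 1
    a = y_bar - b * x_bar    # value of y when x is 0
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)

# Residual for the point (2, 5): actual value minus estimated value
print(5 - (a + b * 2))
```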
distribution of sample means
Xbar ~ N (mean, (standard deviation / square root of n) squared)
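A sketch of the distribution of sample means, which has the same mean as the population and a standard deviation of (standard deviation / square root of n). It uses `statistics.NormalDist`; the function name and numbers are assumptions.

```python
import math
from statistics import NormalDist


def sample_mean_distribution(mean, sd, n):
    """Distribution of Xbar for samples of size n from a population
    with the given mean and standard deviation."""
    return NormalDist(mean, sd / math.sqrt(n))


# Hypothetical population: mean 50, sd 10, samples of size 25
xbar = sample_mean_distribution(50, 10, 25)
print(xbar.mean, xbar.stdev, xbar.variance)

# Probability that a sample mean is below 52
print(round(xbar.cdf(52), 4))
```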
normal approximation to binomial distributions
conditions:
n is large (n > 20) and p is close to 0.5
or
np > 10 and n(1-p) > 10
X ~ N (np , np(1 - p) )
take the square root of the variance to get the sd
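A sketch of setting up the normal approximation N(np, np(1 - p)), using the np > 10 and n(1 - p) > 10 check from this card. The function names and the example values of n and p are assumptions.

```python
import math
from statistics import NormalDist


def conditions_met(n, p):
    """Check np > 10 and n(1 - p) > 10."""
    return n * p > 10 and n * (1 - p) > 10


def normal_approximation(n, p):
    """Approximate X ~ B(n, p) by a normal with mean np and variance np(1 - p)."""
    mean = n * p
    variance = n * p * (1 - p)
    # NormalDist takes the sd, so square root the variance
    return NormalDist(mean, math.sqrt(variance))


# Hypothetical example: n = 100 trials, p = 0.5
print(conditions_met(100, 0.5))
approx = normal_approximation(100, 0.5)
print(approx.mean, approx.stdev)
```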