Simple random sampling
Each item allocated a unique number
Numbers chosen at random using random number generator
- Bias free
- Fast, easy, cheap
- Each number has a known equal chance of being selected
- More difficult as the population size gets larger
- Full sampling frame needed
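The two steps of simple random sampling can be sketched in Python. This is only an illustration: the function name `simple_random_sample`, the list of names and the `seed` parameter are assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Number every item, then draw distinct numbers at random."""
    rng = random.Random(seed)
    # Step 1: allocate each item a unique number (its position in the frame).
    numbered = dict(enumerate(population))
    # Step 2: draw distinct numbers; every number is equally likely.
    chosen = rng.sample(range(len(numbered)), sample_size)
    return [numbered[i] for i in chosen]

# Hypothetical sampling frame of six people.
frame = ["Ann", "Ben", "Cat", "Dan", "Eve", "Fay"]
sample = simple_random_sample(frame, 3, seed=1)
```

Note that the whole frame has to be listed before anything can be drawn, which is why a full sampling frame is needed.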
Systematic sampling
Elements are chosen at regular intervals from an ordered list. The first member is chosen at random, then every k-th member after it is taken, e.g. every 5th
- Simple, fast, cheap
- Suitable for large samples and large populations
- Full sampling frame is needed
- It can introduce bias if the sampling frame is not random
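A minimal sketch of systematic sampling, assuming the interval is the population size divided by the sample size (rounded down) and the starting point is drawn at random from the first interval. The names `systematic_sample` and `frame` are made up for the example.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Pick a random start, then take every k-th element."""
    rng = random.Random(seed)
    # Interval k between chosen elements (assumed: population // sample size).
    interval = len(ordered_list) // sample_size
    # Random starting position within the first interval.
    start = rng.randrange(interval)
    return [ordered_list[start + i * interval] for i in range(sample_size)]

# Hypothetical ordered sampling frame numbered 1 to 50.
frame = list(range(1, 51))
picked = systematic_sample(frame, 10, seed=0)
```

With 50 items and a sample of 10, the interval is 5, so the sample goes up in 5s from the random start.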
How to find how many people to put in each stratum for stratified sampling
(actual number in group ÷ total population) × sample size
- Sample accurately reflects the population structure
- Guarantees proportional representation of groups within a population
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from the same disadvantages as simple random sampling
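The allocation formula for stratified sampling, (actual number in group ÷ total population) × sample size, can be sketched as follows. The function name `stratum_sizes` and the year-group figures are invented for illustration.

```python
def stratum_sizes(group_sizes, sample_size):
    """Apply (number in group / total population) * sample size per stratum."""
    total = sum(group_sizes.values())
    # Rounded to whole people; with awkward numbers the rounded
    # values may not add up exactly to the sample size.
    return {
        name: round(size / total * sample_size)
        for name, size in group_sizes.items()
    }

# Hypothetical population: 120 in Year 12, 80 in Year 13, sample of 50.
sizes = stratum_sizes({"Year 12": 120, "Year 13": 80}, 50)
```

Here Year 12 makes up 120 ÷ 200 of the population, so it receives the same fraction of the 50 sample places.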
Quota sampling
Interviewer creates groups for population to be put into and decides the proportions
Meet each member and put them into correct group
Continues until all quotas (groups) are filled
If a person refuses to be interviewed or the quota is already full then ignore the answer
- No sampling frame required
- Small sample can represent whole population
- Fast, easy, cheap
- Easy comparison between groups in population
- Non-random sampling is unrepresentative, can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Larger sample increases number of groups which adds time and expense
- Non-responses are not recorded
- No sampling frame required
- Fast, easy, cheap
- Inexpensive
- Non-random sampling is unrepresentative, can introduce bias
- Highly dependent on individual researcher
What is cluster sampling?
Population is split into clusters where each member of population can only be in one cluster
Sample is taken from each cluster
The sample can be taken using any sampling technique
Often the clusters are geographic, e.g. taking clusters from different parts of the UK where a particular type of bird is common
It is usually a two-stage process
Can be random or non-random
- Simple random
- Systematic
- Stratified
- Opportunity
- Quota
IQR = Q3 - Q1
Q3 is the value at position 3(n+1)/4 and Q1 is the value at position (n+1)/4 in the ordered data
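As a quick check of the quartile positions, Python's `statistics.quantiles` with `method="exclusive"` places the quartiles at positions (n+1)/4, 2(n+1)/4 and 3(n+1)/4 in the ordered data, interpolating when a position is not a whole number. The data list is made up for the example; other courses and calculators may use slightly different position rules.

```python
import statistics

# Hypothetical ordered data with n = 7, so Q1 is at position 2
# and Q3 is at position 6.
data = [1, 2, 3, 4, 5, 6, 7]

# "exclusive" uses the (n + 1) position rule for the quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
```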
1) Find the midpoint of each class
2) Multiply each midpoint by its class frequency
3) Add these products and divide by the total frequency
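A short sketch of the three steps for estimating a mean from grouped data, reading the last step as the sum of (midpoint × frequency) divided by the total frequency. The function name `grouped_mean` and the two classes are assumptions for the example.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) classes."""
    total_fx = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2   # step 1: class midpoint
        total_fx += midpoint * freq      # step 2: midpoint x frequency
        total_f += freq
    return total_fx / total_f            # step 3: divide by total frequency

# Hypothetical table: 0-10 with frequency 2, 10-20 with frequency 3.
estimate = grouped_mean([(0, 10, 2), (10, 20, 3)])
```

The result is only an estimate because every value in a class is treated as if it sat at the midpoint.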
1 SD = 68%
2 SD = 95%
3 SD = 99.7%
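These percentages are rounded proportions of a normal distribution lying within 1, 2 and 3 standard deviations of the mean. A small sketch using the standard library's `NormalDist` shows where they come from; the helper name `proportion_within` is made up.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

# Percentages for 1, 2 and 3 standard deviations, to 1 decimal place.
within = {k: round(100 * proportion_within(k), 1) for k in (1, 2, 3)}
```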
Equations to test independence
P(A and B) = P(A) × P(B)
P(A|B) = P(A|B’) = P(A)
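Both independence tests can be tried on a small example. The sketch below assumes one roll of a fair six-sided die, with A = "even number" and B = "4 or less"; these events are chosen purely for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)                       # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}      # even: {2, 4, 6}
B = {x for x in outcomes if x <= 4}          # 4 or less: {1, 2, 3, 4}

def prob(event):
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(len(event), 6)

p_a_and_b = prob(A & B)                      # P(A and B)
p_a_given_b = p_a_and_b / prob(B)            # P(A|B) = P(A and B) / P(B)

# Test 1: P(A and B) = P(A) x P(B)
independent = p_a_and_b == prob(A) * prob(B)
# Test 2: P(A|B) = P(A)
same_conditional = p_a_given_b == prob(A)
```

Exact fractions are used so the equality checks are not affected by floating-point rounding.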
It has a left tail (negative skew)
Q2 - Q1 > Q3 - Q2
It has a right tail (positive skew)
Q3 - Q2 > Q2 - Q1
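Under the usual convention, a longer upper half (Q3 - Q2 > Q2 - Q1) indicates positive skew with a right tail, and a longer lower half indicates negative skew with a left tail. A minimal sketch of that comparison, with a made-up function name and made-up quartiles:

```python
def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing the two halves of the box."""
    upper = q3 - q2   # distance from median to upper quartile
    lower = q2 - q1   # distance from lower quartile to median
    if upper > lower:
        return "positive skew (right tail)"
    if lower > upper:
        return "negative skew (left tail)"
    return "symmetrical"

# Hypothetical quartiles: the upper half is much longer than the lower half.
result = skew_from_quartiles(2, 3, 8)
```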
Estimating a value outside the range of measured data
Unreliable, so avoid it: the pattern in the data may not continue outside the measured range
Conditions for binomial distribution
1. A fixed number of trials, n
2. Each trial has two possible outcomes
3. The probability of success, p, is the same for each trial
4. Each observation is independent
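When all four conditions hold, the probability of r successes in n trials is nCr × p^r × (1 - p)^(n - r). A short sketch, assuming four fair coin tosses as the example; the function name `binomial_pmf` is made up.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Hypothetical example: exactly 2 heads in 4 tosses of a fair coin.
prob_two_heads = binomial_pmf(2, 4, 0.5)

# The probabilities over r = 0..n should add up to 1.
total = sum(binomial_pmf(r, 4, 0.5) for r in range(5))
```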
Data that can only take certain values, e.g. shoe size.
Shown using bar charts, tally charts, pie charts.
Data that can take any value in a range, e.g. height.
Shown using a line graph.
- Probability is close to 0.5
- There is a large number of trials, n > 50
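These two bullets read like the conditions for approximating a binomial distribution with a normal one, although the notes do not state the heading. Assuming that is the intent, the sketch below compares an exact binomial probability with the normal approximation using mean np, variance np(1 - p) and a continuity correction. The values n = 100 and p = 0.5 are chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5   # large n, p close to 0.5

# Exact binomial probability P(X <= 55).
exact = sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(0, 56))

# Normal approximation with mean np and standard deviation sqrt(np(1 - p)).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Continuity correction: P(X <= 55) becomes P(Y < 55.5).
approx = approx_dist.cdf(55.5)
```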
Key facts about the large data set
- 3827 cars in total
- Only five makes of car are included: Ford, BMW, Vauxhall, Toyota and VW. Ford is the most frequently registered.
- Only one electric vehicle and only one gas/petrol hybrid vehicle in the whole data set
- 5 door hatchback is the most common body type
- Data is only from a few days in summer, June, in 2002 and 2016: there is more data from 2016 than 2002
- Mass of vehicle includes 75kg driver
- Emissions data is only known for approx 80% of the whole data set: CO2, CO, NOX
- Particulate emissions are applicable to diesel cars
- Doesn’t include drivers of company cars / only shows name the car is registered to, not the driver
- Doesn’t include all regions in England, only NW, SW and London
- CO2 emissions are in 10s and 100s
- CO emissions are in decimals
- Only cars, not vans, buses etc
- Some of the categories are coded, i.e. numbers represent different types, e.g. the body type is represented by a number
Which categories in LDS are codes?
Propulsion type: petrol, diesel, electric, gas+petrol, electric+petrol
Body type
Owner of car: male, female, not used, unknown, company
- Data must be continuous
- 95% of the data must be within 2 standard deviations of the mean
H0: ρ = 0
H1: ρ > 0, ρ < 0 or ρ ≠ 0
How to find critical region for a normal?
Use the inverse normal function with the probability equal to the significance level (for a two-tailed test, use half the significance level in each tail)
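The inverse normal step can be sketched with `NormalDist.inv_cdf` from the standard library. The function name `critical_values` and the `tail` labels are assumptions for the example; for a two-tailed test the significance level is split equally between the tails.

```python
from statistics import NormalDist

def critical_values(mean, sd, significance, tail):
    """Boundary value(s) of the critical region for a normal test."""
    dist = NormalDist(mu=mean, sigma=sd)
    if tail == "lower":
        # Critical region is everything below this value.
        return (dist.inv_cdf(significance),)
    if tail == "upper":
        # Critical region is everything above this value.
        return (dist.inv_cdf(1 - significance),)
    # Two-tailed: half the significance level in each tail.
    half = significance / 2
    return (dist.inv_cdf(half), dist.inv_cdf(1 - half))

# Hypothetical test on a standard normal at the 5% level.
lower_bound = critical_values(0, 1, 0.05, "lower")[0]
two_tailed = critical_values(0, 1, 0.05, "two")
```

When testing a sample mean, `sd` should be the standard deviation of the sample mean rather than of a single observation.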
For a normal hypothesis test, what is the standard deviation of the sample mean?
standard deviation ÷ √n (the variance of the sample mean is standard deviation² ÷ n)
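The variance of the sample mean is standard deviation² ÷ n, so its standard deviation is the square root of that, standard deviation ÷ √n. A short sketch with made-up numbers; the function names are assumptions for the example.

```python
from math import sqrt

def sample_mean_sd(sigma, n):
    """Standard deviation of the sample mean: sqrt(sigma^2 / n)."""
    return sqrt(sigma**2 / n)

def z_statistic(sample_mean, mu, sigma, n):
    """Standardised test statistic for the sample mean."""
    return (sample_mean - mu) / sample_mean_sd(sigma, n)

# Hypothetical test: population SD 10, sample of 25, sample mean 52
# tested against a hypothesised mean of 50.
se = sample_mean_sd(10, 25)
z = z_statistic(52, 50, 10, 25)
```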
How many vehicles in LDS?
3827