Definitions of sampling methods and how to carry them out, as given by the AQA board. Large Data Set key facts as listed by TLMaths
Simple Random Definition
Every possible sample (of a given size) has an equal chance of being selected
Simple Random method
Assign a number to each member of the population
Use a random number generator to generate n numbers (n = sample size wanted), limiting the numbers it can generate to 1 - N (N = population size). Skip and redraw any number that comes up twice. Once n different numbers have been generated, match each number back to the associated member of the population
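The simple random method above can be sketched in Python as a rough illustration, assuming the population is held as a list. The function name simple_random_sample and the seed parameter are hypothetical (not part of the AQA course); in practice Python's random.sample(population, n) does the same job in one call.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sample of size n: every possible sample is equally likely."""
    N = len(population)
    if n > N:
        raise ValueError("sample size cannot be bigger than the population")
    rng = random.Random(seed)
    # Step 1: assign each member a number from 1 to N
    numbered = {i: member for i, member in enumerate(population, start=1)}
    chosen = []
    # Step 2: generate numbers from 1 to N, skipping and redrawing repeats
    while len(chosen) < n:
        k = rng.randint(1, N)
        if k not in chosen:
            chosen.append(k)
    # Step 3: match each number back to its member of the population
    return [numbered[k] for k in chosen]

cars = [f"car{i}" for i in range(1, 21)]   # made-up population of 20 cars
print(simple_random_sample(cars, 5, seed=1))
```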
Stratified Definition
Splitting the population into groups (strata) relevant to the research, then randomly sampling from each group in proportion to the size of the group
Stratified method
Identify the groups to split the population into
Calculate how many of each group should be included (group size / population size * sample size wanted)
Take a simple random sample of that size from each group
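The stratified method can be sketched like this, assuming each member's group can be looked up with a function. The names stratified_sample and group_of are hypothetical, and the population uses two LDS region names with made-up counts purely for illustration.

```python
import random

def stratified_sample(population, group_of, n, seed=None):
    """Proportional stratified sample: random sample from each group, sized by group share."""
    N = len(population)
    rng = random.Random(seed)
    # Step 1: split the population into groups
    groups = {}
    for member in population:
        groups.setdefault(group_of(member), []).append(member)
    sample = []
    for members in groups.values():
        # Step 2: group size / population size * sample size wanted (rounded)
        k = round(len(members) / N * n)
        # Step 3: simple random sample of k members from this group
        sample.extend(rng.sample(members, k))
    return sample

population = [("North West", i) for i in range(60)] + [("London", i) for i in range(40)]
sample = stratified_sample(population, lambda car: car[0], 10, seed=1)
print(sum(1 for car in sample if car[0] == "North West"))  # 6
```

With 60 North West and 40 London cars and a sample of 10, the calculation gives 60/100 * 10 = 6 and 40/100 * 10 = 4. Because each group's size is rounded, the sizes may not always add up to exactly the sample size wanted, so check the total.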
Opportunity Definition
Selecting sample based on convenience and availability
Opportunity Adv
does not produce a random sample
may still produce a good estimate of population parameters
cheap
convenient
may not be generalisable
Stratified Adv
more representative of the overall population than srs, as each group appears in the same proportion as in the population
requires list of entire population
determining which factors are relevant to the research is not always obvious
random
Simple random Adv
requires list of entire population
may be hard to get responses from the members selected
time consuming and expensive
random and unbiased
Systematic Definition
Taking participants at regular intervals from an ordered list of the population
Systematic Method
Give each member of the population a number from 1 - N (N = population size)
Calculate how often a member needs to be selected: x = population size / sample size wanted
Randomly select a starting number between 1 and x, then select every xth person on the register from that first person
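A minimal sketch of the systematic method, assuming the register is a list in which person number k sits at position k - 1. The function name systematic_sample is hypothetical, and rounding x down when the population does not divide exactly is an assumption, not an AQA rule.

```python
import random

def systematic_sample(register, n, seed=None):
    """Systematic sample: random start between 1 and x, then every xth member."""
    N = len(register)
    if not 0 < n <= N:
        raise ValueError("need 0 < sample size <= population size")
    rng = random.Random(seed)
    x = N // n                    # interval = population size / sample size (rounded down)
    start = rng.randint(1, x)     # random starting number from 1 to x
    # members are numbered 1..N, so number k is stored at index k - 1
    numbers = range(start, N + 1, x)
    return [register[k - 1] for k in numbers][:n]

register = list(range(1, 101))    # made-up register of 100 numbered people
print(systematic_sample(register, 10, seed=3))
```

Only the start is random; every later pick is fixed by it, which is why the selections are not independent and why a pattern repeating every x members can bias the sample.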
Systematic Adv
avoids unwanted clustering
if the list has a fault or pattern that repeats every x members, the sample may pick up that pattern every time and be biased
practically easier than number generators
random
less random than srs, as the selections are not independent (once the start is chosen, the rest are fixed)
requires list of entire population
Quota Definition
Splitting population into groups relevant to research, then selecting participants from those groups using opportunity sampling
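A rough sketch of quota sampling, assuming people are taken in the order they become available and each group has a fixed quota. The names quota_sample and group_of and the example people are hypothetical. Note there is no random number generator anywhere, which is why this method is not random.

```python
def quota_sample(arrivals, group_of, quotas):
    """Quota sample: take whoever turns up first until each group's quota is full."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:              # people in the order they become available
        group = group_of(person)
        if remaining.get(group, 0) > 0:  # this group's quota is not yet full
            sample.append(person)
            remaining[group] -= 1
        if all(left == 0 for left in remaining.values()):
            break                        # every quota has been filled
    return sample

arrivals = ["M1", "F1", "M2", "M3", "F2", "F3"]   # made-up people; first letter = group
print(quota_sample(arrivals, lambda p: p[0], {"M": 2, "F": 2}))  # ['M1', 'F1', 'M2', 'F2']
```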
Quota Adv
cheap
easy
convenient
alternative to strat when list of population not available
ensures the sample reflects the factors identified
NOT RANDOM
Cluster Definition
Splits the population into groups based on convenience (eg location), called clusters, then randomly chooses which group(s) to study further
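A minimal sketch of one-stage cluster sampling, assuming every member of each chosen cluster is studied (the definition also allows sampling further within the chosen clusters). The names cluster_sample and cluster_of and the example cars are hypothetical; region names are used as clusters only for illustration.

```python
import random

def cluster_sample(population, cluster_of, num_clusters, seed=None):
    """One-stage cluster sample: pick whole clusters at random, keep all their members."""
    clusters = {}
    for member in population:
        clusters.setdefault(cluster_of(member), []).append(member)
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # randomly choose the clusters
    return [m for c in chosen for m in clusters[c]]

cars = [("London", 1), ("London", 2), ("South West", 3), ("North West", 4), ("North West", 5)]
print(cluster_sample(cars, lambda car: car[0], 1, seed=2))
```

Only the choice of clusters is random; if the chosen cluster is unusual, the whole sample is unrepresentative, which is the weakness noted below.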
Cluster Adv
RANDOM
least random out of random methods
less representative than strat, with a high chance of choosing an unrepresentative sample
significantly cheaper than alternatives
does not require list of entire population
Large Data Set Headings
Car Brands
Regions
Propulsion Types
Keeper Title IDs
Year Registered
Body Type
Emission Type
LDS Car brand details
5 car brands
BMW
Ford
Toyota
Vauxhall
Volkswagen
LDS DETAIL Regions
3 regions
North West
South West
London
not enough regions to make a comparison across the entire UK
LDS DETAIL Keeper Title IDs
5 titles
Male
Female
not used (no cars in the data set have this title)
unknown (eg Dr)
Company
LDS Units used
g/km - emissions
cm³ - engine size
kg - weight of car and driver
75 kg is added to each car's mass to account for the driver
Missing Data/Errors LDS
missing data for hydrocarbon and particulate emissions
masses of some cars recorded as zero - ERROR
data needs to be cleaned before being used for parameter calculations
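A minimal sketch of cleaning before calculating a parameter, assuming missing values are stored as None (in the actual spreadsheet they may simply be blank cells). The function names and the masses are made up for illustration. Leaving the zero in would wrongly drag the mean down.

```python
def clean_masses(masses):
    """Remove missing (None) and zero masses, which are recording errors."""
    return [m for m in masses if m is not None and m > 0]

def mean(values):
    return sum(values) / len(values)

raw_masses = [1320, 0, 1475, None, 1610]   # made-up masses in kg
cleaned = clean_masses(raw_masses)
print(cleaned)         # [1320, 1475, 1610]
print(mean(cleaned))
```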
LDS DETAIL Propulsion Types
5 types
Petrol
Diesel
Electric - only 1 car
Petrol/Gas - 1 car
Petrol/Electric
LDS DETAIL Emission Types
CO2
CO
NOx
Hydrocarbon
Particulate - only for diesel cars
What years are in the LDS
2002 and 2016, due to the emissions scandal
LDS DETAILS Body Types
eg saloon, convertible, 3 or 4 door hatchback