Population
The complete set of items or people being studied
Sample
A selection of the population to use data from
Census
When data is taken from every member in the population
Advantages of a census over a sample
More representative, less biased, includes everyone's opinions
Advantages of a sample over a census
Quicker, cheaper, easier to analyse as less data
Disadvantages of a census over a sample
Time consuming, expensive, difficult to do
Disadvantages of a sample over a census
Less representative, possibly biased
Pilot Study
A small scale replica of the survey to be carried out.
Advantages of a pilot study
Ensures questions can be understood, identifies ambiguity, tests the response rate, identifies likely responses, checks methods
Sampling Frame
A list containing data that a sample can be taken from
Examples of a sampling frame
Electoral roll, SIMS register, DVLA, telephone directory
Primary Data
Data that has been collected by the person doing the survey
Secondary data
Data that hasn't been collected by the person doing the survey
Advantages of primary data
More reliable, up-to-date, tailored for investigation
Advantages of secondary data
Easier to obtain, cheaper, less time-consuming
Continuous Data
Data that lies on a continuous scale (can be at any point on a number line)
Discrete Data
Data that consists of separate numbers (jumps along the number line)
Quantitative Data
Data that has numerical values
Qualitative Data
Data that is not numerical (it is described in words)
Open Questions
Questions that have no suggested answers and give a free-form box to reply in
Advantages of open questions
Allows for a range of responses, so can cover all eventualities
Closed Questions
Questions that have a set of answers for the person to choose from
Advantages of closed questions
Easier to analyse as range of responses restricted
Leading Questions
Questions that imply an opinion and promote a certain answer
Convenience Sample
The first few items in the list (however many are needed) are sampled
Advantages of a convenience sample
Quick and easy
Disadvantages of a convenience sample
Unlikely to be representative
Random Sample
Each member of the population has an equal chance of being picked
How to take a random sample
(a) Number everyone in list
(b) Use a random number generator to select numbers
(c) Select the data points corresponding to the numbers picked
(d) If you get a number outside the range, or the same number twice, ignore it and generate another. If you get a decimal, round to the nearest whole number.
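As a rough sketch, steps (a) to (d) could look like this in Python. The function name `simple_random_sample`, the `seed` parameter and the example register are made up for illustration; they are assumptions, not part of any syllabus.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Return sample_size distinct items, each equally likely to be picked."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the list")
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    picked_numbers = set()
    while len(picked_numbers) < sample_size:
        # (a)/(b): items are numbered 1..N and a random number is generated
        number = rng.randint(1, len(sampling_frame))
        # (d): a repeated number changes nothing in the set, so we go again
        picked_numbers.add(number)
    # (c): select the items matching the numbers picked
    return [sampling_frame[n - 1] for n in sorted(picked_numbers)]


# Hypothetical sampling frame
register = ["Asha", "Ben", "Chloe", "Dev", "Ella", "Finn", "Gita", "Hal"]
sample = simple_random_sample(register, 3, seed=1)
print(sample)
```

Because `randint(1, N)` only produces whole numbers from 1 to N, the out-of-range and decimal cases in step (d) do not arise here; only repeats need handling.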
Advantages of a random sample
Easy to do
Disadvantages of a random sample
May not be representative
Systematic Sample
Data is chosen at regular intervals (e.g. every 10th person)
How to take a systematic sample
Order the population, then divide the population size by the sample size to find the interval (how often an item is chosen). Then choose a random number within the first interval to decide where to start.
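A minimal sketch of this method in Python, assuming the population is already in order. The function name `systematic_sample` and the numbered population are hypothetical.

```python
import random


def systematic_sample(ordered_population, sample_size, seed=None):
    """Pick items at a regular interval from a random starting point."""
    # Interval = population size divided by sample size (rounded down)
    interval = len(ordered_population) // sample_size
    if interval < 1:
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)
    # Random starting position inside the first interval (0-based index)
    start = rng.randrange(interval)
    return [ordered_population[start + i * interval] for i in range(sample_size)]


# Hypothetical population: items numbered 1 to 100
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=0)
print(sample)
```

With 100 items and a sample of 10 the interval is 10, so every 10th item is chosen after the random start.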
Advantages of a systematic sample
Useful for a production line, as it will spot problems over time
Disadvantages of a systematic sample
May not be representative
Quota Sample
A set number (quota) of people from each chosen group is sampled
How to take a quota sample
Decide on a quota size for each group. Then take a random sample, ignoring any results from a group where the quota has been reached.
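One way to sketch this in Python is to go through the population in a random order and skip anyone whose group is already full. The names `quota_sample` and `group_of`, and the example data, are assumptions made for illustration.

```python
import random


def quota_sample(people, group_of, quotas, seed=None):
    """Fill each group's quota from a randomly ordered population."""
    rng = random.Random(seed)
    order = list(people)
    rng.shuffle(order)  # take people in a random order
    counts = {group: 0 for group in quotas}
    sample = []
    for person in order:
        group = group_of(person)
        # Ignore anyone whose group has no quota or has reached its quota
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
    return sample


# Hypothetical population: 20 adults and 10 children as (name, group) pairs
people = [(f"P{i}", "adult" if i % 3 else "child") for i in range(30)]
sample = quota_sample(people, lambda p: p[1], {"adult": 4, "child": 4}, seed=2)
print(sample)
```

If a group has fewer members than its quota, the sample simply comes back short for that group, which matches the disadvantage noted for quota samples.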
Advantages of a quota sample
Makes sure all quota groups are represented, easy to take
Disadvantages of a quota sample
Not likely to be representative, may be difficult to reach quota if numbers limited
Cluster Sample
The population is divided into groups and a group is chosen at random.
Advantages of a cluster sample
Easy to do
Disadvantages of a cluster sample
Unlikely to be representative
Stratified Sample
The number sampled from each group (stratum) is proportional to that group's share of the whole population
How to take a stratified sample
Multiply the fraction of the whole population that is in each group by the total sample size to decide on the size of the sample in each stratum. Then take a random sample of that size from each stratum.
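The size calculation can be sketched as follows. The function name `stratum_sizes` and the year-group figures are invented for illustration.

```python
def stratum_sizes(group_sizes, sample_size):
    """Sample size per stratum = (group size / population size) x sample size."""
    total = sum(group_sizes.values())
    return {
        group: round(size * sample_size / total)
        for group, size in group_sizes.items()
    }


# Hypothetical school with 400 pupils and a total sample of 40
year_groups = {"Year 7": 120, "Year 8": 180, "Year 9": 100}
sizes = stratum_sizes(year_groups, 40)
print(sizes)  # {'Year 7': 12, 'Year 8': 18, 'Year 9': 10}
```

When the results are not whole numbers, rounding can make the stratum sizes add up to slightly more or less than the intended total, so the total is worth checking by hand.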
Advantages of a stratified sample
Representative
Disadvantages of a stratified sample
Harder to collect, more expensive
Features of a good question
Unambiguous, closed, non-overlapping answer boxes, unbiased/not leading, not offensive or personal, easy to analyse
Positive correlation
As one variable increases, so does the other
Negative correlation
As one variable increases, the other decreases
Response variable
The variable being measured or studied
What values does the SRCC lie between and what do they mean?
-1 and 1.
1 = Perfect positive correlation
0 = No correlation
-1 = Perfect negative correlation
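The cards do not state how the SRCC is worked out. A commonly used formula, valid when there are no tied ranks, is 1 - 6Σd²/(n(n² - 1)), where d is the difference between each pair of ranks. The sketch below assumes the data has already been ranked; the function name `spearman_from_ranks` is hypothetical.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """SRCC from two lists of ranks, assuming there are no tied ranks."""
    n = len(ranks_x)
    # Sum of squared differences between paired ranks
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


same_order = spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reverse_order = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
mostly_agree = spearman_from_ranks([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(same_order, reverse_order, mostly_agree)
```

Identical rankings give 1, fully reversed rankings give -1, and rankings that mostly agree give a value between 0 and 1.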
What does the symbol x with a line above it mean?
The mean average value of x
How is frequency represented on histograms?
By area
What do we call the height on a histogram?
The frequency density
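Since area represents frequency and height is the frequency density, it follows that frequency density = frequency ÷ class width. A small sketch, with a made-up function name and made-up classes:

```python
def frequency_density(frequency, lower_bound, upper_bound):
    """Bar height on a histogram: frequency divided by class width."""
    return frequency / (upper_bound - lower_bound)


# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 15)]
densities = [frequency_density(f, lo, hi) for lo, hi, f in classes]
print(densities)  # [0.5, 1.0, 1.5]
```

Multiplying each density by its class width gives back the frequency, which is the area of the bar.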
What does the capital sigma (that looks like an 'E') symbol mean?
Sum
How is the IQR calculated?
UQ - LQ
How much of the data is contained within each quartile?
25%
How do we define the median?
The middle value in a dataset once the values are put in order
How would we compare two distributions given their median and IQR?
Higher median = higher result on average
Higher IQR = less consistent on average
How would I define outliers?
Low outliers < LQ - 1.5 × IQR
High outliers > UQ + 1.5 × IQR
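A short sketch of these two limits, assuming the quartiles have already been found. The function name `outlier_limits` and the data are invented for illustration.

```python
def outlier_limits(lq, uq):
    """Return (low limit, high limit) using the 1.5 x IQR rule."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr


# Hypothetical data with LQ = 20 and UQ = 30 already worked out
data = [2, 18, 21, 24, 26, 29, 31, 50]
low_limit, high_limit = outlier_limits(20, 30)
outliers = [x for x in data if x < low_limit or x > high_limit]
print(low_limit, high_limit, outliers)  # 5.0 45.0 [2, 50]
```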
What does a positive skew look like?
The median is closer to the LQ than the UQ.
What does a negative skew look like?
The median is closer to the UQ than the LQ.
If a line of best fit is given by y = ax + b, what does 'a' mean?
The gradient: for every one unit the 'x' variable increases, the 'y' variable changes by 'a' (an increase if 'a' is positive, a decrease if 'a' is negative).
What does a normal distribution look like?
Symmetrical about mean, bell-shaped curve
How much of the data is within 2 s.d. of the mean for a normal distribution?
95%
How much of the data is within 3 s.d. of the mean for a normal distribution?
About 99.7% (some courses quote 99.8%)
What conditions need to be met for a binomial distribution?
Two outcomes (success or failure), fixed number of independent trials, fixed probability of success
What is a discrete uniform distribution and what would its graph look like?
The same probability for all events. The graph is a bar chart with each possible outcome going to the same height (the probability of it happening).
How could one compare values from different sets of data?
Use a standardised score.
How is a standardised score calculated?
(score-mean)/s.d.
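As an illustration, this formula can compare marks from two different tests. The function name `standardised_score` and the marks are made up.

```python
def standardised_score(score, mean, standard_deviation):
    """(score - mean) / s.d."""
    return (score - mean) / standard_deviation


# Hypothetical marks: which result is better relative to its class?
maths = standardised_score(70, mean=60, standard_deviation=10)
english = standardised_score(65, mean=50, standard_deviation=5)
print(maths, english)  # 1.0 3.0
```

Here the English mark is lower, but it is 3 standard deviations above its mean, so it is the stronger result relative to the rest of the class.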
What is an index number?
A number that shows how the quantity, value or price of an item has changed over time, as a percentage of its value in a base year (the base year has index 100).
How is an index number calculated?
100*[(quantity in given year)/(quantity in base year)]
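A one-line sketch of the formula, with a hypothetical function name and prices:

```python
def index_number(given_year_quantity, base_year_quantity):
    """100 x (quantity in given year) / (quantity in base year)."""
    return 100 * given_year_quantity / base_year_quantity


# Hypothetical prices: 120 in the base year, 150 in the given year
index = index_number(150, 120)
print(index)  # 125.0
```

An index of 125 means the price is 25% higher than in the base year.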
What is a chain base index number?
An index number that uses the previous year as the base year, so it shows the year-on-year percentage change in quantity, value or price of an item.
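A sketch of the calculation for a run of yearly values, where each year is compared with the year before. The function name `chain_base_indices` and the prices are invented.

```python
def chain_base_indices(yearly_values):
    """Index for each year after the first, using the previous year as base."""
    return [
        100 * current / previous
        for previous, current in zip(yearly_values, yearly_values[1:])
    ]


# Hypothetical prices over three consecutive years
prices = [200, 210, 231]
indices = chain_base_indices(prices)
print(indices)  # [105.0, 110.0]
```

The results show a 5% rise in the second year and a 10% rise in the third year.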
What is a trend line?
A line of best fit through moving averages
How would you describe a trend line?
As increasing or decreasing, not as positive or negative
What is the average seasonal variation?
The mean average difference between the trend line and actual value for a given season
How can one predict values using a trend line?
Read the value from the trend line for the season wanted and add/subtract the average seasonal variation
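A sketch of both steps, assuming the trend-line values have already been read from the graph. The function names and all figures are hypothetical.

```python
from statistics import mean


def average_seasonal_variation(actual_values, trend_values):
    """Mean of (actual value - trend value) for one season across the years."""
    return mean([a - t for a, t in zip(actual_values, trend_values)])


def predict(trend_reading, seasonal_variation):
    """Trend-line reading adjusted by the average seasonal variation."""
    return trend_reading + seasonal_variation


# Hypothetical first-quarter figures for three years
actual_q1 = [52, 58, 64]
trend_q1 = [50, 55, 60]
variation = average_seasonal_variation(actual_q1, trend_q1)
forecast = predict(65, variation)  # 65 is a made-up trend-line reading
print(variation, forecast)  # 3 68
```

A negative variation means the season usually falls below the trend line, so adding it lowers the prediction.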
Why might one not want to predict a value from a scatter graph or trend line?
If the correlation is not strong enough or if the prediction lies outside the range of data (extrapolation)
Mutually Exclusive
Two events that cannot happen at the same time.
Independent events
Two events that have no impact on one another (one happening doesn't affect the probability of the other)
Exhaustive
A set of events that covers all possibilities
What do the probabilities of mutually exclusive exhaustive events sum to?
1
When can we add probabilities?
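When the events are mutually exclusive (to find the probability of one or the other happening).
When can we multiply probabilities?
When the events are independent (to find the probability of both happening).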
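Both rules can be checked with exact fractions. The dice and coin examples below are made up for illustration.

```python
from fractions import Fraction

# Mutually exclusive: a fair die cannot show 1 and 2 on the same roll,
# so P(1 or 2) = P(1) + P(2)
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two

# Independent: a coin toss does not affect a die roll,
# so P(head and six) = P(head) x P(six)
p_head = Fraction(1, 2)
p_six = Fraction(1, 6)
p_head_and_six = p_head * p_six

print(p_one_or_two, p_head_and_six)  # 1/3 1/12
```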
What might we use to find probabilities of two events following one another?
A tree diagram.