Population

The whole group of people or items being studied

Sample

A selection taken from the population, from which data is collected

Census

When data is collected from every member of the population

Advantages of a census over a sample

More representative, less biased, includes everyone's opinions

Advantages of a sample over a census

Quicker, cheaper, easier to analyse as there is less data

Disadvantages of a census over a sample

Time consuming, expensive, difficult to do

Disadvantages of a sample over a census

Less representative, possibly biased

Pilot Study

A small-scale trial run of the survey, carried out before the main survey.

Advantages of a pilot study

Checks that questions can be understood, identifies ambiguity, tests the response rate, identifies likely responses, checks methods

Sampling Frame

A list of all the members of the population, from which a sample can be taken

Examples of a sampling frame

Electoral roll, SIMS register, DVLA records, telephone directory

Primary Data

Data that has been collected by the person doing the survey

Secondary data

Data that hasn't been collected by the person doing the survey

Advantages of primary data

More reliable, up to date, tailored to the investigation

Advantages of secondary data

Easier to obtain, cheaper, less time-consuming

Continuous Data

Data that lies on a continuous scale (can be at any point on a number line)

Discrete Data

Data that can only take separate values, e.g. whole numbers (it jumps along the number line)

Quantitative Data

Data that has numerical values

Qualitative Data

Data that is not numerical (descriptive, e.g. colours or opinions)

Open Questions

Questions with no suggested answers; the person replies in a free-form box

Advantages of open questions

Allows for a range of responses, so can cover all eventualities

Closed Questions

Has a set of answers for the person to choose from

Advantages of closed questions

Easier to analyse as the range of responses is restricted

Leading Questions

Questions that imply an opinion and encourage a particular answer

Convenience Sample

The first few items in the list (or the most easily reached people) are sampled

Advantages of a convenience sample

Quick and easy

Disadvantages of a convenience sample

Unlikely to be representative

Random Sample

Each member of the population has an equal chance of being picked

How to take a random sample

(a) Number everyone in the list

(b) Use a random number generator to select numbers

(c) Select the data points corresponding to the numbers picked

(d) If you get a number outside the range, or the same number twice, ignore it and generate another. If you get a decimal, round it to the nearest whole number.
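The steps above can be sketched in Python as follows. This is a minimal illustration, assuming the population is a Python list; `random_sample` is a hypothetical helper name, and Python's `random` module stands in for the random number generator. It draws whole numbers with as many digits as the population size (like reading a random number table), so no rounding is needed.

```python
import random

def random_sample(population, n, seed=None):
    # (a) number everyone in the list: member k gets number k (1 to N)
    numbered = {k: member for k, member in enumerate(population, start=1)}
    N = len(numbered)
    if n > N:
        raise ValueError("sample size cannot exceed population size")
    digits = len(str(N))            # e.g. N = 250 -> use 3-digit random numbers
    rng = random.Random(seed)
    picked = []
    while len(picked) < n:
        # (b) generate a random number with that many digits (000 to 999 for N = 250)
        r = rng.randint(0, 10**digits - 1)
        # (d) ignore numbers outside 1..N and repeats, then generate another
        if r < 1 or r > N or r in picked:
            continue
        picked.append(r)
    # (c) select the members whose numbers were picked
    return [numbered[k] for k in picked]
```

For example, `random_sample([f"Person {i}" for i in range(1, 251)], 10)` picks 10 different people from a list of 250.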

Advantages of a random sample

Easy to do

Disadvantages of a random sample

May not be representative

Systematic Sample

Data is chosen at regular intervals (e.g. every 10th person)

How to take a systematic sample

Put the population in order and divide the population size by the sample size to find the interval, k. Then choose a random number between 1 and k to decide where in the first interval to start, and take every kth item after that.
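A minimal Python sketch of this method, assuming the population is already a list in order; `systematic_sample` is a hypothetical helper name and `random.Random.randint` provides the random starting point.

```python
import random

def systematic_sample(population, n, seed=None):
    ordered = list(population)          # assumed to be in order already
    if not 1 <= n <= len(ordered):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(ordered) // n               # interval = population size / sample size
    start = random.Random(seed).randint(1, k)   # random start within the first interval
    # take the start-th item, then every k-th item after it (positions counted from 1)
    return [ordered[start - 1 + i * k] for i in range(n)]
```

With a population of 100 and a sample of 10, the interval is 10, so a start of 4 would give items 4, 14, 24, ..., 94.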

Advantages of a systematic sample

Useful on a production line, as it will spot problems that develop over time

Disadvantages of a systematic sample

May not be representative

Quota Sample

A set number of people (the quota) from each chosen group is sampled

How to take a quota sample

Decide on a quota size for each group. Then sample people as they are encountered, ignoring anyone from a group whose quota has already been reached.
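The quota method can be sketched in Python like this. It assumes people arrive as (name, group) pairs in the order they are met; `quota_sample` and the example data are hypothetical. If a group is too small, its quota simply stays unfilled, which mirrors the disadvantage listed below.

```python
def quota_sample(people, quotas):
    """people: (name, group) pairs in the order met; quotas: group -> number needed."""
    counts = {group: 0 for group in quotas}
    sample = []
    for name, group in people:
        # only accept someone if their group still has space in its quota
        if group in counts and counts[group] < quotas[group]:
            sample.append(name)
            counts[group] += 1
        if counts == quotas:            # every quota filled, so stop
            break
    return sample
```

For example, with quotas of 2 males and 2 females, `quota_sample([("A", "M"), ("B", "M"), ("C", "F"), ("D", "M"), ("E", "F")], {"M": 2, "F": 2})` skips D because the male quota is already full and returns `["A", "B", "C", "E"]`.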

Advantages of a quota sample

Makes sure all quota groups are represented, easy to take

Disadvantages of a quota sample

Not likely to be representative, may be difficult to fill a quota if a group is small

Cluster Sample

The population is divided into groups (clusters) and a group is chosen at random. Data is then collected from the members of that group.
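A short Python sketch, assuming the clusters are stored as a dictionary of group name to members (for example, school classes); `cluster_sample` and the class names are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters=1, seed=None):
    """clusters: dict of group name -> list of members. Whole groups are chosen at random."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # pick group names at random
    return {name: clusters[name] for name in chosen}       # keep every member of each chosen group
```

For example, `cluster_sample({"7A": [...], "7B": [...], "7C": [...]})` returns one whole class chosen at random.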

Advantages of a cluster sample

Easy to do

Disadvantages of a cluster sample

Unlikely to be representative

Stratified Sample

A sample where the number taken from each group (stratum) is proportional to that group's share of the whole population

How to take a stratified sample

Multiply the fraction of the whole population in each group by the total sample size to find the sample size for each stratum. Then take a random sample from each stratum.
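As a worked sketch in Python, assuming each stratum is given as a list of its members; `stratified_sample` and the year-group sizes are hypothetical example values.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata: dict mapping each group name to a list of its members."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # (group size / population size) x total sample size, rounded to a whole number
        # (Python's round() sends exact halves to the nearest even number)
        size = round(len(members) * sample_size / total)
        # then take a random sample of that size from within the group
        sample[name] = rng.sample(members, size)
    return sample
```

For a school with 120 pupils in Year 9, 150 in Year 10 and 130 in Year 11 (400 in total), a sample of 40 gives 120/400 × 40 = 12, 150/400 × 40 = 15 and 130/400 × 40 = 13. When the sizes do not come out as whole numbers, rounding can make the strata add up to slightly more or less than the intended total, so check and adjust by hand.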

Advantages of a stratified sample

Representative

Disadvantages of a stratified sample

Harder to collect, more expensive

Features of a good question

Unambiguous, closed, non-overlapping answer boxes, unbiased/not leading, not offensive or personal, easy to analyse

Positive correlation

As one variable increases, so does the other

Negative correlation

As one variable increases, the other decreases

Response variable

The variable being measured or studied (the dependent variable), which responds to changes in the explanatory variable

What values does Spearman's rank correlation coefficient (SRCC) lie between, and what do they mean?

-1 and 1.

1 = Perfect positive correlation

0 = No correlation

-1 = Perfect negative correlation
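The SRCC is usually worked out with the formula r_s = 1 − 6Σd² / (n(n² − 1)), where d is the difference between each pair of ranks and n is the number of pairs. A minimal Python sketch, assuming there are no tied values (ties need averaged ranks, which this does not handle); `ranks` and `spearman` are hypothetical helper names.

```python
def ranks(values):
    """Rank 1 = largest value. Assumes there are no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    n = len(x)
    # sum of squared differences between each pair of ranks
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

For example, `spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` gives 1.0 (perfect positive), and reversing the second list gives -1.0 (perfect negative).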

What does the symbol x̄ (x with a line above it) mean?

The mean of the values of x

How is frequency represented on histograms?

By the area of each bar

What do we call the height on a histogram?

The frequency density (frequency ÷ class width)
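A small Python sketch of the calculation, assuming each class is given as (lower bound, upper bound, frequency); `frequency_densities` and the class data are hypothetical. Multiplying each density back by its class width recovers the frequency, which is why frequency is shown by area.

```python
def frequency_densities(classes):
    """classes: list of (lower, upper, frequency) tuples."""
    return [freq / (upper - lower) for lower, upper, freq in classes]
```

For classes 0–10 (frequency 20), 10–15 (frequency 30) and 15–30 (frequency 15), the bar heights are 2.0, 6.0 and 1.0.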

What does the capital sigma symbol, Σ (which looks like an 'E'), mean?

The sum of (add up all the values)

How is the IQR calculated?

UQ - LQ

How much of the data lies in each quarter (between one quartile and the next)?

25%

How do we define the median?

The middle value when the data is arranged in order

How would we compare two distributions given their median and IQR?

Higher median = higher result on average

Higher IQR = less consistent (more spread out)

How are outliers defined?

Low outliers: values below LQ − 1.5 × IQR

High outliers: values above UQ + 1.5 × IQR
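The IQR and outlier rules can be checked with a short Python sketch using the standard `statistics.quantiles` function (Python 3.8+). Its default "exclusive" method places the LQ at the (n + 1)/4th value, interpolating between values where needed; other quartile conventions can give slightly different answers. `quartile_summary` and the data are hypothetical.

```python
import statistics

def quartile_summary(data):
    lq, _, uq = statistics.quantiles(data, n=4)   # LQ, median, UQ ('exclusive' method)
    iqr = uq - lq
    low_limit, high_limit = lq - 1.5 * iqr, uq + 1.5 * iqr
    outliers = [x for x in data if x < low_limit or x > high_limit]
    return lq, uq, iqr, outliers
```

For the data 2, 4, 5, 6, 7, 8, 9, 30 this gives LQ = 4.25, UQ = 8.75 and IQR = 4.5, so anything above 8.75 + 6.75 = 15.5 is a high outlier, which flags 30.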

What does a positive skew look like?

The median is closer to the LQ than the UQ.

What does a negative skew look like?

The median is closer to the UQ than the LQ.

If a line of best fit is given by y = ax + b, what does 'a' mean?

For every 1 unit increase in x, y increases by a ('a' is the gradient of the line).

What does a normal distribution look like?

Symmetrical about mean, bell-shaped curve

How much of the data is within 2 s.d. of the mean for a normal distribution?

95%

How much of the data is within 3 s.d. of the mean for a normal distribution?

99.8%

What conditions need to be met for a binomial distribution?

Two outcomes (success or failure), fixed number of independent trials, fixed probability of success
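When these conditions hold, the probability of exactly r successes in n trials is nCr × p^r × (1 − p)^(n − r). A minimal Python sketch using the standard `math.comb` function (Python 3.8+); `binomial_probability` is a hypothetical helper name.

```python
from math import comb

def binomial_probability(n, r, p):
    """P(exactly r successes) in n independent trials, each with success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)
```

For example, the probability of exactly 2 heads in 3 flips of a fair coin is `binomial_probability(3, 2, 0.5)`, which is 3 × 0.25 × 0.5 = 0.375.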

What is a discrete uniform distribution, and what would its graph look like?

Every possible outcome has the same probability. The graph is a bar chart with a bar for each possible outcome, all of the same height (the probability of that outcome).

How could one compare values from different sets of data?

Use a standardised score.

How is a standardised score calculated?

(score − mean) ÷ standard deviation
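A one-line Python sketch of the formula; `standardised_score` and the test marks below are hypothetical example values.

```python
def standardised_score(score, mean, sd):
    return (score - mean) / sd
```

A mark of 72 in Maths (mean 60, s.d. 8) gives (72 − 60) ÷ 8 = 1.5, while 65 in English (mean 55, s.d. 5) gives (65 − 55) ÷ 5 = 2.0, so the English mark is relatively better even though it is lower.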

What is an index number?

A number that shows how the quantity, value or price of an item has changed over a period of time, compared with a base year (which has index 100).

How is an index number calculated?

(quantity in given year ÷ quantity in base year) × 100
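A minimal Python sketch of the formula; `index_number` and the prices are hypothetical. Prices are given in pence to keep the arithmetic exact.

```python
def index_number(value_now, value_base):
    # multiply first, then divide, to avoid unnecessary rounding error
    return value_now * 100 / value_base
```

If an item cost 250p in the base year and costs 300p now, `index_number(300, 250)` gives 120, meaning the price has risen by 20%.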

What is a chain base index number?

An index number that uses the previous year as the base year, so it shows the change in quantity, value or price of an item from one year to the next (e.g. 105 means a 5% increase on the previous year).
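A Python sketch, assuming a list of yearly values in order; `chain_base_indices` and the values are hypothetical. The first year has no previous year, so it gets no chain base index.

```python
def chain_base_indices(values):
    """Index for each year, using the previous year as the base year."""
    return [values[i] * 100 / values[i - 1] for i in range(1, len(values))]
```

For yearly values 200, 210 and 189, this gives 105 (a 5% increase) and then 90 (a 10% decrease).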

What is a trend line?

A line of best fit drawn through the plotted moving averages
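The moving averages that the trend line is drawn through can be computed as below. This sketch assumes a list of values in time order and a chosen period (e.g. 4 for quarterly data); `moving_averages` and the sales figures are hypothetical.

```python
def moving_averages(values, period):
    # average of each run of `period` consecutive values
    return [sum(values[i:i + period]) / period for i in range(len(values) - period + 1)]
```

For quarterly sales 12, 16, 20, 8, 16, 20, 24, 12, the 4-point moving averages are 14, 15, 16, 17 and 18. Each one is plotted at the middle of the four values it covers (between the 2nd and 3rd, and so on).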

How would you describe a trend line?

As increasing or decreasing, not as positive or negative

What is the average seasonal variation?

The mean of the differences (actual value − trend line value) for a given season

How can one predict values using a trend line?

Read the value from the trend line for the time wanted, then add the average seasonal variation for that season (or subtract it, if the variation is negative)
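Both steps can be sketched in Python. This assumes the trend line values have already been read off the graph at each data point; `average_seasonal_variation`, `predict` and all the numbers are hypothetical. Storing each variation with its sign means prediction is always "trend value + variation".

```python
def average_seasonal_variation(actual, trend, seasons):
    """Mean of (actual - trend) for each season label."""
    diffs = {}
    for a, t, s in zip(actual, trend, seasons):
        diffs.setdefault(s, []).append(a - t)
    return {s: sum(d) / len(d) for s, d in diffs.items()}

def predict(trend_value, season, variation):
    # a negative variation reduces the prediction below the trend line
    return trend_value + variation[season]
```

With actual values 12, 16, 20, 8, 16, 20, 24, 12 for Q1 to Q4 over two years, and trend values 12.5, 13.5, ..., 19.5, the average seasonal variations are −0.5 (Q1), 2.5 (Q2), 5.5 (Q3) and −7.5 (Q4). If the trend line reads 20.5 for the next Q1, the prediction is 20.5 − 0.5 = 20.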

Why might one not want to predict a value from a scatter graph or trend line?

If the correlation is not strong enough or if the prediction lies outside the range of data (extrapolation)

Mutually Exclusive

Two events that cannot happen at the same time.

Independent events

Two events that have no impact on one another (one happening doesn't affect the probability of the other)

Exhaustive

A set of events that covers all possibilities

What do the probabilities of mutually exclusive exhaustive events sum to?

1

When can we add probabilities?

When the events are mutually exclusive (to find the probability of A or B).

When can we multiply probabilities?

When the events are independent (to find the probability of A and B).

What might we use to find probabilities of two events following one another?

A tree diagram.
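A tree diagram for two independent trials can be sketched in Python: multiply along each branch (independent events), then add the end results you want (the branches are mutually exclusive). This assumes a ball is picked and replaced before the second pick, so the trials are independent; `tree_with_replacement` and the bag contents are hypothetical. Exact fractions come from the standard `fractions.Fraction` class.

```python
from fractions import Fraction
from itertools import product

def tree_with_replacement(p):
    """p: outcome -> probability for one trial. Returns the probability of each pair of outcomes."""
    # multiply along each branch of the tree
    return {a + b: p[a] * p[b] for a, b in product(p, repeat=2)}
```

For a bag with 3 red and 2 blue balls, `tree_with_replacement({"R": Fraction(3, 5), "B": Fraction(2, 5)})` gives RR = 9/25, RB = 6/25, BR = 6/25 and BB = 4/25. The probability of one of each colour is RB + BR = 12/25, and all four branches add up to 1.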

