statistics - collection of data

Last updated 6:24 PM on 4/10/26

104 Terms

1
New cards

Raw Data

Unprocessed data that has only just been collected. It still needs to be ordered, grouped, rounded or cleaned.

2
New cards

qualitative data

Non-numerical, descriptive data (e.g. eye colour, gender).

3
New cards

Quantitative

Numerical data that can be measured (e.g. height, weight, exam scores).

4
New cards

discrete data

Data that takes specific values (e.g. number of people, shoe size).

5
New cards

continuous data

Data that can take any value within a range (e.g. height, weight).

6
New cards

categorical data

data that can be sorted into non-overlapping categories such as gender. Used for qualitative data so that it can be more easily processed

7
New cards

Ordinal (rank)

quantitative data that can be given an order or ranked on a rating scale, e.g. marks in an exam.

8
New cards

bivariate data

Involves measuring 2 variables. Can be qualitative or quantitative, grouped or ungrouped. Usually used with scatter diagrams where the two axes represent the two different variables. One variable is often called the explanatory variable and the other the response variable.

9
New cards

Multivariate

Made up of more than 2 variables e.g. comparing height, weight, age and shoe size together.

10
New cards

Grouping Data

Grouping data using tables makes it easier to spot patterns in the data and quickly see how the data is distributed.

11
New cards

Discrete data

  • can be grouped into classes that do not overlap e.g. 0-10, 11-15… (they do not have to have equal class width).

  • Uses smaller intervals when there is a lot of data close together in that range and wider classes for data that is more spread out.

12
New cards

Continuous data

  • can be grouped using inequalities.

  • The class intervals must not have gaps between them or be overlapping so inequality symbols must be used with one of the symbols being < and the other ≤.
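As a minimal sketch of how inequality classes work, the code below places each value in the class where lower ≤ value < upper, so a value on a boundary belongs to exactly one class. The heights and class boundaries are made-up illustration values, and `find_class` is a hypothetical helper name.

```python
def find_class(value, boundaries):
    """Return the class (lower, upper) with lower <= value < upper, or None."""
    for lower, upper in zip(boundaries, boundaries[1:]):
        if lower <= value < upper:
            return (lower, upper)
    return None  # value lies outside every class


# Made-up heights in cm and made-up class boundaries.
heights_cm = [150.0, 159.9, 160.0, 171.2]
boundaries = [150, 160, 170, 180]  # classes 150 <= h < 160, 160 <= h < 170, ...

classes = [find_class(h, boundaries) for h in heights_cm]
print(classes)
```

Note that 160.0 falls in 160 ≤ h < 170 and not in 150 ≤ h < 160, which is why one symbol is < and the other is ≤.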

13
New cards

pros of grouping data

  • Makes the data easy to read and understand.

  • Easy to spot patterns and compare data.

14
New cards

cons of grouping data

  • Loses accuracy of data as you no longer know exact data values.

  • Calculations made from these will only be an estimate e.g. mean.
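To illustrate why a mean from grouped data is only an estimate, here is a short sketch that uses the midpoint of each class in place of the unknown exact values. The frequency table is made up, and `estimated_mean` is a hypothetical helper name.

```python
def estimated_mean(classes):
    """Estimate the mean from (lower, upper, frequency) rows using class midpoints."""
    total_frequency = sum(freq for _, _, freq in classes)
    midpoint_total = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return midpoint_total / total_frequency


# Made-up grouped data: (lower bound, upper bound, frequency).
grouped = [(0, 10, 4), (10, 20, 6), (20, 30, 10)]

print(estimated_mean(grouped))  # 18.0
```

The true mean could be higher or lower, because the exact values inside each class are no longer known.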

15
New cards

Primary

Data that you have collected yourself, or someone has collected on your behalf.

16
New cards

Secondary

Data that has already been collected.

17
New cards

advantages of primary data

  • accurate

  • collection method known

  • can find answers to specific questions

18
New cards

advantages of secondary data

  • cheap

  • easy

  • quick

  • data from some organisations can be more reliable than data collected yourself

19
New cards

disadvantages of primary data

  • time consuming

  • expensive

20
New cards

disadvantages of secondary data

  • method of collection unknown

  • data may be out of date

  • may contain mistakes

  • may come from unreliable source

  • may be difficult to find answers to specific questions

21
New cards

examples of primary data

  • questionnaires

  • interviews

  • experiments

  • observations

22
New cards

examples of secondary data

  • database

  • newspaper/magazines/websites

  • historical records

  • Office for National Statistics (government website)

23
New cards

Population

Everyone or everything that could be involved in the investigation e.g. when investigating opinions of students in a school the population would be all the students in the school.

24
New cards

Census

A survey of the entire population.

25
New cards

Sample

A smaller number from the population that you actually survey. The data obtained from the sample is then used to make conclusions about the whole population, so it is important that the sample represents the population fairly.

26
New cards

Sampling Frame

A list of all the members of the population. This is where you will choose the sample from. E.g. electoral roll, school register.

27
New cards

Sampling Unit

The individual people or items that are to be sampled e.g. each student in a school.

28
New cards

Biased sample

A sample that does not represent the population fairly. For example, surveying students at a mixed school with a sample that contains only girls. Avoid bias by using random sampling methods.

30
New cards

advantages of census

  • unbiased

  • accurate

  • takes into account entire population

31
New cards

advantages of sample

  • cheaper

  • quicker

  • less data to consider

32
New cards

disadvantages of census

  • time consuming

  • expensive

  • lots of data to manage

  • difficult to ensure whole population is used

33
New cards

disadvantages of sample

  • may be biased

  • not completely representative

34
New cards

sampling methods - completely random

simple random sampling

35
New cards

sampling method - includes some random sampling

  • stratified sampling

  • systematic sampling

  • cluster sampling

36
New cards

sampling methods - non-random

  • quota sampling

  • opportunity sampling

  • judgement sampling

37
New cards

random sample

Every item/person in the population has an equal chance of being selected.

38
New cards

random response method (calculation)

1) find the expected number of forced answers from the coin flips/dice rolls/etc

  • probability x total surveyed

2) find how many answered “yes” honestly

  • subtract (1) from how many put the indicated answer

3) find the proportion of honest respondents who answered “yes”

  • honest “yes” answers ÷ expected number who answered honestly (total surveyed − (1); for a fair coin this is the same as (1))

39
New cards

Random Sampling Techniques

  • Pick numbers/names out of a hat (only works for small samples)

  • Using a random number table

  • Using the random number generator function on a calculator or computer.
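The random number generator technique can be sketched with Python's standard `random` module. The register of 30 students is made up, and `simple_random_sample` is a hypothetical helper name; `random.Random.sample` picks without replacement, so each member of the sampling frame has the same chance of being chosen.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Choose sample_size different members, each equally likely to be picked."""
    rng = random.Random(seed)  # a seed is only used to make a run repeatable
    return rng.sample(sampling_frame, sample_size)


# Made-up sampling frame: a register of 30 students.
register = [f"Student {i}" for i in range(1, 31)]

chosen = simple_random_sample(register, 5, seed=1)
print(chosen)
```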

40
New cards

advantages of random sampling

  • Sample is representative as every member of the population has an equal chance of being selected.

  • Unbiased

41
New cards

disadvantages of random sampling

  • Need a full list of population (not always easily obtainable)

  • Not always convenient as it can be expensive and time consuming.

  • Needs a large sample size

42
New cards

stratified sample

The size of each stratum (group) in the sample is in proportion to the size of that stratum in the population. E.g. if group A accounts for 10% of the population, group A will also make up 10% of the sample.

43
New cards

stratified sample method

  • Split the population into groups (usually done for you in the exam)

  • Use the formula number sampled from a stratum = (size of stratum ÷ size of population) × sample size to calculate the sample size for each group. (Remember to check the totals if you rounded any numbers, and adjust if the total after stratification is bigger/smaller than the sample size in the question.)

  • Use random sampling to select members from each stratum/group.
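The stratified calculation can be sketched as follows. The year-group sizes are made up, and `stratified_sizes` is a hypothetical helper name. The rounded sizes should always be checked against the required sample size, as the method says.

```python
def stratified_sizes(strata_sizes, sample_size):
    """Return the exact and rounded number to sample from each stratum."""
    population = sum(strata_sizes.values())
    exact = {
        name: size * sample_size / population
        for name, size in strata_sizes.items()
    }
    # Python's round() sends exact halves to the nearest even number, so
    # always check the rounded total and adjust by hand if necessary.
    rounded = {name: round(value) for name, value in exact.items()}
    return exact, rounded


# Made-up population of 400 students, sample of 40.
year_groups = {"Year 9": 120, "Year 10": 180, "Year 11": 100}
exact, rounded = stratified_sizes(year_groups, 40)

print(rounded)                 # {'Year 9': 12, 'Year 10': 18, 'Year 11': 10}
print(sum(rounded.values()))   # 40
```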

44
New cards

pros of Stratified Sample

  • Sample is in proportion to population, so sample represents the population fairly.

  • Best used for populations with groups of unequal sizes.

45
New cards

cons of Stratified Sample

Time consuming

46
New cards

systematic sampling

choosing items in the population at regular intervals.

47
New cards

systematic sampling method (improve later)

  • Divide your population size by sample size to calculate the interval, e.g. 400/40 = 10, so choose every 10th item in the population.

  • Use random sampling to generate a number between 1 and 10 (or the answer to your calculation from above) to choose a starting point, e.g. 7.

  • Select every 10th item after the 7th, e.g. 7th, 17th, 27th, …, until you obtain your sample size.
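A minimal sketch of the systematic method, assuming the population is held in a numbered list. The population of 400 items is made up, and `systematic_sample` is a hypothetical helper name.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick a random start, then take every k-th item (k = population / sample)."""
    interval = len(population) // sample_size
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # starting position, counted from 1
    positions = range(start, len(population) + 1, interval)
    chosen = [population[p - 1] for p in positions][:sample_size]
    return start, chosen


# Made-up population: items numbered 1 to 400, sample of 40 (interval 10).
population = list(range(1, 401))
start, sample = systematic_sample(population, 40, seed=0)

print(start, sample[:3])
```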

48
New cards

pros of systematic sampling

  • Population is evenly sampled.

  • Can be carried out by a machine.

  • Sample is easy to select.

49
New cards

cons of systematic sampling

  • Not strictly a random sample, as once the starting point is chosen some members of the population cannot be selected.

50
New cards

quota sampling

Population is grouped by characteristics and a fixed amount is sampled from every group.

51
New cards

quota sampling method

  • Group population by characteristics e.g. gender and age

  • Select quota (amount) for each group e.g. 30 men under 25, 40 women over 30 etc.

  • Obtain sample by finding members of each group until quota is reached.

52
New cards

pros of quota sampling

  • Quick to use.

  • Cheap.

  • Do not need a sampling frame or full list of the population.

53
New cards

cons of quota sampling

  • NOT RANDOM – biased as interviewer is choosing who will be in the sample so every member of the population does not have an equal chance of being selected.

54
New cards

opportunity sampling

Using the people/items that are available at the time. E.g. interviewing the first 10 people you see on a Monday morning.

55
New cards

pros of opportunity sampling

  • Quick

  • Cheap

  • Easy

56
New cards

cons of opportunity sampling

  • NOT RANDOM. The sample has not been collected fairly, so it may not represent the population, and not every member of the population has been given an equal chance to be selected.

57
New cards

judgement sampling

When the researcher uses their own judgement to select a sample that they think will represent the population. E.g. a teacher choosing students to interview about their opinion on a new after-school club.

58
New cards

pros of judgement sampling

  • easy

  • quick

59
New cards

cons of judgement sampling

  • NOT RANDOM.

  • Quality of sample depends on the person selecting the sample. The researcher may be biased and unreliable in the sample they select.

60
New cards

Petersen capture-recapture

  • Used to estimate the size of large or moving populations where it would be impossible to count the entire population. Your answer is only an ESTIMATE

61
New cards

petersen capture-recapture equation

N = (M × n) ÷ m, where N is the estimated population size, M is the number of items marked in the first sample, n is the size of the second sample and m is the number of marked items in the second sample.

62
New cards

Petersen capture-recapture method

1. Take a sample of the population.

2. Mark each item.

3. Put the items back into the population and ensure they are thoroughly mixed.

4. Take a second sample and count how many of your sample are marked.

5. Assume the proportion of marked items in the second sample is the same as the proportion of the whole population that was marked in the first sample, then solve for the population size.
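The proportion argument in step 5 can be sketched in a few lines. If M items were marked in the first sample, and m of the n items in the second sample are marked, then m/n ≈ M/N, which gives N ≈ M × n ÷ m. The numbers are made up, and `petersen_estimate` is a hypothetical helper name.

```python
def petersen_estimate(first_marked, second_size, second_marked):
    """Estimate population size N from m/n = M/N, i.e. N = M * n / m."""
    if second_marked == 0:
        raise ValueError("no marked items recaptured, so no estimate is possible")
    return first_marked * second_size / second_marked


# Made-up example: 50 fish marked, then 8 of a second sample of 40 are marked.
estimate = petersen_estimate(first_marked=50, second_size=40, second_marked=8)
print(estimate)  # 250.0
```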

63
New cards

petersen capture-recapture assumptions

  • Population has not changed – no births/deaths

  • Every individual is equally likely to be caught.

  • Marks/tags not lost

  • Sample size is large enough and is representative of the population

64
New cards

experiments

Used when a researcher is interested in how changes in one variable affect another.

65
New cards

experiment variables: Explanatory (Independent) Variable

The variable that is changed.

66
New cards

experiment variables: Response (dependent) variable

the variable that is measured

67
New cards

experiment variables: Extraneous Variables

Variables you are not interested in but that could affect the result of your experiment.

68
New cards

Laboratory Experiments

Researcher has full control over variables. Conducted in a lab or similar environment.

69
New cards

examples of laboratory experiments

measuring reaction times of people of different ages.

  • Explanatory variable - age

  • Response variable - reaction time.

  • Extraneous variables - gender, health condition, fitness level etc.

70
New cards

pros of laboratory experiments

  • Easy to replicate – makes results more reliable.

  • Extraneous variables can be controlled so results are more likely to be valid as you can be sure that other factors are not affecting your results.

71
New cards

cons of laboratory experiments

People may behave differently under test conditions than they would under real-life conditions – could affect validity of results.

72
New cards

field experiments

Carried out in the everyday environment. The researcher has some control over the variables: they set up the situation and control the explanatory variable but have less control over extraneous variables.

73
New cards

example of field experiments

testing new methods of revision.

  • Explanatory variable – method of revision

  • Response variable – results in exam

  • Extraneous variables – amount of revision each pupil does, ability of pupils.

74
New cards

pros of field experiments

  • more accurate - reflects real life behaviour

75
New cards

cons of field experiments

  • less control over extraneous variables

  • not as easy to replicate - less reliable than lab experiments

76
New cards

natural experiments

Carried out in the everyday environment. Researcher has no/very little control over the variables. Explanatory variables are not changed but instead researchers look at something that already exists in the world and how it affects other things.

77
New cards

examples of natural experiments

the effect of education on level of income

  • Explanatory variable – level of education

  • Response variable – income

  • Extraneous variables – IQ, other skills people may have, personal circumstances

78
New cards

pros of natural experiments

  • reflects real life behaviour

79
New cards

cons of natural experiments

  • Low validity – extraneous variables are not controlled which may affect results instead of explanatory variable.

  • Difficult to replicate.

  • Cannot control extraneous variables.

80
New cards

simulation

A way to model random events using random numbers and previously collected data. These could be used to help you predict what could actually happen in real life. Easier and cheaper than actually collecting the data.

81
New cards

simulation steps

1. Choose a suitable method for getting random numbers – dice, calculator, random number tables.

2. Assign numbers to the data.

3. Generate the random numbers.

4. Match the random numbers to your outcomes.

82
New cards

simulation example

You sell milk, dark and white chocolates in a shop. P(milk) = 3/6, P(white) = 1/6, P(dark) = 2/6. Simulate the choice of chocolates that the next 10 customers will buy. We are not calculating the theoretical (expected) number of each chocolate; otherwise we could just work out 3/6 of 10 and so on. Instead, the probabilities are used to assign numbers to each chocolate, and random numbers then tell us which chocolate each customer chooses. This is more like experimental probability/relative frequency, but without the real-life situation.

1. Use a dice as there are 6 numbers in this scenario.

2. 3/6 of 6 is 3 so assign numbers 1, 2, 3 on the dice to milk chocolate. 1/6 of 6 is 1 so assign the next number, 4, to white chocolate. Assign numbers 5 and 6 on the dice to dark chocolate.

3. Roll the dice 10 times to generate the random numbers and record the results. E.g. 3,3,4,5,1,5,1,3,5,2.

4. Match the numbers to the outcomes – M, M, W, D, M, D, M, M, D, M. The simulation suggests that for the next ten customers you need 6 milk chocolates, 1 white chocolate and 3 dark chocolates.

Note that these results do not exactly match the probabilities in the question, and they won’t always, as this mimics real-life situations. Also remember that, since this is a simulation, the results are not necessarily accurate. For a more reliable simulation, repeat it lots of times.
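The chocolate simulation can be sketched in Python, using the same number assignment as the example (1, 2, 3 for milk, 4 for white, 5 and 6 for dark). `chocolate_for` is a hypothetical helper name. The first part replays the ten rolls from step 3, and the second part generates a fresh set of rolls, which will differ from run to run.

```python
import random


def chocolate_for(roll):
    """Map a dice roll to a chocolate using the assignment from step 2."""
    if roll in (1, 2, 3):
        return "M"  # milk, probability 3/6
    if roll == 4:
        return "W"  # white, probability 1/6
    if roll in (5, 6):
        return "D"  # dark, probability 2/6
    raise ValueError("a dice roll must be between 1 and 6")


# The ten rolls recorded in step 3 of the example.
example_rolls = [3, 3, 4, 5, 1, 5, 1, 3, 5, 2]
outcomes = [chocolate_for(r) for r in example_rolls]
counts = {c: outcomes.count(c) for c in "MWD"}
print(outcomes)
print(counts)  # {'M': 6, 'W': 1, 'D': 3}

# A fresh simulation: ten new random rolls.
new_rolls = [random.randint(1, 6) for _ in range(10)]
new_choices = [chocolate_for(r) for r in new_rolls]
print(new_choices)
```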

83
New cards

questionnaires/interviews

a source of primary data

84
New cards

questionnaires

A set of questions used to obtain data from the population/sample. Can be carried out via post, email, phone or face to face. The person completing the questionnaire is called the respondent. Questions can be open or closed.

85
New cards

open questions - questionnaire

Allows any answer. However, the wide range of different answers makes it difficult to analyse the data.

86
New cards

closed question - questionnaires

Has a fixed number of non-overlapping option boxes that only allow for specific answers or opinion scales. This makes data easier to analyse.

87
New cards

features of a good questionnaire

  • Easy to understand

  • Uses simple language

  • Avoid leading questions such as “do you agree…?” – makes the respondent want to agree.

  • Questions are relevant to the investigation

  • Includes a time frame/unit in the question.

  • Includes non-overlapping, exhaustive option boxes.

  • Questions should not be offensive/personal/embarrassing

  • Questions give results that are easy to analyse.

88
New cards

problems with questionnaires - non-response

When people in the sample do not respond to the questionnaire. Could be due to people not wanting to answer the questionnaire or not understanding the questions. To reduce non-response:

  • Follow up on people who have not responded.

  • Collect each questionnaire yourself.

  • Offer an incentive to complete the questionnaire, such as the opportunity to win a prize.

  • Use a pilot survey to test the response rate or how understandable the questions are.

89
New cards

problems with questionnaires - sensitive questions

Includes questions about people’s health, age, weight, salary etc. May make people uncomfortable so they may not answer truthfully which could distort the results. You can make respondents more comfortable by making the questionnaire anonymous and allowing them to answer the questionnaire in private or by using the random response method.

90
New cards

random response method

Uses a random event to decide how to answer a question which ensures that people who answer the question remain anonymous. You can use the survey results to calculate an estimate for the proportion of people who answered yes to the sensitive question.

91
New cards

random response method steps

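1. Find the total number of people who answered the question.

2. Find P(heads) if the random event is a coin flip.

3. Estimate the number of heads – probability × total.

4. Estimate the number of “yes” answers that were truthful – number of “yes” answers − estimated number of heads (this assumes everyone who flipped heads answered “yes”).

5. Estimate the proportion of people who would truthfully answer “yes” = answer to step 4 ÷ estimated number who answered truthfully (total − answer to step 3; for a fair coin this equals the answer to step 3).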
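The steps above can be sketched as a short calculation. This assumes a design in which everyone who gets heads must answer “yes” and everyone who gets tails answers truthfully. The survey numbers are made up, and `random_response_estimate` is a hypothetical helper name.

```python
def random_response_estimate(total, yes_answers, p_forced_yes=0.5):
    """Estimate the proportion of truthful respondents who answered "yes".

    Assumes respondents who get the forced outcome (e.g. heads) all say
    "yes", and everyone else answers truthfully.
    """
    forced_yes = p_forced_yes * total          # expected number of heads
    truthful_yes = yes_answers - forced_yes    # "yes" answers that were honest
    truthful_respondents = total - forced_yes  # expected number answering honestly
    return truthful_yes / truthful_respondents


# Made-up survey: 200 people answer, 130 say "yes", fair coin.
estimate = random_response_estimate(total=200, yes_answers=130)
print(estimate)  # 0.3
```

With a fair coin the divisor equals the expected number of heads, which is why dividing by the step 3 answer gives the same result.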

92
New cards

pilot study

A small-scale replica of the study to be carried out. Used to test the design and methods of the questionnaire.

93
New cards

pros of pilot study

  • Helps you spot any questions that are unclear or ambiguous.

  • Gives you an idea of the response rate.

  • Allows you to check the time and costs of the study.

  • You can check that closed questions include all the possible answers.

  • You can check that the questionnaire collects all the information needed.

94
New cards

interviews

where you question each person individually. Involves lots of specific questions or a list of topics. Can be carried out face to face or over the phone or internet.

95
New cards

interview advantages

  • interviewer can explain questions that are not understood

  • interviewer can put people at ease when having to answer personal questions

  • respondents can explain their answers

  • high response rate

96
New cards

anonymous questionnaires advantages

  • respondents more likely to answer personal questions

  • no interviewer bias

  • easy to send questionnaires to large sample size

  • quick

  • cheap

97
New cards

interview disadvantages

  • respondents are less likely to answer personal questions and may be less honest

  • time consuming

  • expensive

  • smaller sample size than questionnaire

  • interviewer bias - interviewer may interpret answers to suit their opinion

  • respondent may try to impress/guess the answer the interviewer wants

98
New cards

anonymous questionnaire disadvantages

  • some questions may not be understood

  • researchers may not understand some of the responses

  • low response rate

99
New cards

problems with collected data - outliers

Values that do not fit in with the pattern or trend of the data. They can be genuine extreme values or incorrectly recorded values. If incorrectly recorded, these can be ignored. If extreme values, you need to decide whether or not to include them in the data as they may distort/skew your results.

100
New cards

problems with collected data - cleaning data

Fixing problems with the data. This could be done by:

  • Identifying and correcting/removing incorrect data values or outliers.

  • Removing units or symbols from the data.

  • Putting all the data in the same format e.g. m/cm, capital/lowercase, words/letters.

  • Deciding what to do about missing data.
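As an illustration of these cleaning steps, the sketch below strips units, puts every height into centimetres and flags missing or unreadable values. The raw values are made up, `clean_height` is a hypothetical helper name, and treating a bare number as centimetres is an assumption made only for this example.

```python
def clean_height(raw):
    """Convert strings like '1.62 m' or '158cm' to cm; return None if unusable."""
    text = raw.strip().lower()      # same format: no spaces, all lowercase
    if not text:
        return None                 # missing value
    if text.endswith("cm"):
        number, factor = text[:-2], 1
    elif text.endswith("m"):
        number, factor = text[:-1], 100
    else:
        number, factor = text, 1    # assumption: a bare number is already in cm
    try:
        return round(float(number) * factor, 1)
    except ValueError:
        return None                 # unreadable entry, decide what to do later


# Made-up raw data with mixed units, mixed case and missing entries.
raw_heights = ["1.62 m", "158cm", " 170 CM ", "", "n/a"]
cleaned = [clean_height(h) for h in raw_heights]
print(cleaned)  # [162.0, 158.0, 170.0, None, None]
```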