Raw Data
Unprocessed data that has just been collected. It still needs to be ordered, grouped, rounded or cleaned.
qualitative data
Non-numerical, descriptive data (e.g. eye colour, gender).
Quantitative
Numerical data that can be measured (e.g. height, weight, exam scores).
discrete data
Data that takes specific values (e.g. number of people, shoe size).
continuous data
Data that can take any value within a range (e.g. height, weight).
categorical data
data that can be sorted into non-overlapping categories such as gender. Used for qualitative data so that it can be more easily processed
Ordinal (rank)
quantitative data that can be given an order or ranked on a rating scale, e.g. marks in an exam.
bivariate data
Involves measuring 2 variables. Can be qualitative or quantitative, grouped or ungrouped. Usually used with scatter diagrams where the two axes represent the two different variables. One variable is often called the explanatory variable and the other the response variable.
Multivariate
Made up of more than 2 variables e.g. comparing height, weight, age and shoe size together.
Grouping Data
Grouping data using tables makes it easier to spot patterns in the data and quickly see how the data is distributed.
Discrete data
can be grouped into classes that do not overlap e.g. 0-10, 11-15… (they do not have to have equal class width).
Uses smaller intervals when there is a lot of data close together in that range and wider classes for data that is more spread out.
Continuous data
can be grouped using inequalities.
The class intervals must not have gaps between them or be overlapping so inequality symbols must be used with one of the symbols being < and the other ≤.
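A minimal sketch of grouping continuous data with inequality classes of the form a < x ≤ b, so that no value can fall into a gap or into two classes. The function names, class boundaries and heights are made up for illustration.

```python
def class_label(value, boundaries):
    """Return the class 'a < x <= b' containing value, or None if it is out of range."""
    for lower, upper in zip(boundaries, boundaries[1:]):
        if lower < value <= upper:
            return f"{lower} < x <= {upper}"
    return None


def group_continuous(values, boundaries):
    """Count how many values fall in each class defined by consecutive boundaries."""
    counts = {f"{lo} < x <= {hi}": 0 for lo, hi in zip(boundaries, boundaries[1:])}
    for value in values:
        label = class_label(value, boundaries)
        if label is not None:
            counts[label] += 1
    return counts


heights = [150.0, 152.5, 160.0, 160.1, 171.3, 180.0]  # hypothetical heights in cm
print(group_continuous(heights, [140, 150, 160, 170, 180]))
```

A value on a boundary, such as 160.0, is counted once only, in the class 150 < x ≤ 160.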
pros of grouping data
Makes the data easy to read and understand.
Easy to spot patterns and compare data.
cons of grouping data
Loses accuracy of data as you no longer know exact data values.
Calculations made from these will only be an estimate e.g. mean.
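A short sketch of why a mean from grouped data is only an estimate. It assumes the usual midpoint approach (each value in a class is treated as if it sat at the class midpoint), which is not spelled out on this card. The table values are made up.

```python
def estimated_mean(classes):
    """Estimate the mean from grouped data.

    classes is a list of (lower, upper, frequency) tuples. Every value in a
    class is treated as its midpoint, which is why the result is an estimate.
    """
    total_frequency = sum(freq for _, _, freq in classes)
    total = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return total / total_frequency


table = [(0, 10, 4), (10, 20, 6), (20, 40, 10)]  # hypothetical grouped data
print(estimated_mean(table))  # 20.5
```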
Primary
Data that you have collected yourself, or someone has collected on your behalf.
Secondary
Data that has already been collected.
advantages of primary data
accurate
collection method known
can find answers to specific questions
advantages of secondary data
cheap
easy
quick
data from some organisations can be more reliable than data collected yourself
disadvantages of primary data
time consuming
expensive
disadvantages of secondary data
method of collection unknown
data may be out of date
may contain mistakes
may come from unreliable source
may be difficult to find answers to specific questions
examples of primary data
questionnaires
interviews
experiments
observations
examples of secondary data
database
newspaper/magazines/websites
historical records
office for national statistics (government website)
Population
Everyone or everything that could be involved in the investigation e.g. when investigating opinions of students in a school the population would be all the students in the school.
Census
A survey of the entire population.
Sample
A smaller number from the population that you actually survey. The data obtained from the sample is then used to make conclusions about the whole population, so it is important that the sample represents the population fairly.
Sampling Frame
A list of all the members of the population. This is where you will choose the sample from. E.g. electoral roll, school register.
Sampling Unit
The people that are to be sampled e.g. students in a school.
Biased sample
A sample that does not represent the population fairly. For example, surveying students at a mixed school with a sample that only contains girls. Avoid bias by using random sampling methods.
advantages of census
unbiased
accurate
takes into account entire population
advantages of sample
cheaper
quicker
less data to consider
disadvantages of census
time consuming
expensive
lots of data to manage
difficult to ensure whole population is used
disadvantages of sample
may be biased
not completely representative
sampling methods - completely random
simple random sampling
sampling method - includes some random sampling
stratified sampling
systematic sampling
cluster sampling
sampling methods - non-random
quota sampling
opportunity sampling
judgement sampling
random sample
Every item/person in the population has an equal chance of being selected.
random response method (calculation)
1) find the expected number of coin flips/dice rolls/etc that gave the forced answer
probability × total number surveyed
2) find how many answered “yes” honestly
subtract (1) from how many put the indicated answer
3) find the proportion of those answering honestly who answered “yes”
honest “yes” answers ÷ expected coin flips
Random Sampling Techniques
Pick numbers/names out of a hat (only works for small samples)
Using a random number table
Using the random number generator function on a calculator or computer.
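A small sketch of the computer-based technique, using Python's standard `random` module. The sampling frame (a register of 100 students) and the `seed` argument are illustrative assumptions. The seed is only there so the same sample can be reproduced.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size different members, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)


register = [f"Student {n}" for n in range(1, 101)]  # hypothetical sampling frame
sample = simple_random_sample(register, 10, seed=1)
print(sample)
```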
advantages of random sampling
Sample is representative as every member of the population has an equal chance of being selected.
Unbiased
disadvantages of random sampling
Need a full list of population (not always easily obtainable)
Not always convenient as it can be expensive and time consuming.
Needs a large sample size
stratified sample
the size of each stratum (group) in the sample is in proportion to the size of that stratum in the population. E.g. if group A accounts for 10% of the population, in the sample group A will also be 10% of the sample size.
stratified sample method
Split the population into groups (usually done for you in the exam)
Use the formula: number sampled from stratum = (stratum size ÷ population size) × sample size, to calculate the sample size for each group. (Remember to check the totals if you rounded any numbers, and adjust if your total after stratification is bigger/smaller than the sample size in the question.)
Use random sampling to select members from each stratum/group.
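A sketch of the stratified calculation, including the "check your totals" step. The adjustment rule used here (round down, then give any leftover places to the strata with the largest remainders) is one possible way to make the total match and is an assumption, not a rule from these notes. The group names and sizes are made up.

```python
def stratified_sizes(strata, sample_size):
    """Return how many to sample from each stratum, in proportion to its size.

    strata maps a group name to the number of members in the population.
    """
    total = sum(strata.values())
    sizes, remainders = {}, {}
    for name, size in strata.items():
        # Whole-number part of (size / total) * sample_size, plus what is left over.
        sizes[name], remainders[name] = divmod(size * sample_size, total)
    # Rounding down can leave the total short, so hand out the missing places.
    shortfall = sample_size - sum(sizes.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        sizes[name] += 1
    return sizes


print(stratified_sizes({"Year 7": 120, "Year 8": 100, "Year 9": 80}, 30))
print(stratified_sizes({"A": 45, "B": 35, "C": 20}, 15))
```

In the first call the proportions work out exactly (12, 10 and 8). In the second, the exact values are 6.75, 5.25 and 3, and the adjustment gives 7, 5 and 3 so that the total is still 15.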
pros of Stratified Sample
Sample is in proportion to population, so sample represents the population fairly.
Best used for populations with groups of unequal sizes.
cons of Stratified Sample
Time consuming
systematic sampling
choosing items in the population at regular intervals.
systematic sampling method
Divide your population size by sample size to calculate the interval, e.g. 400/40 = 10, so choose every 10th item in the population.
Use random sampling to generate a number between 1 and 10 (or the answer to your calculation from above) to choose a starting point, e.g. 7.
Select the 7th item and then every 10th item after it, e.g. 7th, 17th, 27th, …, until you obtain your sample size.
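The systematic method can be sketched as follows. The function name and the optional `start` argument are illustrative. When no start is given, a random one between 1 and the interval is chosen.

```python
import random


def systematic_positions(population_size, sample_size, start=None):
    """Return the 1-based positions chosen by systematic sampling."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    interval = population_size // sample_size  # e.g. 400 // 40 = 10
    if start is None:
        start = random.randint(1, interval)  # random starting point
    return [start + k * interval for k in range(sample_size)]


positions = systematic_positions(400, 40, start=7)
print(positions[:3], "...", positions[-1])  # [7, 17, 27] ... 397
```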
pros of systematic sampling
Population is evenly sampled.
Can be carried out by a machine.
Sample is easy to select.
cons of systematic sampling
Not strictly a random sample, as once the starting point is fixed some members of the population cannot be chosen.
quota sampling
Population is grouped by characteristics and a fixed amount is sampled from every group.
quota sampling method
Group population by characteristics e.g. gender and age
Select quota (amount) for each group e.g. 30 men under 25, 40 women over 30 etc.
Obtain sample by finding members of each group until quota is reached.
pros of quota sampling
Quick to use.
Cheap.
Do not need sample frame or full list of the population.
cons of quota sampling
NOT RANDOM – biased as interviewer is choosing who will be in the sample so every member of the population does not have an equal chance of being selected.
opportunity sampling
Using the people/items that are available at the time. E.g. interviewing the first 10 people you see on a Monday morning.
pros of opportunity sampling
Quick
Cheap
Easy
cons of opportunity sampling
NOT RANDOM. The sample has not been collected fairly so it may not represent the population and every member of the population has not been given an equal chance to be selected.
judgement sampling
When the researcher uses their own judgement to select a sample that they think will represent the population. E.g. a teacher choosing students to interview about their opinion on a new after-school club.
pros of judgement sampling
easy
quick
cons of judgement sampling
NOT RANDOM.
Quality of sample depends on the person selecting the sample. The researcher may be biased and unreliable in the sample they select.
Petersen capture-recapture
Used to estimate the size of large or moving populations where it would be impossible to count the entire population. Your answer is only an ESTIMATE
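petersen capture-recapture equation
N = (M × n) ÷ m, where N is the estimated population size, M is the number of items marked in the first sample, n is the size of the second sample and m is the number of marked items in the second sample.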
Petersen capture-recapture method
1. Take a sample of the population
2. Mark each item
3. Put the items back into the population and ensure they are thoroughly mixed
4. Take a second sample and count how many of your sample are marked
5. The proportion of marked items in your second sample should be the same as the proportion of the whole population that was marked in your first sample.
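A minimal sketch of the estimate that follows from step 5. Setting (marked in second sample ÷ size of second sample) equal to (number marked in first sample ÷ population size) and rearranging gives the population estimate. The function name and the numbers are made up for illustration.

```python
def petersen_estimate(first_marked, second_size, second_marked):
    """Estimate population size from a capture-recapture study.

    Rearranges second_marked / second_size = first_marked / population.
    """
    if second_marked == 0:
        raise ValueError("no marked items recaptured, so no estimate is possible")
    return first_marked * second_size / second_marked


# Hypothetical study: 50 fish marked, then 40 caught of which 8 are marked.
print(petersen_estimate(50, 40, 8))  # 250.0
```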
petersen capture-recapture assumptions
Population has not changed – no births/deaths
Probability of being caught is equally likely for all individuals.
Marks/tags not lost
Sample size is large enough and is representative of the population
experiments
used when a researcher is interested in how changes in one variable affect another.
experiment variables: Explanatory (Independent) Variable
The variable that is changed.
experiment variables: Response (dependent) variable
the variable that is measured
experiment variables: Extraneous Variables
Variables you are not interested in but that could affect the result of your experiment.
Laboratory Experiments
Researcher has full control over variables. Conducted in a lab or similar environment.
examples of laboratory experiments
measuring reaction times of people of different ages.
Explanatory variable - age
Response variable - reaction time.
Extraneous variables - gender, health condition, fitness level etc.
pros of laboratory experiments
Easy to replicate – makes results more reliable.
Extraneous variables can be controlled so results are more likely to be valid as you can be sure that other factors are not affecting your results.
cons of laboratory experiments
People may behave differently under test conditions than they would under real-life conditions – could affect validity of results.
field experiments
Carried out in the everyday environment. Researcher has some control over the variables. They set up the situation and control the explanatory variable but have less control over extraneous variables.
example of field experiments
testing new methods of revision.
Explanatory variable – method of revision
Response variable – results in exam
Extraneous variables – amount of revision each pupil does, ability of pupils.
pros of field experiments
more accurate - reflects real life behaviour
cons of field experiments
cannot control extraneous variables
not as easy to replicate - less reliable than lab experiments
natural experiments
Carried out in the everyday environment. Researcher has no/very little control over the variables. Explanatory variables are not changed but instead researchers look at something that already exists in the world and how it affects other things.
examples of natural experiments
the effect of education on level of income
Explanatory variable – level of education
Response variable – income
Extraneous variables – IQ, other skills people may have, personal circumstances
pros of natural experiments
reflects real life behaviour
cons of natural experiments
Low validity – extraneous variables are not controlled which may affect results instead of explanatory variable.
Difficult to replicate.
Cannot control extraneous variables.
simulation
A way to model random events using random numbers and previously collected data. These could be used to help you predict what could actually happen in real life. Easier and cheaper than actually collecting the data.
simulation steps
1. Choose a suitable method for getting random numbers – dice, calculator, random number tables.
2. Assign numbers to the data.
3. Generate the random numbers.
4. Match the random numbers to your outcomes.
simulation example
You sell milk, dark and white chocolates in a shop. P(milk) = 3/6, P(white) = 1/6, P(dark) = 2/6. Simulate the choice of chocolates that the next 10 customers will buy. We are not calculating expected numbers from the theoretical probabilities (otherwise we could just work out 3/6 of 10 and so on). Instead, we use the probabilities to assign numbers to each chocolate, then generate random numbers that tell us which chocolate each customer chooses. This is more like experimental probability/relative frequency, but without the real-life situation.
1. Use a dice as there are 6 numbers in this scenario.
2. 3/6 of 6 is 3 so assign numbers 1, 2, 3 on the dice to milk chocolate. 1/6 of 6 is 1 so assign the next number, 4, to white chocolate. Assign numbers 5 and 6 on the dice to dark chocolate.
3. Roll the dice 10 times to generate the random numbers and record the results. E.g. 3,3,4,5,1,5,1,3,5,2.
4. Match the numbers to the outcomes: M, M, W, D, M, D, M, M, D, M. This simulation suggests that for the next ten customers you need 6 milk chocolates, 1 white chocolate and 3 dark chocolates.
Note that these results do not match with the probabilities in the question and they won’t always as this is mimicking real life situations. Also remember that since this is a simulation these results are not necessarily accurate. To get a more reliable simulation repeat the simulation lots of times.
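The chocolate example can be sketched in code. The mapping of dice numbers to chocolates follows step 2 (1 to 3 milk, 4 white, 5 and 6 dark). The function names are illustrative.

```python
import random
from collections import Counter


def chocolate_for(roll):
    """Match a dice roll (1 to 6) to a chocolate, as assigned in step 2."""
    if roll not in range(1, 7):
        raise ValueError("a dice roll must be between 1 and 6")
    if roll <= 3:
        return "milk"
    if roll == 4:
        return "white"
    return "dark"


def simulate(rolls):
    """Count how many of each chocolate a list of dice rolls represents."""
    return Counter(chocolate_for(roll) for roll in rolls)


example_rolls = [3, 3, 4, 5, 1, 5, 1, 3, 5, 2]  # the rolls from step 3
print(simulate(example_rolls))  # 6 milk, 3 dark, 1 white

# A fresh simulation: different every run, like real customers.
new_rolls = [random.randint(1, 6) for _ in range(10)]
print(simulate(new_rolls))
```

Running the last two lines many times and comparing the counts is one way to see why a single simulation is not necessarily accurate.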
questionnaires/interviews
a source of primary data
questionnaires
A set of questions used to obtain data from the population/sample. Can be carried out via post, email, phone or face to face. The person completing the questionnaire is called the respondent. Questions can be open or closed.
open questions - questionnaire
Allows any answer. However, the wide range of different answers makes it difficult to analyse the data.
closed question - questionnaires
Has a fixed number of non-overlapping option boxes that only allow for specific answers or opinion scales. This makes data easier to analyse.
features of a good questionnaire
Easy to understand
Uses simple language
Avoid leading questions such as “do you agree…?” – makes the respondent want to agree.
Questions are relevant to the investigation
Includes a time frame/unit in the question.
Includes non-overlapping, exhaustive option boxes.
Questions should not be offensive/personal/embarrassing
Questions whose results are easy to analyse.
problems with questionnaires - non-response
when people in the sample do not respond to the questionnaire. Could be due to people not wanting to answer the questionnaire or not understanding the questions.
To reduce non-response:
Follow up on people who have not responded.
Collect each questionnaire yourself.
Offer an incentive to complete the questionnaire such as the opportunity to win a prize.
Use a pilot survey to test response rate or understandability of questions.
problems with questionnaires - sensitive questions
Includes questions about people’s health, age, weight, salary etc. May make people uncomfortable so they may not answer truthfully which could distort the results. You can make respondents more comfortable by making the questionnaire anonymous and allowing them to answer the questionnaire in private or by using the random response method.
random response method
Uses a random event to decide how to answer a question which ensures that people who answer the question remain anonymous. You can use the survey results to calculate an estimate for the proportion of people who answered yes to the sensitive question.
random response method steps
1. Find total who answered questions.
2. Find prob. (heads) if it is a coin.
3. Estimate no. of heads – Prob x total
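4. Estimate number of “yes” answers that were truthful:
number of “yes” answers − estimated no. of heads
5. Estimate proportion of people who did the crime = estimated truthful “yes” answers (step 4) ÷ estimated number of people who answered truthfully (for a fair coin this equals the estimated no. of heads from step 3)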
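A sketch of the random response calculation. It assumes the usual set-up, which is not stated explicitly on this card: each person flips a coin in private, answers "yes" if it lands heads, and answers the sensitive question truthfully if it lands tails. The function name and the survey numbers are made up.

```python
def random_response_estimate(total, yes_answers, p_forced_yes=0.5):
    """Estimate the proportion of truthful respondents whose answer is 'yes'.

    Assumes a forced 'yes' with probability p_forced_yes (heads on a fair
    coin by default) and a truthful answer otherwise.
    """
    expected_forced = p_forced_yes * total          # step 3: estimated heads
    truthful_yes = yes_answers - expected_forced    # step 4
    truthful_people = total - expected_forced       # estimated tails
    return truthful_yes / truthful_people           # step 5


# Hypothetical survey: 200 people, 130 answered "yes".
print(random_response_estimate(200, 130))  # 0.3
```

With a fair coin the estimated number of tails equals the estimated number of heads, so dividing by either gives the same answer.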
pilot study
A small scale replica of the study to be carried out. Used to test the design and methods of the questionnaire
pros of pilot study
Helps you spot any questions that are unclear or ambiguous.
Gives you an idea of the response rate
Allows you to check the time and costs of the study.
You can check that closed questions include all the possible answers.
Can use pilot study to check that the questionnaire collects all the information needed.
interviews
where you question each person individually. Involves lots of specific questions or a list of topics. Can be carried out face to face or over the phone or internet.
interview advantages
interviewer can explain questions
interviewer can put people at ease when having to answer personal questions
respondents can explain their answers
high response rate
anonymous questionnaires advantages
respondents more likely to answer personal questions
no interviewer bias
easy to send questionnaires to large sample size
quick
cheap
interview disadvantages
less likely to answer personal questions and may be less honest
time consuming
expensive
smaller sample size than questionnaire
interviewer bias - interviewer may interpret answers to suit their opinion
respondent may try to impress/guess the answer the interviewer wants
anonymous questionnaire disadvantages
some questions may not be understood
researchers may not understand some of the responses
low response rate
problems with collected data - outliers
values that do not fit in with the pattern or trend of the data. Can be extreme values or incorrectly recorded. If incorrectly recorded, these can be ignored. If extreme values, you need to decide whether or not to include them in the data as they may distort/skew your results
problems with collected data - cleaning data
fixing problems with the data. This could be done by:
Identifying and correcting/removing incorrect data values or outliers.
Removing units or symbols from the data.
Putting all the data in the same format e.g. m/cm, capital/lowercase, words/letters.
Deciding what to do about missing data.
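A small sketch of cleaning, using made-up height data recorded in mixed units and mixed capitals. It removes the units, puts everything into centimetres and marks missing or unreadable values as `None` so that a decision can be made about them later. The function name and the choice to round to one decimal place are assumptions for illustration.

```python
def clean_height(raw):
    """Return a height in cm (rounded to 1 d.p.), or None if missing or unreadable."""
    text = raw.strip().lower()  # same format: no stray spaces, all lowercase
    if not text:
        return None  # missing value
    if text.endswith("cm"):
        number, factor = text[:-2], 1.0
    elif text.endswith("m"):
        number, factor = text[:-1], 100.0  # convert metres to centimetres
    else:
        return None  # unit not recognised
    try:
        return round(float(number) * factor, 1)
    except ValueError:
        return None  # the part before the unit is not a number


raw_heights = ["1.5m", "158 cm", "", "1.7 M", "tall"]  # hypothetical raw data
print([clean_height(value) for value in raw_heights])
```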