Qualitative Data
Definition: Data that describes qualities or characteristics.
Not numerical.
Often collected using words or categories.
Examples:
Eye colour (blue, green, brown)
Favourite subject (Maths, English)
Type of cuisine (Italian, Indian, Mexican)
Quantitative Data
Definition: Data that involves numbers and quantities.
Can be measured or counted.
You can usually do calculations with this data (e.g. mean, median).
Examples:
Height in cm (170 cm)
Number of siblings (3)
Test scores (85%)
Discrete Data
Can only take certain values, usually whole numbers.
There are gaps between the values – data is countable.
Examples:
Number of pets (can be 1, 2, 3… but not 2.5)
Shoe size (discrete even though half sizes exist, because only certain values are possible)
Number of cars in a household
Continuous Data
Can take any value within a range.
No gaps – data is measurable and can have decimals or fractions.
Examples:
Height (e.g. 165.2 cm)
Weight (e.g. 57.8 kg)
Time taken to finish a race (e.g. 12.34 seconds)
Primary vs Secondary Data
Primary Data
Collected first-hand by the person or group doing the investigation.
Usually more reliable, because you control how it is collected, and specific to the purpose of the research.
Takes more time and effort to gather.
Examples:
Conducting a survey yourself
Measuring students' heights in your class
Interviewing people on the street
Secondary Data
Data that has already been collected by someone else, often for a different purpose.
Quicker and easier to access, but may be less specific or out of date.
Examples:
Information from newspapers or websites
Government statistics (e.g. census data)
Textbooks, articles, or reports
A good questionnaire collects useful and accurate data. Here are key points to consider:
Use clear and simple language.
Ask specific questions (not vague or general).
Provide suitable answer options, especially for multiple choice.
Include response intervals that are non-overlapping and cover all possibilities.
Avoid leading or biased questions.
Common mistakes to avoid:
Overlapping intervals (e.g., 0–10, 10–20: which option does 10 belong to?)
Leading questions (e.g., “Do you agree that school lunches are unhealthy?”)
Questions that are too personal or sensitive without good reason
No option for “Other” or “Prefer not to say”
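The overlap problem above can be checked mechanically. This is a minimal sketch, assuming whole-number answers and intervals written as inclusive (low, high) pairs; the function name check_intervals is made up for illustration.

```python
def check_intervals(intervals):
    """Return a list of problems found in (low, high) answer intervals.

    Assumes whole-number answers and inclusive endpoints,
    so (0, 10) covers every value from 0 to 10.
    """
    problems = []
    ordered = sorted(intervals)
    # Compare each interval with the one that follows it.
    for (low_a, high_a), (low_b, high_b) in zip(ordered, ordered[1:]):
        if low_b <= high_a:
            problems.append(f"overlap: {low_a}-{high_a} and {low_b}-{high_b}")
        elif low_b > high_a + 1:
            problems.append(f"gap between {high_a} and {low_b}")
    return problems


bad = check_intervals([(0, 10), (10, 20), (20, 30)])    # 10 and 20 each fit two options
good = check_intervals([(0, 10), (11, 20), (21, 30)])   # every value fits exactly one option
print(bad)
print(good)
```

An empty list means every whole number in the range belongs to exactly one option.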
You can’t always collect data from the whole population, so you collect a sample.
Random Sampling
Everyone in the population has an equal chance of being chosen.
Unbiased, so the results can be generalised to the whole population.
Needs a complete list of the population.
Example: Pick 10 students using a random number generator.
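As a rough illustration of the example above, a random sample can be drawn with Python's standard random module. The register of 30 placeholder names and the fixed seed are assumptions made only so the sketch is repeatable.

```python
import random

# Hypothetical class register of 30 students (placeholder names).
register = [f"Student {i}" for i in range(1, 31)]

rng = random.Random(42)               # fixed seed only so the example is repeatable
sample = rng.sample(register, k=10)   # 10 different students, each equally likely
print(sample)
```

Note that the complete list of the population (the register) is needed before any sampling can happen.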
Stratified Sampling
The population is divided into groups (strata), then a sample is taken from each group in proportion to its size.
Number sampled from a group = (group size ÷ population size) × sample size
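A short sketch of the proportional calculation, using made-up year-group sizes. The numbers are assumptions chosen so that each share comes out as a whole number; with real data, rounding may mean the group samples do not add up exactly to the intended total.

```python
# Hypothetical year-group sizes (the strata) and the sample size wanted.
strata = {"Year 7": 120, "Year 8": 100, "Year 9": 80}
sample_size = 30

population = sum(strata.values())   # 300 students in total

# Each group's share of the sample matches its share of the population.
allocation = {
    group: round(size * sample_size / population)
    for group, size in strata.items()
}
print(allocation)   # {'Year 7': 12, 'Year 8': 10, 'Year 9': 8}
```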
Systematic Sampling
Choose every nth person from a list.
Example: Every 5th person on a class register.
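The register example can be sketched with list slicing. The register here is hypothetical, and the sketch assumes the count starts at the 5th person; in practice the starting point is often chosen at random.

```python
# Hypothetical class register of 30 students (placeholder names).
register = [f"Student {i}" for i in range(1, 31)]

n = 5
# Start at the 5th person (index n - 1) and then take every 5th after that.
systematic_sample = register[n - 1::n]
print(systematic_sample)
```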
Convenience Sampling
Choose people who are easy to access (e.g., people in the street).
Not very reliable or representative, but easy and quick.
Bias means the data collected doesn't fairly represent the population.
Common causes of bias:
Leading questions (e.g., “Why do you prefer…”)
Only sampling a specific group (e.g., asking only your friends)
Not using a random method
Poorly worded questionnaires
Avoiding bias makes the data more accurate and trustworthy.
Reliability
Data is consistent and repeatable.
If someone else collected the data the same way, they'd get similar results.
Example: Measuring something with the same method and getting similar outcomes.
Validity
Data is relevant and measures what it’s supposed to.
It’s useful for answering the actual question you're investigating.
Example: If you're studying sleep patterns but ask only about caffeine intake, your data might not be valid.
Bar Charts
Used to show categorical or discrete data.
Each bar represents a category or value.
Bars are separate (with gaps).
Height of the bar = frequency.
Example: Number of pets students have.
Pie Charts
Represents data as proportions of a circle (360°).
Each sector shows a fraction/percentage of the total.
Good for comparing parts to a whole.
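To draw a pie chart, each frequency is turned into an angle: angle = (frequency ÷ total) × 360°. The survey results below are invented for illustration.

```python
# Hypothetical survey: favourite subject of 30 students.
favourite_subject = {"Maths": 12, "English": 9, "Science": 6, "Art": 3}

total = sum(favourite_subject.values())   # 30

# Angle of each sector = (frequency / total) * 360 degrees.
angles = {
    subject: frequency * 360 / total
    for subject, frequency in favourite_subject.items()
}
print(angles)   # {'Maths': 144.0, 'English': 108.0, 'Science': 72.0, 'Art': 36.0}
```

The angles should always add up to 360°, which is a useful check before drawing.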
Line Graphs
Used to show changes over time (time series data).
Points are plotted and connected with lines.
Helpful for spotting trends and patterns.
Example: Temperature over a week.
Pictograms
Uses pictures or symbols to represent frequency.
Each symbol represents a certain number of items.
A key must be included.
Example: Number of books read by students, using 📚 to represent 5 books.
Frequency Polygons
Plotted using the midpoints of class intervals.
Useful for comparing two sets of data.
Plotted like a line graph but represents grouped data.
Steps:
Find midpoints of each class.
Plot frequency (y-axis) against midpoint (x-axis).
Join with straight lines.
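The steps above can be sketched as follows, working out the points to plot. The class intervals and frequencies are made-up values for illustration.

```python
# Hypothetical grouped data: (lower bound, upper bound) and frequency.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [3, 7, 12, 5]

# Step 1: midpoint of each class.
midpoints = [(low + high) / 2 for low, high in classes]

# Step 2: pair each midpoint with its frequency to get the points to plot.
points = list(zip(midpoints, frequencies))
print(points)   # [(5.0, 3), (15.0, 7), (25.0, 12), (35.0, 5)]

# Step 3 (on paper): join consecutive points with straight lines.
```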
Stem-and-Leaf Diagrams
Organises small sets of data.
Keeps original data values visible.
Each value is split into a stem (e.g. the tens digit) and a leaf (e.g. the units digit).
Can be used to find:
Median
Mode
Range
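A minimal sketch of building a stem-and-leaf diagram from two-digit values and reading the median, mode and range from the same data. The scores are invented for illustration.

```python
from collections import defaultdict
import statistics

scores = [23, 31, 25, 47, 38, 31, 42, 29, 36]   # hypothetical test scores

# Sort first so the leaves on each stem come out in order.
stems = defaultdict(list)
for value in sorted(scores):
    stems[value // 10].append(value % 10)   # tens digit is the stem, units digit the leaf

for stem, leaves in sorted(stems.items()):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print("Key: 2 | 3 means 23")

median = statistics.median(scores)        # middle (5th) of the 9 ordered values
mode = statistics.mode(scores)            # most frequent value
data_range = max(scores) - min(scores)    # largest minus smallest
```

Because the original values stay visible, the median can also be found by counting along the leaves to the middle value.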
Histograms
Used for grouped continuous data.
No gaps between bars.
Area of bar = frequency, so:
Height = frequency density = frequency ÷ class width
Used when class intervals vary in width.
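A small worked example of the bar heights for unequal class widths, using frequency density = frequency ÷ class width. The classes and frequencies are assumed values for illustration.

```python
# Hypothetical grouped data with unequal class widths: (lower, upper, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 70, 6)]

for low, high, frequency in classes:
    width = high - low
    density = frequency / width   # the height of the bar
    print(f"{low}-{high}: width {width}, frequency density {density}")

densities = [frequency / (high - low) for low, high, frequency in classes]
```

Multiplying each height by its class width gives back the frequency, which is why the area of each bar represents frequency.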
Cumulative Frequency Graphs
Used to estimate medians, quartiles, and percentiles.
Plot upper class boundary against cumulative frequency.
Join the points with a smooth curve (or with straight lines); use a step graph for discrete data.
From the graph, you can find:
Median (50%)
Lower Quartile (25%)
Upper Quartile (75%)
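Reading values off the graph can be imitated in code. This sketch assumes the plotted points are joined with straight lines (linear interpolation), so the answers may differ slightly from readings taken from a hand-drawn smooth curve. The data and the helper name read_off are hypothetical.

```python
from itertools import accumulate

lower_bound = 0                      # lower boundary of the first class
upper_bounds = [10, 20, 30, 40]      # upper class boundaries
frequencies = [4, 10, 18, 8]

cumulative = list(accumulate(frequencies))   # running total: [4, 14, 32, 40]

# Points on the graph: (upper boundary, cumulative frequency), starting from zero.
xs = [lower_bound] + upper_bounds
ys = [0] + cumulative


def read_off(target):
    """Estimate the data value at a given cumulative frequency."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the range of the data")


total = cumulative[-1]
q1 = read_off(total / 4)          # 25% of the way up the y-axis
median = read_off(total / 2)      # 50%
q3 = read_off(3 * total / 4)      # 75%
print(q1, round(median, 2), round(q3, 2))
```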
Box Plots
Show five key values:
Minimum
Lower quartile (Q1)
Median (Q2)
Upper quartile (Q3)
Maximum
Good for:
Comparing data distributions
Showing spread and skewness
IQR = Q3 - Q1
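The five values and the IQR can be computed with Python's statistics module (Python 3.8 or later). The data set is made up, and the sketch relies on the module's default "exclusive" method, which places the quartiles at the (n + 1)/4 and 3(n + 1)/4 positions; other quartile conventions can give slightly different answers.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 13]   # hypothetical data set of 7 values

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

summary = {
    "min": min(data),
    "Q1": q1,
    "median": q2,
    "Q3": q3,
    "max": max(data),
}
iqr = q3 - q1
print(summary, "IQR =", iqr)
```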
Mean
Add up all values, then divide by the number of values.
Median
The middle value when the data is in order.
If there's an even number of values, take the mean of the two middle numbers.
Example:
Data: 2, 4, 6, 8, 10 → Median = 6
Data: 1, 3, 5, 7 → Median = (3+5)/2 = 4
Mode
The value that appears most often.
There can be no mode, one mode, or more than one mode (a data set with two modes is called bimodal).
Example: Data: 3, 4, 4, 5, 6 → Mode = 4
Range
The difference between the highest and lowest values.
Range = Largest value − Smallest value
Example: Data: 2, 5, 7, 9 → Range = 9 – 2 = 7
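The worked examples above can be checked with Python's statistics module. The data sets are the ones from the examples, apart from the two-mode list, which is an extra illustration; multimode needs Python 3.8 or later.

```python
import statistics

mean_value = statistics.mean([2, 4, 6, 8, 10])      # (2 + 4 + 6 + 8 + 10) / 5
median_odd = statistics.median([2, 4, 6, 8, 10])    # odd count: the middle value
median_even = statistics.median([1, 3, 5, 7])       # even count: mean of 3 and 5
mode_value = statistics.mode([3, 4, 4, 5, 6])       # 4 appears most often

# A data set with two modes; multimode returns every most-common value.
modes = statistics.multimode([1, 1, 2, 2, 3])

data = [2, 5, 7, 9]
data_range = max(data) - min(data)                  # 9 - 2

print(mean_value, median_odd, median_even, mode_value, modes, data_range)
```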
Interquartile Range (IQR)
Measures the spread of the middle 50% of the data.
IQR = Upper Quartile (Q3) - Lower Quartile (Q1)
Q1 = 25% mark
Q2 = Median
Q3 = 75% mark
Use the IQR to see how spread out the middle of the data is; it is less affected by outliers than the range.
Estimating the Mean from Grouped Data
When data is grouped into intervals, the exact values aren’t known, so we estimate the mean using midpoints.
Steps:
Find the midpoint of each class.
Multiply midpoint × frequency.
Add all these results.
Divide by total frequency.
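The four steps above can be sketched as follows. The class intervals and frequencies are invented for illustration.

```python
# Hypothetical grouped data: ((lower bound, upper bound), frequency).
classes = [((0, 10), 4), ((10, 20), 6), ((20, 30), 8), ((30, 40), 2)]

total_fx = 0   # running sum of midpoint x frequency
total_f = 0    # running sum of frequencies

for (low, high), frequency in classes:
    midpoint = (low + high) / 2          # step 1: midpoint of the class
    total_fx += midpoint * frequency     # steps 2 and 3: multiply, then add up
    total_f += frequency

estimated_mean = total_fx / total_f      # step 4: divide by total frequency
print(estimated_mean)   # 380 / 20 = 19.0
```

The result is only an estimate, because every value in a class is treated as if it sat at the midpoint.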
Outliers are values that are much higher or lower than the rest of the data.
Outliers can affect:
The mean (pull it toward the extreme)
The range (increase it)
Box plots and cumulative frequency graphs are useful for spotting them.
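A quick illustration of how one extreme value changes the mean and range much more than the median. The data is made up, and describe is a helper written only for this sketch.

```python
import statistics

scores = [10, 12, 14, 16, 18]    # hypothetical data with no outlier
with_outlier = scores + [80]     # the same data plus one extreme value


def describe(values):
    """Return the mean, median and range of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
    }


before = describe(scores)
after = describe(with_outlier)
print(before)   # mean 14, median 14, range 8
print(after)    # mean 25, median 15, range 70
```

The mean jumps from 14 to 25 and the range from 8 to 70, while the median only moves from 14 to 15.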
To compare two sets of data, look at:
Mean: shows the overall average.
Median: shows the middle value (useful if data has outliers).
Mode: shows the most common value.
Range: how spread out the data is.
Interquartile Range (IQR): spread of the middle 50% – less affected by outliers.
Example:
Two classes take a maths test.
Class A: Mean = 65, IQR = 10
Class B: Mean = 70, IQR = 25
Class B has a higher average, but more variation in results.
Class A has more consistent scores.
Interpreting Box Plots
Box plots help compare:
Median (line inside the box)
IQR (width of the box)
Range (distance from lowest to highest)
Skewness (based on symmetry)
How to compare using box plots:
Higher median → typical values are higher (e.g. generally better performance for test scores)
Smaller IQR → more consistent results
Outliers can indicate unusual values
Example:
If Box Plot A has a higher median and smaller IQR than Box Plot B, A’s values are typically higher and more consistent.
Interpreting Cumulative Frequency Graphs
You can use cumulative frequency graphs to compare:
Median (50% mark on y-axis)
Lower and Upper Quartiles
IQR
Maximum value
Example:
Two classes' test results:
Class A's curve is steeper (it climbs to the total frequency over a narrower range of scores) → scores are more consistent
Class B has a wider spread → scores vary more
When comparing two data sets:
Use median or mean to compare typical values.
Use IQR or range to compare consistency or variability.
Use mode if you're comparing most common categories (e.g., most common score or shoe size).
Example Structure for Comparison Answer in Exam:
"Class A has a higher median score than Class B, so they performed better overall."
"However, Class B has a smaller IQR, so their results were more consistent."