AQA GCSE: Section B - Statistics

Types of Data

Qualitative vs Quantitative Data

Qualitative Data (also called categorical data):
  • Definition: Data that describes qualities or characteristics.

  • Not numerical.

  • Often collected using words or categories.

Examples:

  • Eye colour (blue, green, brown)

  • Favourite subject (Maths, English)

  • Type of cuisine (Italian, Indian, Mexican)

Quantitative Data:
  • Definition: Data that involves numbers and quantities.

  • Can be measured or counted.

  • You can usually do calculations with this data (e.g. mean, median).

Examples:

  • Height in cm (170 cm)

  • Number of siblings (3)

  • Test scores (85%)

Discrete vs Continuous Data (both are types of quantitative data)

Discrete Data:
  • Can only take on certain values, usually whole numbers.

  • There are gaps between the values – data is countable.

Examples:

  • Number of pets (can be 1, 2, 3… but not 2.5)

  • Shoe size (often considered discrete even if decimals are used)

  • Number of cars in a household

Continuous Data:
  • Can take any value within a range.

  • No gaps – data is measurable and can have decimals or fractions.

Examples:

  • Height (e.g. 165.2 cm)

  • Weight (e.g. 57.8 kg)

  • Time taken to finish a race (e.g. 12.34 seconds)

Primary vs Secondary Data

Primary Data:
  • Collected first-hand by the person or group doing the investigation.

  • Usually more reliable (you know how it was collected) and specific to the purpose of the research.

  • Takes more time and effort to gather.

Examples:

  • Conducting a survey yourself

  • Measuring students' heights in your class

  • Interviewing people on the street

Secondary Data:
  • Data that has already been collected by someone else, often for a different purpose.

  • Quicker and easier to access, but might be less specific or outdated.

Examples:

  • Information from newspapers or websites

  • Government statistics (e.g. census data)

  • Textbooks, articles, or reports

Data Collection

Designing a Questionnaire

A good questionnaire collects useful and accurate data. Here are key points to consider:

Good Practice:
  • Use clear and simple language.

  • Ask specific questions (not vague or general).

  • Provide suitable answer options, especially for multiple choice.

  • Include response intervals that are non-overlapping and cover all possibilities.

  • Avoid leading or biased questions.

Common Mistakes:
  • Overlapping intervals (e.g., 0–10, 10–20 – what about 10?)

  • Leading questions (e.g., “Do you agree that school lunches are unhealthy?”)

  • Too personal or sensitive questions without reason.

  • No option for “Other” or “Prefer not to say”
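The overlapping-interval mistake above can be checked mechanically. This is a minimal sketch, assuming the answer options are whole-number intervals with both ends included; the function name `find_overlaps` and the example options are hypothetical.

```python
def find_overlaps(intervals):
    """Return neighbouring pairs of inclusive (low, high) intervals that share a value."""
    ordered = sorted(intervals)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]

# Hypothetical answer options for "How many hours of TV do you watch per week?"
overlapping = [(0, 10), (10, 20), (20, 30)]
fixed = [(0, 9), (10, 19), (20, 29)]

print(find_overlaps(overlapping))  # both neighbouring pairs share an endpoint
print(find_overlaps(fixed))        # []
```

The fixed options no longer overlap, but a real questionnaire would still need a final option such as "30 or more" so that every possible answer is covered.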

Sampling Methods

You can’t always collect data from the whole population, so you collect a sample.

1. Random Sampling
  • Everyone in the population has an equal chance of being chosen.

  • Unbiased, so the results are more likely to represent the whole population.

  • Needs a complete list of the population.

Example: Pick 10 students using a random number generator.
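A random number generator can be sketched in a few lines. The class of 30 numbered students and the function name `simple_random_sample` are assumptions for illustration only.

```python
import random

# Hypothetical population: student numbers 1 to 30 on a class list.
population = list(range(1, 31))

def simple_random_sample(items, k, seed=None):
    """Return k different items, each equally likely to be chosen."""
    rng = random.Random(seed)  # giving a seed makes the draw repeatable
    return rng.sample(items, k)

sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

`random.sample` never picks the same item twice, which matches choosing 10 different students.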

2. Stratified Sampling
  • The population is divided into groups (strata), then a sample is taken in proportion to the size of each group.
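Example: the number to take from each group is (group size ÷ population size) × sample size. The sketch below works this out for a hypothetical school; the year-group sizes and the function name `stratified_sizes` are made up for illustration.

```python
def stratified_sizes(strata, sample_size):
    """Share the sample between groups in proportion to each group's size."""
    total = sum(strata.values())
    return {name: round(sample_size * size / total) for name, size in strata.items()}

# Hypothetical school of 300 students, sample of 30.
year_groups = {"Year 7": 120, "Year 8": 100, "Year 9": 80}
sizes = stratified_sizes(year_groups, 30)
print(sizes)  # {'Year 7': 12, 'Year 8': 10, 'Year 9': 8}
```

When the calculation does not give whole numbers, the rounded values may not add up to the sample size exactly and one group may need adjusting by hand.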

3. Systematic Sampling
  • Choose every nth person from a list.

Example: Every 5th person on a class register.
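The register example can be sketched as follows. The 30-name register is hypothetical, and choosing the starting position at random is a common refinement that is assumed here rather than stated in these notes.

```python
import random

# Hypothetical class register of 30 students.
register = [f"Student {i}" for i in range(1, 31)]

def systematic_sample(items, n, start=None):
    """Pick every n-th item, starting at a position within the first n items."""
    if start is None:
        start = random.randrange(n)  # random starting point (assumed refinement)
    return items[start::n]

chosen = systematic_sample(register, 5, start=4)  # index 4 is the 5th person
print(chosen)
# ['Student 5', 'Student 10', 'Student 15', 'Student 20', 'Student 25', 'Student 30']
```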

4. Convenience/Opportunity Sampling
  • Choose people who are easy to access (e.g., people in the street).

  • Not very reliable or representative, but easy and quick.

Bias in Data Collection

Bias means the data collected doesn't fairly represent the population.

Causes of Bias:
  • Leading questions (e.g., “Why do you prefer…”)

  • Only sampling a specific group (e.g., asking only your friends)

  • Not using a random method

  • Poorly worded questionnaires

Avoiding bias makes the data more accurate and trustworthy.

Data Reliability and Validity

Reliability:
  • Data is consistent and repeatable.

  • If someone else collected the data the same way, they'd get similar results.

Example: Measuring something with the same method and getting similar outcomes.

Validity:
  • Data is relevant and measures what it’s supposed to.

  • It’s useful for answering the actual question you're investigating.

Example: If you're studying sleep patterns but ask only about caffeine intake, your data might not be valid.

Representing Data: Charts and Graphs

Bar Charts

  • Used to show discrete or categorical data.

  • Each bar represents a category.

  • Bars are separate (with gaps).

  • Height of the bar = frequency.

Example: Number of pets students have.

Pie Charts

  • Represent data as sectors of a circle (360° in total).

  • Each sector shows a fraction/percentage of the total.

  • Good for comparing parts to a whole.
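Example: the angle for each sector is (frequency ÷ total frequency) × 360°. A minimal sketch of that calculation, using made-up survey results and a hypothetical function name `pie_angles`:

```python
def pie_angles(frequencies):
    """Convert each category's frequency into a sector angle in degrees."""
    total = sum(frequencies.values())
    return {category: f * 360 / total for category, f in frequencies.items()}

# Hypothetical favourite-subject survey of 36 students.
survey = {"Maths": 12, "English": 9, "Science": 15}
angles = pie_angles(survey)
print(angles)  # {'Maths': 120.0, 'English': 90.0, 'Science': 150.0}
```

The angles should always add up to 360°, which is a quick way to check the working.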

Line Graphs

  • Used to show changes over time (time series data).

  • Points are plotted and connected with lines.

  • Helpful for spotting trends and patterns.

Example: Temperature over a week.

Pictograms

  • Uses pictures or symbols to represent frequency.

  • Each symbol represents a certain number of items.

  • A key must be included.

Example: Number of books read by students, using 📚 to represent 5 books. 

Frequency Polygons

  • Plotted using the midpoints of class intervals.

  • Useful for comparing two sets of data.

  • Plotted like a line graph but represents grouped data.

Steps:

  1. Find midpoints of each class.

  2. Plot midpoint vs frequency.

  3. Join with straight lines.
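The first two steps above can be sketched in code. The class intervals and frequencies are invented for illustration, and `polygon_points` is a hypothetical helper name.

```python
def polygon_points(grouped):
    """Return (midpoint, frequency) pairs ready to plot and join with straight lines."""
    return [((low + high) / 2, freq) for (low, high), freq in grouped.items()]

# Hypothetical grouped data: (lower bound, upper bound) -> frequency.
grouped = {(0, 10): 3, (10, 20): 7, (20, 30): 5}
points = polygon_points(grouped)
print(points)  # [(5.0, 3), (15.0, 7), (25.0, 5)]
```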

Stem-and-Leaf Diagrams

  • Organises small sets of data.

  • Keeps original data values visible.

  • Data is split into stem (tens) and leaf (units).

  • Can be used to find:

    • Median

    • Mode

    • Range
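Building a stem-and-leaf diagram can be sketched as below, assuming two-digit whole numbers so that the stem is the tens digit and the leaf is the units digit. The data values and the function name `stem_and_leaf` are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem), keeping units digits (leaves) in order."""
    diagram = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        diagram[stem].append(leaf)
    return dict(diagram)

data = [23, 31, 25, 47, 31, 38, 42, 29]  # hypothetical test marks
diagram = stem_and_leaf(data)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
print("Key: 2 | 3 means 23")
```

Because the leaves are in order, the median, mode (31 here) and range can be read straight from the diagram.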

Histograms (with unequal class intervals)

  • Used for grouped continuous data.

  • No gaps between bars.

  • Area of bar = frequency, so:

    • Height = frequency density = frequency ÷ class width

Used when class intervals vary in width.
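Example: frequency density is frequency ÷ class width, so a wide class with many values can still have a short bar. A small sketch with invented classes and a hypothetical function name `frequency_densities`:

```python
def frequency_densities(grouped):
    """Return the bar height (frequency density) for each class interval."""
    return {(low, high): freq / (high - low) for (low, high), freq in grouped.items()}

# Hypothetical grouped data with unequal class widths.
grouped = {(0, 10): 20, (10, 15): 15, (15, 30): 30}
densities = frequency_densities(grouped)
print(densities)  # {(0, 10): 2.0, (10, 15): 3.0, (15, 30): 2.0}
```

Multiplying each height by its class width gives the frequency back, which is why the area of the bar represents the frequency.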

Cumulative Frequency Graphs

  • Used to estimate medians, quartiles, and percentiles.

  • Plot upper class boundary against cumulative frequency.

  • Draw a smooth curve or a step graph.

From the graph, you can find:

  • Median (50%)

  • Lower Quartile (25%)

  • Upper Quartile (75%)
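The plotting points and the positions to read across from can be worked out as in this sketch. The class boundaries and frequencies are invented for illustration.

```python
from itertools import accumulate

# Hypothetical grouped data: upper class boundaries and class frequencies.
upper_boundaries = [10, 20, 30, 40]
frequencies = [4, 10, 12, 4]

cumulative = list(accumulate(frequencies))        # running total: [4, 14, 26, 30]
points = list(zip(upper_boundaries, cumulative))  # (x, y) points to plot
total = cumulative[-1]

# Cumulative frequency values to read across from on the y-axis.
positions = {
    "lower quartile": 0.25 * total,
    "median": 0.5 * total,
    "upper quartile": 0.75 * total,
}
print(points)     # [(10, 4), (20, 14), (30, 26), (40, 30)]
print(positions)  # {'lower quartile': 7.5, 'median': 15.0, 'upper quartile': 22.5}
```

The estimates themselves come from reading across from these positions to the curve and then down to the x-axis.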

Box Plots (Box-and-Whisker Plots)

  • Shows 5 key values:

    • Minimum

    • Lower quartile (Q1)

    • Median (Q2)

    • Upper quartile (Q3)

    • Maximum

Good for:

  • Comparing data distributions

  • Showing spread and skewness

IQR = Q3 - Q1
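The five values can be found from a list of data as sketched below. Note the assumption: quartiles are taken as the medians of the lower and upper halves (leaving out the middle value when the number of values is odd). Textbooks and software use slightly different quartile rules, so answers may differ a little for some data sets.

```python
from statistics import median

def five_number_summary(values):
    """Return min, Q1, median, Q3 and max using the 'median of each half' rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                               # values below the middle
    upper = data[half + 1:] if n % 2 else data[half:]  # values above the middle
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

summary = five_number_summary([2, 4, 5, 7, 8, 10, 12])  # hypothetical data
iqr = summary["q3"] - summary["q1"]
print(summary)  # {'min': 2, 'q1': 4, 'median': 7, 'q3': 10, 'max': 12}
print(iqr)      # 6
```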

Averages and Measures of Spread

Mean (Average)

Add up all values, then divide by the number of values.

Example: Data: 4, 7, 9, 10 → Mean = (4+7+9+10)/4 = 30/4 = 7.5

Median

  • The middle value when the data is in order.

  • If there's an even number of values, take the mean of the two middle numbers.

Example:
Data: 2, 4, 6, 8, 10 → Median = 6
Data: 1, 3, 5, 7 → Median = (3+5)/2 = 4

Mode

  • The value that appears most often.

  • There can be no mode, one mode, or more than one mode (two modes = bimodal).

Example: Data: 3, 4, 4, 5, 6 → Mode = 4
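The three cases (no mode, one mode, more than one mode) can be sketched in code. The function name `modes` is hypothetical, and treating "every value appears once" as no mode is the convention assumed here.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of the most frequent values (empty if nothing repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once: treated as no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 4, 4, 5, 6]))     # [4]      one mode
print(modes([1, 1, 2, 5, 5, 7]))  # [1, 5]   bimodal
print(modes([2, 5, 7, 9]))        # []       no mode
```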

Range

  • The difference between the highest and lowest values.

Range = Largest value − Smallest value

Example: Data: 2, 5, 7, 9 → Range = 9 – 2 = 7

Interquartile Range (IQR)

  • Measures the spread of the middle 50% of the data.

    • IQR = Upper Quartile (Q3) - Lower Quartile (Q1)

  • Q1 = 25% mark

  • Q2 = Median

  • Q3 = 75% mark

Use the IQR to see how spread out the middle of the data is; it is less affected by outliers than the range.

Estimating the Mean from Grouped Data

When data is grouped into intervals, the exact values aren’t known, so we estimate the mean using midpoints.

Steps:

  1. Find the midpoint of each class.

  2. Multiply midpoint × frequency.

  3. Add all these results.

  4. Divide by total frequency.
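The four steps above can be sketched as follows. The class intervals and frequencies are invented for illustration, and `estimated_mean` is a hypothetical function name.

```python
def estimated_mean(grouped):
    """Estimate the mean of grouped data using class midpoints."""
    total_fx = 0
    total_f = 0
    for (low, high), freq in grouped.items():
        midpoint = (low + high) / 2   # step 1
        total_fx += midpoint * freq   # steps 2 and 3
        total_f += freq
    return total_fx / total_f         # step 4

# Hypothetical grouped data: (lower bound, upper bound) -> frequency.
grouped = {(0, 10): 4, (10, 20): 10, (20, 30): 6}
mean = estimated_mean(grouped)
print(mean)  # (5×4 + 15×10 + 25×6) ÷ 20 = 320 ÷ 20 = 16.0
```

The result is only an estimate because each value is assumed to sit at the midpoint of its class.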

Identifying Outliers

  • Outliers are values that are much higher or lower than the rest of the data.

Outliers can affect:

  • The mean (pull it toward the extreme)

  • The range (increase it)

Box plots and cumulative frequency graphs are useful for spotting them.
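One widely used rule of thumb, which is an assumption here rather than something stated in these notes, treats a value as an outlier if it is more than 1.5 × IQR below Q1 or above Q3. A minimal sketch, with the quartiles supplied by hand and a hypothetical function name `find_outliers`:

```python
def find_outliers(values, q1, q3):
    """Return values more than 1.5 x IQR below Q1 or above Q3 (assumed convention)."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]

data = [2, 4, 5, 7, 8, 10, 30]  # hypothetical data with Q1 = 4 and Q3 = 10
outliers = find_outliers(data, q1=4, q3=10)
print(outliers)  # [30], because the limits are -5 and 19
```

Check which rule your specification or question expects before using this in an exam answer.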

Comparing Data

Comparing Two Data Sets Using Averages and Spread

To compare two sets of data, look at:

Measures of Central Tendency (Averages):
  • Mean: shows the overall average.

  • Median: shows the middle value (useful if data has outliers).

  • Mode: shows the most common value.

Measures of Spread:
  • Range: how spread out the data is.

  • Interquartile Range (IQR): spread of the middle 50% – less affected by outliers.

Example:

Two classes take a maths test.

  • Class A: Mean = 65, IQR = 10

  • Class B: Mean = 70, IQR = 25

Class B has a higher average, but more variation in results.
Class A has more consistent scores.

Interpreting Box Plots

Box plots help compare:

  • Median (line inside the box)

  • IQR (width of the box)

  • Range (distance from lowest to highest)

  • Skewness (based on symmetry)

How to compare using box plots:

  • Higher median → higher typical value (better performance if a high score is good)

  • Smaller IQR → more consistent results

  • Outliers can indicate unusual values

Example:

If Box Plot A has a higher median and smaller IQR than Box Plot B, A’s values are higher on average and more consistent.

Interpreting Cumulative Frequency Graphs

You can use cumulative frequency graphs to compare:

  • Median (50% mark on y-axis)

  • Lower and Upper Quartiles

  • IQR

  • Maximum value

Example:

Two classes' test results:

  • Class A's curve is steeper, so it climbs to the total frequency over a narrower range of scores → scores are more consistent

  • Class B has a wider spread → scores vary more

Making Comparisons Using Statistical Measures

When comparing two data sets:

  1. Use median or mean to compare typical values.

  2. Use IQR or range to compare consistency or variability.

  3. Use mode if you're comparing most common categories (e.g., most common score or shoe size).

Example Structure for Comparison Answer in Exam:

  • "Class A has a higher median score than Class B, so they performed better overall."

  • "However, Class B has a smaller IQR, so their results were more consistent."