Bivariate Data

Description and Tags

Statistics

A-Level Further and Additional Maths

Last updated 2:11 PM on 5/12/26

42 Terms

What is bivariate data?

Data where each observation has two paired values, e.g. (height, weight) for each person in a sample.

What is "random-on-non-random" bivariate data? (Case A)

Data where one variable is controlled by the experimenter (not random) and the other is measured (random). E.g. (weight, extension) for a spring with chosen weights.

What is "random-on-random" bivariate data? (Case B)

Data where both variables are random — e.g. (height, weight) for a sample of people. Each value of one has a whole distribution of values of the other.

Which case allows a Pearson PMCC hypothesis test?

Only random-on-random. The PMCC test assumes a bivariate Normal distribution, which can't apply if one variable is not random.

Which case has TWO regression lines?

Random-on-random (Case B): both y on x and x on y exist. Random-on-non-random has just one (the dependent random variable regressed on the controlled one).

What is the axis convention for a scatter diagram in Case A?

The independent (controlled) variable goes on the horizontal axis; the dependent (measured) variable goes on the vertical axis.

What does a scatter diagram help you check before testing for correlation?

Whether the data cloud looks roughly elliptical (suggesting bivariate Normality) and whether the relationship looks linear; also helps spot outliers by eye.

What is Pearson's product moment correlation coefficient (PMCC)?

A number r between −1 and 1 measuring how closely the data points lie to a straight line. r = ±1 means perfect linear correlation; r = 0 means no linear relationship.

PMCC notation for sample and population values

r (sample), ρ (population).

What does r = 1 mean for a scatter diagram?

All points lie exactly on a straight line with positive gradient — perfect positive linear correlation.

What does r = −1 mean?

All points lie exactly on a straight line with negative gradient — perfect negative linear correlation.

What does r = 0 mean?

No linear relationship between the two variables in the sample (though there could still be a non-linear relationship).

When is it appropriate to do a PMCC hypothesis test?

When both variables are random AND the data can be assumed to come from a bivariate Normal distribution (data cloud approximately elliptical, neither distribution heavily skewed or bimodal).

When is a PMCC test NOT appropriate?

When the data is random-on-non-random, when one or both variables are heavily skewed/bimodal, or when the scatter shows a non-linear pattern.

Null and alternative hypotheses for a PMCC test

H₀: ρ = 0 (no correlation in the population); H₁: ρ ≠ 0 (two-tailed), or ρ > 0 or ρ < 0 (one-tailed).

How do you conduct a PMCC hypothesis test?

Calculate r from the sample, then compare it with the critical value from tables for the given n and significance level: use |r| for a two-tailed test and the signed r for a one-tailed test. Reject H₀ if r lies in the critical region.
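
The decision step can be sketched in Python. This is only an illustration: the function name `pmcc_test` and its `alternative` argument are made up here, the critical value must still be read from the tables you are given, and the strict inequality at the boundary is an assumption. The two critical values in the example are the commonly tabulated ones for n = 10, so check them against your own tables.

```python
def pmcc_test(r, critical_value, alternative="two-sided"):
    """Return True if H0: rho = 0 is rejected (hypothetical helper).

    critical_value is the positive value read from PMCC tables for the
    sample size n and the chosen significance level.
    """
    if alternative == "two-sided":   # H1: rho != 0, compare |r|
        return abs(r) > critical_value
    if alternative == "greater":     # H1: rho > 0, compare signed r
        return r > critical_value
    if alternative == "less":        # H1: rho < 0, compare signed r
        return r < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Assumed table values for n = 10 at the 5% level:
# 0.6319 (two-tailed) and 0.5494 (one-tailed).
print(pmcc_test(0.72, 0.6319))                       # two-tailed test
print(pmcc_test(-0.60, 0.6319))                      # two-tailed test
print(pmcc_test(-0.60, 0.5494, alternative="less"))  # one-tailed test
```

The same r = −0.60 is not significant in the two-tailed test but is significant in the one-tailed test for negative correlation, which is why the hypotheses must be fixed before looking at the data.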

How should a PMCC test conclusion be worded?

Non-assertively, in context. E.g. "There is sufficient evidence to suggest there is positive correlation between … and …". Avoid "proves" or "accepts".

What is PMCC as an "effect size"?

The value of r itself measures how strong the linear relationship is. A large sample can give a "significant" but trivially small r, so effect size is used to judge practical importance.

Cohen's guideline for PMCC effect sizes (will be given if needed)

Small ≈ 0.1, medium ≈ 0.3, large ≈ 0.5 (you don't need to memorise this; rules are given when needed).

What is Spearman's rank correlation coefficient rₛ?

A correlation coefficient calculated on the RANKS of the data, between −1 and 1. Measures monotonic (not necessarily linear) association.
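
A minimal sketch of the calculation, assuming no tied values. It uses the standard shortcut rₛ = 1 − 6Σd² / (n(n² − 1)), which is not stated on the card and is valid only without ties. The helper names `ranks` and `spearman` are made up for illustration.

```python
def ranks(values):
    """Rank the values from 1 (smallest) to n. Assumes there are no ties."""
    ordered = sorted(values)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tied values are not handled in this sketch")
    position = {value: i + 1 for i, value in enumerate(ordered)}
    return [position[value] for value in values]


def spearman(xs, ys):
    """Spearman's rank correlation via the sum of squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Perfectly monotonic but non-linear data: the ranks agree exactly.
print(spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))   # 1.0

# Rank differences are -2, 1, -1, 2, 0, so the sum of d squared is 10.
print(spearman([10, 20, 30, 40, 50], [3, 1, 4, 2, 5]))  # 0.5
```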

When is Spearman's rₛ more appropriate than Pearson's r?

When you can't assume bivariate Normality, when the relationship looks monotonic but not linear, or when the data is naturally ordinal (ranks).

Hypotheses for a Spearman's test

H₀: no association in the population; H₁: there is association (two-tailed) or positive/negative association (one-tailed).

Does Spearman's test require distributional assumptions?

No — that's its advantage. It's a non-parametric test, requiring no assumption about the underlying distribution.

What information is lost when using ranks instead of raw values?

The actual size of the gaps between data points; only the order is preserved.

When is Spearman's test inappropriate?

When the scatter diagram shows no evidence of a monotonic relationship (i.e. one variable doesn't tend to consistently increase or decrease as the other increases).

Are tied ranks examinable in Y432?

No — tied ranks are excluded.

How do you decide between Pearson's r and Spearman's rₛ?

Use Pearson if data is bivariate Normal and the relationship looks linear. Use Spearman if Normality is doubtful or the relationship is monotonic but non-linear. Use neither if the relationship clearly isn't monotonic.

What is a least-squares regression line?

A line fitted to bivariate data that minimises the sum of squared vertical distances (residuals) from the points to the line.

Random-on-non-random: which regression line do you use?

The y on x line, where y is the random (measured) variable and x is the controlled variable.

What is a residual?

Residual = observed y-value − predicted y-value from the regression line.

How are residuals used to check a model informally?

If residuals are small, roughly random in sign and have no pattern, the linear model fits well. Systematic patterns suggest a poor fit.
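
As a rough illustration, residuals for a small made-up data set can be computed like this. The data and the helper names `fit_y_on_x` and `residuals` are assumptions for the example. The intercept formula a = ȳ − b·x̄ is the standard least-squares result.

```python
def fit_y_on_x(xs, ys):
    """Least-squares y on x line, returned as (a, b) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Residual = observed y minus the y predicted by the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_y_on_x(xs, ys)        # a is about 2.2, b is about 0.6
res = residuals(xs, ys, a, b)    # about -0.8, 0.6, 1.0, -0.6, -0.2
print([round(e, 2) for e in res])
```

For a least-squares line the residuals always sum to zero (up to rounding), so the informal check is about their size and pattern, not their total.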

When is it valid to use the regression line for prediction?

Within the range of the data (interpolation). Extrapolating beyond the data is unreliable.

Why is extrapolation risky?

The linear relationship may not hold outside the observed range; the model is fitted only to the data you have.
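
One way to keep the interpolation/extrapolation distinction visible is to return it alongside every prediction. This is a sketch only: the function name `predict_y` and the line y = 2.2 + 0.6x, fitted to x-values 1 to 5, are illustrative assumptions.

```python
def predict_y(x, a, b, observed_xs):
    """Predict y from the line y = a + b*x and flag extrapolation.

    Returns (prediction, is_interpolation), where is_interpolation is True
    only when x lies within the range of the observed x-values.
    """
    is_interpolation = min(observed_xs) <= x <= max(observed_xs)
    return a + b * x, is_interpolation


observed_xs = [1, 2, 3, 4, 5]   # made-up x-values the line was fitted to
a, b = 2.2, 0.6                 # made-up fitted coefficients

print(predict_y(3.5, a, b, observed_xs))   # inside the data range
print(predict_y(12, a, b, observed_xs))    # outside the range: unreliable
```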

Random-on-random: how many regression lines exist?

Two — the y on x line (estimating E(Y | X = x)) and the x on y line (estimating E(X | Y = y)).

What point do BOTH regression lines pass through?

The sample mean point (x̄, ȳ).
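
A quick numerical check on made-up data, as a sketch. The x on y coefficients come from swapping the roles of x and y (gradient Sxy / Syy). That formula is the standard one but is not written on the cards, and the function name `regression_lines` is hypothetical.

```python
def regression_lines(xs, ys):
    """Return ((a, b) for y = a + b*x, (c, d) for x = c + d*y, (mean_x, mean_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # gradient of y on x
    a = mean_y - b * mean_x
    d = sxy / syy                 # gradient of x on y
    c = mean_x - d * mean_y
    return (a, b), (c, d), (mean_x, mean_y)


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

(a, b), (c, d), (mean_x, mean_y) = regression_lines(xs, ys)
print(a + b * mean_x, mean_y)   # y on x evaluated at the mean of x
print(c + d * mean_y, mean_x)   # x on y evaluated at the mean of y
```

Each printed pair should agree up to rounding, confirming that both lines pass through the mean point even though their gradients differ.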

Which regression line do you use to predict y from x?

The y on x line.

Which regression line do you use to predict x from y?

The x on y line.

What does (PMCC)² approximately measure?

The proportion of variation in one variable that can be explained by a linear relationship with the other.
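
To see this numerically, the sketch below compares r² with 1 − (sum of squared residuals) / Syy for the y on x line. The data and the function name `r_squared_two_ways` are made up for illustration. The identity between the two quantities is a standard least-squares result, not something stated on the card.

```python
def r_squared_two_ways(xs, ys):
    """Return (r squared, proportion of variation in y explained by the line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r_squared = sxy ** 2 / (sxx * syy)

    # Fit the y on x line and measure the variation it leaves unexplained.
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    explained = 1 - ss_residual / syy
    return r_squared, explained


# Made-up data: Sxx = 10, Syy = 6, Sxy = 6, so r squared = 36 / 60 = 0.6.
print(r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```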

Which calculator function gives PMCC and regression lines?

The statistics regression mode (a + bx form), entering paired data and reading off r, a, and b.

How are summary statistics like Sxx, Syy, Sxy used?

They go into the formulae for PMCC (r = Sxy / √(Sxx·Syy)) and regression line coefficients (b = Sxy / Sxx). Formulae will be given.
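
The formulae above can be sketched directly in Python. The data are made up and the function names are hypothetical. The intercept a = ȳ − b·x̄ is the standard companion to b = Sxy / Sxx, although it is not written on the card.

```python
from math import sqrt


def summary_stats(xs, ys):
    """Return (mean_x, mean_y, Sxx, Syy, Sxy) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pmcc(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy)."""
    _, _, sxx, syy, sxy = summary_stats(xs, ys)
    return sxy / sqrt(sxx * syy)


def regression_y_on_x(xs, ys):
    """Coefficients (a, b) of the y on x line y = a + b*x, with b = Sxy / Sxx."""
    mean_x, mean_y, sxx, _, sxy = summary_stats(xs, ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up data: Sxx = 10, Syy = 6, Sxy = 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(round(pmcc(xs, ys), 4))        # 6 / sqrt(60), about 0.7746
print(regression_y_on_x(xs, ys))     # a is about 2.2, b is about 0.6
```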

When PMCC and Spearman give different results, what does that suggest?

Often that the relationship is monotonic but not linear, or that outliers are pulling Pearson's r towards a different value than Spearman's rₛ.
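
A small made-up example of the first situation: y = x³ is perfectly monotonic but not linear. In this sketch, Spearman's rₛ is computed by applying the PMCC formula to the ranks, which assumes no ties. The helper names are illustrative.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's r from Sxx, Syy and Sxy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) to n. Assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]             # 1, 8, 27, 64, 125, 216

r = pmcc(xs, ys)                      # below 1: the points curve
r_s = pmcc(ranks(xs), ranks(ys))      # 1: the ranks agree exactly
print(round(r, 3), r_s)
```

Here rₛ is exactly 1 while r is noticeably below 1, so the two coefficients disagree purely because the relationship is curved.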

How should a regression-line conclusion be expressed in context?

Refer to the predicted variable and the value being used, e.g. "When x = 12, the predicted extension is 4.5 cm". State whether the prediction is an interpolation or an extrapolation, and warn that an extrapolated value is unreliable.