What is bivariate data?
Data where each observation has two paired values, e.g. (height, weight) for each person in a sample.
What is "random-on-non-random" bivariate data? (Case A)
Data where one variable is controlled by the experimenter (not random) and the other is measured (random). E.g. (weight, extension) for a spring with chosen weights.
What is "random-on-random" bivariate data? (Case B)
Data where both variables are random — e.g. (height, weight) for a sample of people. Each value of one has a whole distribution of values of the other.
Which case allows a Pearson PMCC hypothesis test?
Only random-on-random. The PMCC test assumes a bivariate Normal distribution, which can't apply if one variable is not random.
Which case has TWO regression lines?
Random-on-random (Case B): both y on x and x on y exist. Random-on-non-random has just one (the dependent random variable regressed on the controlled one).
Which axis convention for a scatter diagram in Case A?
The independent (controlled) variable goes on the horizontal axis; the dependent (measured) variable goes on the vertical axis.
What does a scatter diagram help you check before testing for correlation?
Whether the data cloud looks roughly elliptical (suggesting bivariate Normality) and whether the relationship looks linear; also helps spot outliers by eye.
What is Pearson's product moment correlation coefficient (PMCC)?
A number r between −1 and 1 measuring how close the data points lie to a straight line. r = ±1 means a perfect linear relationship; r = 0 means no linear relationship.
PMCC notation: sample value and population value
r (sample), ρ (population).
What does r = 1 mean for a scatter diagram?
All points lie exactly on a straight line with positive gradient — perfect positive linear correlation.
What does r = −1 mean?
All points lie exactly on a straight line with negative gradient — perfect negative linear correlation.
What does r = 0 mean?
No linear relationship between the two variables in the sample (though there could still be a non-linear relationship).
When is it appropriate to do a PMCC hypothesis test?
When both variables are random AND the data can be assumed to come from a bivariate Normal distribution (data cloud approximately elliptical, neither distribution heavily skewed or bimodal).
When is a PMCC test NOT appropriate?
When the data is random-on-non-random, when one or both variables are heavily skewed/bimodal, or when the scatter shows a non-linear pattern.
Null and alternative hypotheses for a PMCC test
H₀: ρ = 0 (no correlation in the population); H₁: ρ ≠ 0 (two-tailed), or ρ > 0 or ρ < 0 (one-tailed).
How do you conduct a PMCC hypothesis test?
Calculate r from the sample, then compare it with the critical value from tables for the given n and significance level: use |r| for a two-tailed test and the signed r for a one-tailed test. Reject H₀ if r is in the critical region.
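The decision step can be sketched in Python. This is a minimal sketch, not a prescribed method: the function name `pmcc_reject_h0`, its `alternative` labels and the sample value r = 0.71 are made up for illustration, and the critical value 0.6319 (n = 10, 5% two-tailed) should be checked against the tables you are given. It assumes the critical region is r strictly beyond the critical value.

```python
def pmcc_reject_h0(r, critical_value, alternative="two-tailed"):
    """Return True if r lies in the critical region (reject H0: rho = 0)."""
    if alternative == "two-tailed":   # H1: rho != 0, so compare |r|
        return abs(r) > critical_value
    if alternative == "positive":     # H1: rho > 0, so compare signed r
        return r > critical_value
    if alternative == "negative":     # H1: rho < 0, so compare signed r
        return r < -critical_value
    raise ValueError("alternative must be 'two-tailed', 'positive' or 'negative'")


# Illustrative values only: r = 0.71 from a sample of n = 10,
# critical value read from tables at the 5% level (two-tailed).
reject = pmcc_reject_h0(0.71, 0.6319)
print("Reject H0" if reject else "Do not reject H0")
```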
How should a PMCC test conclusion be worded?
Non-assertively, in context. E.g. "There is sufficient evidence to suggest there is positive correlation between … and …". Avoid "proves" or "accepts".
What is PMCC as an "effect size"?
The value of r itself measures how strong the linear relationship is. Large samples can give a "significant" but trivially small r, so effect size considers practical importance rather than statistical significance alone.
Cohen's guideline for PMCC effect sizes (will be given if needed)
Small: |r| ≈ 0.1, medium: |r| ≈ 0.3, large: |r| ≈ 0.5 (you don't need to memorise this; the rule is given when needed).
What is Spearman's rank correlation coefficient rₛ?
A correlation coefficient calculated on the RANKS of the data, between −1 and 1. Measures monotonic (not necessarily linear) association.
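A small Python sketch of the calculation may help. It uses the standard no-ties formula rₛ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each pair. The formula is not stated on the card, the helper names `ranks` and `spearman` are illustrative, and the data is made up. The ranking helper only works when there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)); no ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up data: x ranks are 1..5, y ranks are 3, 1, 4, 2, 5, so sum(d^2) = 10.
print(spearman([10, 20, 30, 40, 50], [3, 1, 4, 2, 5]))  # 0.5
```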
When is Spearman's rₛ more appropriate than Pearson's r?
When you can't assume bivariate Normality, when the relationship looks monotonic but not linear, or when the data is naturally ordinal (ranks).
Hypotheses for a Spearman's test
H₀: no association in the population; H₁: there is association (two-tailed) or positive/negative association (one-tailed).
Does Spearman's test require distributional assumptions?
No — that's its advantage. It's a non-parametric test, requiring no assumption about the underlying distribution.
What information is lost when using ranks instead of raw values?
The actual size of the gaps between data points; only the order is preserved.
When is Spearman's test inappropriate?
When the scatter diagram shows no evidence of a monotonic relationship (i.e. one variable doesn't tend to consistently increase or decrease as the other increases).
Are tied ranks examinable in Y432?
No — tied ranks are excluded.
How do you decide between Pearson's r and Spearman's rₛ?
Use Pearson if data is bivariate Normal and the relationship looks linear. Use Spearman if Normality is doubtful or the relationship is monotonic but non-linear. Use neither if the relationship clearly isn't monotonic.
What is a least-squares regression line?
A line fitted to bivariate data that minimises the sum of squared vertical distances (residuals) from the points to the line.
Random-on-non-random: which regression line do you use?
The y on x line, where y is the random (measured) variable and x is the controlled variable.
What is a residual?
Residual = observed y-value − predicted y-value from the regression line.
How are residuals used to check a model informally?
If residuals are small, roughly random in sign and have no pattern, the linear model fits well. Systematic patterns suggest a poor fit.
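As an illustration, residuals can be computed and inspected in a few lines of Python. This is a sketch on made-up data; `fit_y_on_x` is a hypothetical helper that fits the y on x least-squares line using b = Sxy / Sxx and a = ȳ − b x̄.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(xs, ys)

# Residual = observed y minus the y predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least-squares line the residuals sum to zero (up to rounding), so the informal check is about their size and whether the signs show a pattern, not about their total.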
When is it valid to use the regression line for prediction?
Within the range of the data (interpolation). Extrapolating beyond the data is unreliable.
Why is extrapolation risky?
The linear relationship may not hold outside the observed range; the model is fitted only to the data you have.
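A simple way to flag the risk is to check whether the value used for prediction lies inside the observed range. The sketch below uses a hypothetical helper `is_interpolation` and made-up spring weights.

```python
def is_interpolation(x_new, xs):
    """True if x_new lies within the range of the observed x-values."""
    return min(xs) <= x_new <= max(xs)


weights = [2, 4, 6, 8, 10]  # made-up controlled values
for x_new in (7, 12):
    if is_interpolation(x_new, weights):
        print(x_new, "interpolation")
    else:
        print(x_new, "extrapolation: treat the prediction with caution")
```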
Random-on-random: how many regression lines exist?
Two — the y on x line (estimating E(Y | X = x)) and the x on y line (estimating E(X | Y = y)).
What point do BOTH regression lines pass through?
The sample mean point (x̄, ȳ).
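This can be checked numerically. The sketch below fits both lines to made-up data and confirms that each passes through (x̄, ȳ). It assumes the standard gradients Sxy / Sxx for y on x and Sxy / Syy for x on y (the second is not stated on these cards); `regression_lines` is an illustrative name.

```python
def regression_lines(xs, ys):
    """Return ((a, b), (c, d), (x_bar, y_bar)) where
    y on x is y = a + b*x and x on y is x = c + d*y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # gradient of y on x
    a = y_bar - b * x_bar
    d = sxy / syy            # gradient of x on y
    c = x_bar - d * y_bar
    return (a, b), (c, d), (x_bar, y_bar)


(a, b), (c, d), (x_bar, y_bar) = regression_lines([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Substituting the mean of one variable gives the mean of the other on both lines.
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # True
print(abs((c + d * y_bar) - x_bar) < 1e-9)  # True
```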
Which regression line do you use to predict y from x?
The y on x line.
Which regression line do you use to predict x from y?
The x on y line.
What does (PMCC)² approximately measure?
The proportion of variation in one variable that can be explained by a linear relationship with the other.
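One way to see this is to compare r² with 1 − (sum of squared residuals) / Syy for the y on x least-squares line; the two agree. The sketch below does this on made-up data, and all variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r_squared = sxy ** 2 / (sxx * syy)

# Fit y on x, then measure the variation left unexplained by the line.
b = sxy / sxx
a = y_bar - b * x_bar
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - ss_residual / syy

print(round(r_squared, 4), round(explained_fraction, 4))  # 0.6 0.6
```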
Which calculator function gives PMCC and regression lines?
The statistics regression mode (y = a + bx form), entering paired data and reading off r, a, and b.
How are summary statistics like Sxx, Syy, Sxy used?
They go into the formulae for PMCC (r = Sxy / √(Sxx·Syy)) and regression line coefficients (b = Sxy / Sxx). Formulae will be given.
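A short Python sketch of these formulae on made-up data follows. It assumes the usual definitions Sxx = Σ(x − x̄)², Syy = Σ(y − ȳ)², Sxy = Σ(x − x̄)(y − ȳ) and the intercept a = ȳ − b x̄, which the card does not spell out; `summary_stats` is an illustrative name.

```python
from math import sqrt


def summary_stats(xs, ys):
    """Return (Sxx, Syy, Sxy) for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy


xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
sxx, syy, sxy = summary_stats(xs, ys)  # 10, 6, 6 for this data

r = sxy / sqrt(sxx * syy)                 # PMCC
b = sxy / sxx                             # gradient of the y on x line
a = sum(ys) / len(ys) - b * sum(xs) / len(xs)  # intercept: y_bar - b * x_bar

print(round(r, 4), round(a, 2), round(b, 2))  # 0.7746 2.2 0.6
```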
When PMCC and Spearman give different results, what does that suggest?
Often that the relationship is monotonic but not linear, or that outliers are pulling Pearson's r towards a different value than Spearman's rₛ.
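The first situation is easy to reproduce. In the sketch below the made-up data follows y = 2ˣ, which is perfectly monotonic but clearly not linear, so Spearman's rₛ is exactly 1 while Pearson's r is noticeably lower. The helper names are illustrative and the rank-based formula assumes no ties.

```python
from math import sqrt


def pearson(xs, ys):
    """PMCC: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's r_s via 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no ties."""
    n = len(xs)
    rank_x = [sorted(xs).index(x) + 1 for x in xs]
    rank_y = [sorted(ys).index(y) + 1 for y in ys]
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


xs = list(range(1, 9))      # 1, 2, ..., 8
ys = [2 ** x for x in xs]   # monotonic increasing, not linear

r = pearson(xs, ys)
r_s = spearman(xs, ys)
print(r < 1, r_s)  # True 1.0
```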
How should a regression-line conclusion be expressed in context?
Refer to the predicted variable and the value being used, e.g. "When x = 12, the predicted extension is 4.5 cm", and state whether the prediction is an interpolation (within the data range) or an extrapolation (outside it, so unreliable).