All terminology and equations for Edexcel Pearson book AS-level stats Ch4 - correlation.
What is bivariate data?
Data which has pairs of values for two variables
Which variable goes on the x-axis?
independent (explanatory)
Which variable goes on the y-axis?
dependent (response)
What is correlation?
a statistical measure that describes the nature of the linear relationship between two variables
What type of correlation is this?
Strong negative correlation
What type of correlation is this?
Weak negative correlation
What type of correlation is this?
No/zero correlation
What type of correlation is this?
Weak positive correlation
What type of correlation is this?
Strong positive correlation
What is meant by a causal relationship?
When a change in one variable causes a change in the other. However, correlation between two variables does not by itself imply a causal relationship.
How to determine if two variables have a causal relationship?
variables must be correlated
use common sense
Name a type of line of best fit
(least squares) regression line, a straight line that minimises the sum of the squares of the vertical distances of the data points from the line.
How is the regression line of y on x written?
y = a + bx
What does the coefficient b tell you (for a regression line)?
the change in y for each unit change in x.
if data is positively correlated, b is positive
if data is negatively correlated, b is negative
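A minimal sketch of how a and b in y = a + bx could be calculated, assuming the standard least squares formulas b = Sxy / Sxx and a = (mean of y) - b × (mean of x). The function name fit_y_on_x and the data below are made up for illustration only.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the least squares regression line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: summed products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # change in y for each unit change in x
    a = y_bar - b * x_bar    # the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data: hours revised (x) and test mark (y)
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

a, b = fit_y_on_x(hours, marks)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 47.70 + 4.10x
```

Here b is positive, which matches the positive correlation in the made-up data: each extra hour is associated with roughly 4.1 more marks.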
How can you make a prediction/estimate the dependent variable?
substitute a known value of the independent variable into the regression line.
What range can you make predictions of the dependent variable in?
Within the range of the given data. This is called interpolation.
What is interpolation?
Using the regression line to make predictions of the dependent variable within the range of the given data.
If a regression line of y on x is used to predict a value, which value should you know, and which value should you predict?
x (known)
y (is predicted)
What is extrapolation?
Using the regression line to make predictions of the dependent variable outside the range of the given data. It gives a much less reliable estimate.
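The difference between interpolation and extrapolation can be sketched as a simple range check on the known x value. The function name predict_y and the coefficients used here are hypothetical, chosen only to illustrate the idea.

```python
def predict_y(a, b, x, x_values):
    """Estimate y from y = a + bx and say which kind of estimate it is."""
    estimate = a + b * x
    # Inside the range of the observed x values: interpolation.
    # Outside that range: extrapolation (much less reliable).
    if min(x_values) <= x <= max(x_values):
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return estimate, kind


# Hypothetical observed x values and regression coefficients
x_values = [1, 2, 3, 4, 5]
a, b = 47.7, 4.1

print(predict_y(a, b, 2.5, x_values)[1])  # interpolation
print(predict_y(a, b, 9, x_values)[1])    # extrapolation
```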
If a regression line of x on y is used to predict a value, which value should you know, and which value should you predict?
y (known)
x (is predicted)
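As a rough illustration of why the two regression lines are kept separate: fitting x on y is generally not the same as rearranging the y on x line. The sketch below assumes the standard least squares formulas; the names fit_line, c and d (for x = c + dy) and the data are made up for this example.

```python
def fit_line(known, predicted):
    """Least squares line: predicted = intercept + slope * known."""
    n = len(known)
    k_bar = sum(known) / n
    p_bar = sum(predicted) / n
    s_kp = sum((k - k_bar) * (p - p_bar) for k, p in zip(known, predicted))
    s_kk = sum((k - k_bar) ** 2 for k in known)
    slope = s_kp / s_kk
    intercept = p_bar - slope * k_bar
    return intercept, slope


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 68]

a, b = fit_line(xs, ys)  # y on x: x is known, y is predicted
c, d = fit_line(ys, xs)  # x on y: y is known, x is predicted

print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {c:.2f} + {d:.4f}y")
# Rearranging y on x would give a gradient of 1/b, which differs from d
print(f"1/b = {1 / b:.4f}, d = {d:.4f}")
```

With this data the two gradients are close but not equal, so the line used should match the variable that is being predicted.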