what are scatter diagrams used for
Plotting bivariate data to show whether there is a relationship between two variables. Plot the points with crosses and do not join them up.
explanatory variables - scatter diagrams
(independent – the one that you are changing) is plotted on the x-axis
Response Variable - scatter diagram
(dependent – the one you are measuring) is plotted on the y-axis
what is correlation
the relationship between two variables
positive correlation
As one variable increases, so does the other
Negative Correlation
As one variable increases, the other decreases
Zero Correlation
The points are randomly scattered
linear correlation
When the points lie close together near a straight line
non-linear correlation
When the points lie close together but the pattern formed by them is a curve
causation - causal relationships
When one variable causes a change in another. Correlation shows that there may be a link between two variables. Correlation does not imply causation.
Example: Causal Relationship – increase in temperature = Increase in ice cream sales
Correlation only – Sales of chocolate and sales of clothes having a positive correlation
Multiple Factors – causal relationships
In real life situations there are usually multiple factors interacting to cause variables to change.
Example: A positive correlation between fat in liver and reaction time does not mean one causes the other. There could be a third variable, such as amount of alcohol consumed, which both variables depend on
LOBF
A straight line drawn through the middle of the points so the points are evenly scattered on either side of the line.
Needs to be a straight line.
Needs to be close to as many points as possible.
Has to go through the mean point.
The closer the points are to the LOBF, the stronger the correlation.
Interpolation and Extrapolation
Using the LOBF to make predictions of unknown values
interpolation
When the LOBF is used to make predictions within the range of data given (you don’t need to extend your LOBF).
Tends to be reliable provided the LOBF is correct
Extrapolation
When the LOBF is used to predict values outside of the range of values given (you may need to extend your LOBF for this).
Not always reliable as trends may change.
Values estimated from extrapolation are less reliable the further they are from the range of data
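A minimal Python sketch of the distinction above: a prediction counts as interpolation when the x-value lies inside the range of the data, and as extrapolation otherwise. The function name `prediction_type` and the x-values are made up for illustration.

```python
def prediction_type(x_values, x_new):
    """Classify a prediction at x_new against the range of the data."""
    # Inside (or on the edge of) the data range: interpolation
    if min(x_values) <= x_new <= max(x_values):
        return "interpolation"
    # Outside the data range: extrapolation, so less reliable
    return "extrapolation"


# Hypothetical x-values, not taken from any real data set
x_data = [50, 55, 60, 65, 70]

print(prediction_type(x_data, 62))  # interpolation
print(prediction_type(x_data, 90))  # extrapolation
```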
LOBF equation
y=mx+c
m (y=mx+c)
the gradient
c (y=mx+c)
the y-intercept
interpreting gradient (m)
as the x variable increases by 1, the y variable increases (if m is positive) or decreases (if m is negative) by the size of m
m is the rate of change of y compared to x
interpreting y-intercept ( c )
the value of y when x is zero is c (y-intercept)
drawing LOBF
The equation is given, e.g. y = 10 + 2x
Choose two x-values, e.g. 50 (on the left) and 70 (on the right)
Use the equation to work out each y-value, e.g. y = 10 + 2 × 50 = 110 and y = 10 + 2 × 70 = 150
Plot the two (x, y) coordinates and draw a straight line through them
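The same working can be checked with a short Python sketch. It uses the example equation y = 10 + 2x and the x-values 50 and 70 from the card; the helper name `line_y` is an assumption made for illustration.

```python
def line_y(x, m=2, c=10):
    """y-value on the line y = c + m*x (defaults match y = 10 + 2x)."""
    return c + m * x


# Two x-values, one towards each end of the x-axis
x_left, x_right = 50, 70

# The two coordinates to plot before drawing the line through them
points = [(x, line_y(x)) for x in (x_left, x_right)]
print(points)  # [(50, 110), (70, 150)]
```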
Spearman’s Rank Correlation Coefficient (SRCC), rs
Measures the strength of the correlation between 2 variables. SRCC is always between -1 and 1.
The closer the value is to 0, the weaker the correlation.
The further the value is from 0, the stronger the correlation
If rs is near 1, there is a strong positive correlation
If rs = 0, there is zero correlation
If rs is near -1, there is a strong negative correlation.
SRCC equation
$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$
d = difference between ranks
n = number of values
Calculating SRCC:
1. Rank both sets of data (largest to smallest)
2. Find the difference between each pair of ranks
3. Square the differences
4. Add the square of differences
5. Find the value of n – count the number of pairs of data.
6. Substitute into the formula – remember the 1 at the beginning.
7. Interpret your SRCC value in terms of correlation and strength of correlation – make this in context of the question.
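The steps above can be sketched in Python. This is an illustration only: the data are made up, the function names are assumptions, and the ranking helper assumes there are no tied values.

```python
def rank_descending(values):
    """Rank values from largest (rank 1) to smallest. Assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank(x, y):
    """SRCC using 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = rank_descending(x), rank_descending(y)    # step 1: rank both sets
    d = [a - b for a, b in zip(rx, ry)]                # step 2: differences
    sum_d_squared = sum(diff ** 2 for diff in d)       # steps 3 and 4
    n = len(x)                                         # step 5: number of pairs
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))  # step 6


# Hypothetical paired data, invented for this example
x = [8, 3, 6, 1, 5]
y = [7, 4, 9, 2, 5]

print(rank_descending(x))             # [1, 4, 2, 5, 3]
print(rank_descending(y))             # [2, 4, 1, 5, 3]
print(round(spearman_rank(x, y), 3))  # 0.9
```

Here the sum of d² is 2 and n is 5, so rs = 1 - 12/120 = 0.9, which would be interpreted as a strong positive correlation (step 7).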
Pearson’s Product Moment Correlation Coefficient, PMCC
Measures the strength of linear correlation between two variables.
PMCC is also always between -1 and 1.
It is calculated using actual data values and not ranks so can be used for data that can’t be ranked – don’t worry you won’t have to calculate PMCC.
If r is near 1, there is a strong positive correlation
If r = 0, there is zero correlation
If r is near -1, there is a strong negative correlation.
SRCC
Measures the strength of correlation between 2 variables
Takes values between -1 and 1
Tests for linear and non-linear correlation
Best used for data that can be ranked
PMCC
Measures the strength of correlation between 2 variables
Takes values between -1 and 1
Tests for linear correlation only
Can be used for data that can’t be ranked as well
SRCC vs PMCC
If two variables have a non-linear positive relationship, the SRCC and PMCC will both be positive, but the SRCC will be closer to 1. For a non-linear negative relationship, both will be negative, but the SRCC will be closer to -1.
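A rough Python sketch of this comparison, using made-up data where y = x³ (always increasing, but on a curve). The PMCC formula used here is the standard one and is not given on these cards; the function names are assumptions, and the ranking helper assumes no tied values.

```python
from math import sqrt


def rank_descending(values):
    """Rank values from largest (rank 1) to smallest. Assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank(x, y):
    """SRCC using 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    d = [a - b for a, b in zip(rank_descending(x), rank_descending(y))]
    n = len(x)
    return 1 - (6 * sum(diff ** 2 for diff in d)) / (n * (n ** 2 - 1))


def pearson(x, y):
    """PMCC from the actual data values (standard formula, not from the cards)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical non-linear positive relationship: y = x cubed
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(spearman_rank(x, y))      # 1.0, the ranks agree perfectly
print(round(pearson(x, y), 2))  # about 0.94, positive but further from 1
```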
How do you read a value from a scatter diagram?
Find the point, then read straight down to the x-axis for its x-value and straight across to the y-axis for its y-value. Include units.
Exam phrase: When x is …, the corresponding value of y is approximately …
What is the double mean point?
The point (mean of x, mean of y).
Plot it with a cross (X).
How do you find the mean of y if the total is given?
Mean of y = total of y ÷ number of values.
What rules must a Line of Best Fit follow?
It must be straight, pass through the mean point, and have roughly equal points above and below.
How do you draw a regression line from an equation?
Choose two x-values, calculate y for each, plot both points, then draw a straight line through them.
How do you say a scatter diagram supports a hypothesis?
The scatter diagram shows a positive/negative correlation, so this supports the hypothesis.
How do you find the mean of y using a regression line?
Substitute the mean of x into the equation of the line to find the mean of y.
When is an estimate more reliable?
When it is interpolation, within the data range, and close to the mean and data points.
What is the difference between interpolation and extrapolation?
Interpolation is within the data range and is more reliable.
Extrapolation is outside the range and is less reliable.
Give one limitation of a scatter diagram investigation.
Other variables may affect the results, so correlation does not imply causation
what is the formula for percentage decrease
(original − new) ÷ original × 100
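A quick worked check in Python, assuming the formula is (original − new) ÷ original × 100. The numbers and the function name `percentage_decrease` are made up for illustration.

```python
def percentage_decrease(original, new):
    """Percentage decrease from the original value to the new value."""
    return (original - new) / original * 100


# Hypothetical values: a fall from 80 to 60
print(percentage_decrease(80, 60))  # 25.0
```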
when interpreting scatter diagrams talk about
correlation - positive, negative, none
correlation - strong, weak
relate it to context
m gradient equation
m = (y2 - y1) / (x2 - x1)
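finding c (y-intercept) using the gradient
Substitute m and a point (x1, y1) on the line into y - y1 = m(x - x1), then rearrange into the form y = mx + c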
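As a sketch of how the gradient and y-intercept can be found from two points on a line, here is a short Python example. The function name `line_through` and the two points are assumptions for illustration; the points were chosen to lie on y = 10 + 2x. It will not work for a vertical line, where x1 = x2.

```python
def line_through(p1, p2):
    """Return (m, c) for the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # gradient: change in y over change in x
    c = y1 - m * x1            # from y - y1 = m(x - x1), setting x = 0
    return m, c


# Two hypothetical points on the line y = 10 + 2x
m, c = line_through((50, 110), (70, 150))
print(m, c)  # 2.0 10.0
```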