Scatter Plots and Correlation Notes

Objective 1: Drawing Scatter Plots

A scatter plot is a graph of ordered pairs $(x, y)$ used to determine whether a relationship exists between two variables.

Types of relationships:
- Positive linear relationship: As one variable increases, the other variable also tends to increase.
- Negative linear relationship: As one variable increases, the other variable tends to decrease.
- No linear relationship: The points show no apparent straight-line pattern.
- No relationship: There is no discernible connection of any kind between the variables.

Example: Age and Systolic Blood Pressure

Data:
Age (x): 43, 48, 56, 61, 67, 70
Blood Pressure (y): 128, 120, 135, 143, 141, 152

Task: Draw a scatter plot for the data and determine whether there appears to be a linear relationship.
Assessment: If there is a linear relationship, determine whether it is positive or negative.

Objective 2: Computing the Correlation Coefficient

Population correlation coefficient: Denoted by the Greek letter $\rho$ (rho). Computed using all possible pairs of data values $(x, y)$ from a population.

Sample correlation coefficient: Denoted by $r$. Measures the strength and direction of a linear relationship between two quantitative variables.

Formula for the sample correlation coefficient:

$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$

where $n$ is the number of data pairs.
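
The formula for r can be checked with a short script. The following is a minimal Python sketch, not part of the original notes; the function name `sample_correlation` is a hypothetical helper, and the data are the age and systolic blood pressure values from the Objective 1 example.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r via the computational formula.

    Hypothetical helper for illustration; assumes xs and ys are
    equal-length sequences of numbers and neither is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


age = [43, 48, 56, 61, 67, 70]
blood_pressure = [128, 120, 135, 143, 141, 152]

r = sample_correlation(age, blood_pressure)
print(round(r, 3))  # 0.897
```

A value near 0.897 indicates a strong positive linear association. Swapping the two lists returns the same r, because the formula treats x and y symmetrically.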
Example: Compute the Correlation Coefficient for Age and Systolic Blood Pressure

Data:
Age (x): 43, 48, 56, 61, 67, 70
Blood Pressure (y): 128, 120, 135, 143, 141, 152
$x^2$: 1849, 2304, 3136, 3721, 4489, 4900
$y^2$: 16384, 14400, 18225, 20449, 19881, 23104
$xy$: 5504, 5760, 7560, 8723, 9447, 10640

Properties of the Linear Correlation Coefficient
- The value of $r$ is always between $-1$ and $1$ inclusive: $-1 \le r \le 1$.
- The closer $r$ is to $-1$ or $1$, the stronger the linear association between the variables.
- The closer $r$ is to $0$, the weaker the linear association between the variables.
- If the values of $x$ and $y$ are interchanged, the value of $r$ does not change.

Objective 3: Hypothesis Testing for Correlation

Null hypothesis: $H_0: \rho = 0$ (There is no linear correlation between the two variables.)
Alternative hypothesis: $H_1: \rho \neq 0$ (There is a linear correlation between the two variables.)

Test value (t-statistic):

$$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$$

P-value: Found from the t distribution (two-tailed, since $H_1$ is $\rho \neq 0$) with degrees of freedom $df = n - 2$.

Example: Hypothesis Test for Age and Systolic Blood Pressure

Using a significance level of 5% ($\alpha = 0.05$), test whether there is a linear correlation between age and systolic blood pressure.

Steps:
1. State the null and alternative hypotheses.
2. Calculate the test statistic $t$.
3. Determine the degrees of freedom $df$.
4. Find the p-value.
5. Make a decision based on the p-value and the significance level.
6. Draw a conclusion.

Example: Driver Age and Accidents

The data show the ages of drivers and the number of accidents reported for each age group in Pennsylvania for a selected year.

Age (x): 16, 17, 18, 19, 20, 21
Number of accidents (y): 6605, 8932, 8506, 7349, 6458, 5974

Tasks:
1. Draw a scatter plot.
2. Compute the correlation coefficient (round to three decimal places).
3. Test the claim that there is a linear correlation between the ages of drivers and the number of accidents reported, using a significance level of 10% ($\alpha = 0.10$).

Example: Driver Age and Accidents (Excluding 16-Year-Olds)

Calculate the correlation coefficient without the ordered pair $(16, 6605)$.

Age (x): 17, 18, 19, 20, 21
Number of accidents (y): 8932, 8506, 7349, 6458, 5974
$x^2$: 289, 324, 361, 400, 441
$y^2$: 79780624, 72352036, 54007801, 41705764, 35688676
$xy$: 151844, 153108, 139631, 129160, 125454

Determine whether there is a linear correlation between the ages of drivers and the number of accidents reported when 16-year-olds are left out, using $\alpha = 0.10$. Follow the same hypothesis testing steps as before.
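
The hypothesis-testing steps can be sketched in Python for the driver data, with and without the 16-year-olds. This is an illustrative sketch, not part of the original notes: all function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule (an assumption made here to stay within the standard library) rather than from a table or calculator.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def t_statistic(r, n):
    """Test value t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r * r) / (n - 2))


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * (area under the density on [0, |t|]).

    The area is approximated with Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def correlation_test(xs, ys, alpha):
    """Test H0: rho = 0 against H1: rho != 0 at significance level alpha."""
    n = len(xs)
    r = sample_correlation(xs, ys)
    t = t_statistic(r, n)
    df = n - 2
    p = two_tailed_p_value(t, df)
    return r, t, df, p, p <= alpha  # last item: True means reject H0


ages = [16, 17, 18, 19, 20, 21]
accidents = [6605, 8932, 8506, 7349, 6458, 5974]

cases = [
    ("all ages", ages, accidents),
    ("without age 16", ages[1:], accidents[1:]),
]
for label, xs, ys in cases:
    r, t, df, p, reject = correlation_test(xs, ys, alpha=0.10)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: r = {r:.3f}, t = {t:.3f}, df = {df}, "
          f"p = {p:.4f} -> {decision}")
```

Under these assumptions the sketch gives r of about −0.527 for all six ages, which is not significant at α = 0.10, and about −0.989 once the pair (16, 6605) is removed, which is significant. A single point can therefore change the conclusion, which is why the scatter plot is worth drawing first.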