Module Four
Bivariate Data
Data consisting of two variables measured on the same experimental unit.
Types of Bivariate Data
1. Two Qualitative Variables (e.g., hair color and hair type).
2. One Qualitative and One Quantitative Variable (e.g., gender and height).
3. Two Quantitative Variables (e.g., height and shoe size).
Scatterplots
Used to visualize the relationship between two quantitative variables and to identify patterns, trends, and outliers.
Correlation Coefficient (r)
Measures the strength and direction of a linear relationship between two quantitative variables, ranging from -1 to +1.
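As a minimal sketch, r can be computed directly from its definition, r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]. The helper name `pearson_r` and the height/shoe-size numbers are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squares of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: heights (cm) and shoe sizes of five people
heights = [160, 165, 170, 175, 180]
shoe_sizes = [37, 39, 40, 42, 44]
print(round(pearson_r(heights, shoe_sizes), 3))  # prints 0.995
```

The value is close to +1, which signals a strong positive linear relationship in this made-up sample.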
Perfect Positive Correlation
Represented by r = +1: all points lie exactly on a straight line with positive slope, so as one variable increases, the other also increases.
Perfect Negative Correlation
Represented by r = -1: all points lie exactly on a straight line with negative slope, so as one variable increases, the other decreases.
No Correlation
Indicated by r = 0, meaning no linear relationship between the variables.
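A quick check of the three reference cases, assuming a hypothetical `pearson_r` helper written from the definition of r: points lying exactly on a rising line give +1, points on a falling line give -1, and a pattern with no linear trend gives 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # every point on y = 2x + 1
falling = [-3 * x + 5 for x in xs]  # every point on y = -3x + 5
no_trend = [2, 5, 1, 5, 2]          # no upward or downward linear trend

for label, ys in [("rising", rising), ("falling", falling), ("no trend", no_trend)]:
    print(label, round(pearson_r(xs, ys), 6))
```

This prints r values of 1.0, -1.0, and 0.0 respectively. The slope of the line (2 or -3) does not matter; only how tightly the points follow a line and in which direction.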
Correlation ≠ Causation
A strong correlation between two variables does not imply that one variable causes the other to change.
Third Variable Problem
A lurking (confounding) variable that affects both variables can create a correlation between them even when neither one causes the other.
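A small simulation can show the mechanism. In this sketch (all data simulated; `pearson_r` is a hypothetical helper), a lurking variable z drives both x and y, while x has no effect on y at all. Think of z as hot weather and x, y as ice cream sales and swimming accidents, a common textbook illustration. With these noise levels the theoretical correlation between x and y is about 0.8, and it largely disappears once z's contribution is removed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(42)
n = 1000
# z plays the lurking variable (hypothetically, standardized daily temperature)
z = [random.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus their own independent noise; x never affects y
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]
print("r(x, y) =", round(pearson_r(x, y), 2))

# Subtracting z's contribution leaves only the independent noise terms
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
print("r after removing z =", round(pearson_r(x_resid, y_resid), 2))
```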
Impact of Outliers
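Outliers can strongly inflate or deflate the value of r. Removing an outlier requires justification (e.g., a recording error); valid data should not be discarded simply because it is unusual.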
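The sketch below, with hypothetical data and a hypothetical `pearson_r` helper, shows how much a single point can move r: eight points that follow y ≈ 2x almost perfectly give r of about 0.999, and adding one outlier at (9, 0) drops r to roughly 0.4.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a nearly perfect linear trend (y is roughly 2x)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
r_clean = pearson_r(xs, ys)

# The same data plus one outlier at (9, 0)
r_with_outlier = pearson_r(xs + [9], ys + [0.0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_with_outlier:.3f}")
```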
Extrapolation
Making predictions beyond the range of the observed data. Such predictions can be unreliable because the observed relationship may not hold outside that range.
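One way to make extrapolation visible is to flag any prediction whose x lies outside the observed range. This is a sketch under assumptions: `fit_line` and `predict` are hypothetical helpers, the fit is ordinary least squares, and the child age/height data are made up. The fitted line works reasonably inside ages 2 to 10 but predicts an impossible height of about 335 cm at age 40.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def predict(xs, ys, x_new):
    """Return (prediction, is_extrapolation) for a new x value."""
    a, b = fit_line(xs, ys)
    is_extrapolation = not (min(xs) <= x_new <= max(xs))
    return a + b * x_new, is_extrapolation


# Hypothetical data: children's ages (years) and heights (cm)
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

print(predict(ages, heights, 7))   # inside the observed range of 2 to 10 years
print(predict(ages, heights, 40))  # far outside it: about 335 cm, clearly unrealistic
```

Returning a flag rather than refusing to predict leaves the decision to the analyst, which matches the point above: extrapolated values are not forbidden, just untrustworthy.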
Nonlinear Relationships
A correlation coefficient of r = 0 does not necessarily mean there is no relationship; the relationship may be nonlinear.
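A standard counterexample, sketched with a hypothetical `pearson_r` helper: y = x² is a perfect, deterministic relationship, yet on x values symmetric around 0 the correlation is exactly 0, because the falling left half and rising right half cancel in the linear measure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # exact, deterministic relationship y = x^2
print(pearson_r(xs, ys))   # prints 0.0
```

A scatterplot of these points would immediately show the U shape that r misses.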
Coefficient of Determination (R²)
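Measures the proportion of variance in the dependent variable explained by the independent variable, ranging from 0 to 1. In simple linear regression (one independent variable), R² equals the square of the correlation coefficient: R² = r².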
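As a sketch, R² can be computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals from the least-squares line and SS_tot is the total sum of squares around the mean of y. For simple linear regression this matches r², which the example checks numerically. The helper names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def r_squared(xs, ys):
    """R² = 1 - SS_res / SS_tot for the least-squares line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: heights (cm) and shoe sizes
heights = [160, 165, 170, 175, 180]
shoe_sizes = [37, 39, 40, 42, 44]
print(round(r_squared(heights, shoe_sizes), 4),
      round(pearson_r(heights, shoe_sizes) ** 2, 4))
```

The two printed values agree, so here roughly 99% of the variation in shoe size is accounted for by the linear fit on height.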
Linear Regression
Used to predict a value for a dependent variable (y) given a value of an independent variable (x).
Best Fit Line
The least-squares regression line: the line that minimizes the sum of the squared vertical deviations (residuals) between the actual data points and the line.
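A minimal least-squares sketch: the slope is b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and the intercept is a = ȳ - b·x̄. The code (hypothetical helpers `fit_line` and `sse`, made-up data) also confirms that nudging either coefficient away from the fitted values only increases the sum of squared residuals.

```python
def fit_line(xs, ys):
    """Least-squares fit: b = Sxy / Sxx, a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sse(xs, ys, a, b):
    """Sum of squared vertical deviations (residuals) from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = fit_line(xs, ys)
best = sse(xs, ys, a, b)

# Moving either coefficient away from the least-squares values raises the SSE
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(xs, ys, a + da, b + db) > best

print(f"y-hat = {a:.2f} + {b:.2f}x, SSE = {best:.3f}")
```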
Slope Interpretation
The slope is the predicted change in y for a one-unit increase in x. A positive slope means y tends to increase as x increases; a negative slope means y tends to decrease as x increases.
Regression vs. Correlation
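Correlation (r) measures the strength and direction of the association between two variables, while regression fits an equation (ŷ = a + bx) that predicts the dependent variable (y) from the independent variable (x). R² describes how well that equation fits the data.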
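The two ideas are linked: the slope of the least-squares line equals r scaled by the ratio of sample standard deviations, b = r · (s_y / s_x). So the slope and r always share a sign, but r is unitless while the slope is in units of y per unit of x. This sketch checks the identity on hypothetical data using Python's standard `statistics.stdev`; `pearson_r` and `least_squares_slope` are hypothetical helpers.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_slope(xs, ys):
    """Slope b = Sxy / Sxx of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical sample: heights (cm) and shoe sizes
heights = [160, 165, 170, 175, 180]
shoe_sizes = [37, 39, 40, 42, 44]

r = pearson_r(heights, shoe_sizes)
b = least_squares_slope(heights, shoe_sizes)
b_from_r = r * statistics.stdev(shoe_sizes) / statistics.stdev(heights)
print(round(b, 4), round(b_from_r, 4))
```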
Lawrence Garfinkel
A statistician known for helping establish the link between smoking and lung cancer through influential correlational studies.