Observing the relationship between two numerical variables (e.g., height and weight).
The aim is to understand how one numerical variable responds to changes in another.
Dependent (response) variable: a variable that changes in response to the independent variable.
Denoted as y.
Independent (explanatory) variable: a variable used to explain changes in the dependent variable.
Denoted as x.
Varies on its own and is used to account for differences in the response variable y.
Effect of rainfall on crop yield.
X = amount of rainfall (independent variable).
Y = crop yield (dependent variable).
Effect of midterm score on final grade.
X = midterm score (independent variable).
Y = final grade (dependent variable).
Data for two numerical variables should be recorded as pairs (X, Y).
Use scatter plots to visualize these bivariate observations:
X-axis: Independent variable (x).
Y-axis: Dependent variable (y).
Plot data points based on bivariate observations (e.g., (x1, y1), (x2, y2)).
Example: Does schooling affect salary?
X = years of schooling.
Y = salary.
Scatter plots show relationships and can also show differences among groups (e.g., age groups) by using a different symbol for each group's data points.
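As a rough illustration of the plotting steps above, the sketch below draws a crude text scatter plot using only the Python standard library. The records, the group names "young" and "older", and the helper `text_scatter` are all hypothetical, invented for this example; a real analysis would normally use a plotting library.

```python
# Hypothetical (years of schooling, salary in $1000s, age group) records;
# the numbers are made up purely for illustration.
records = [
    (10, 30, "young"), (12, 38, "young"), (14, 45, "young"),
    (12, 42, "older"), (16, 60, "older"), (18, 72, "older"),
]

SYMBOLS = {"young": "o", "older": "x"}  # one plotting symbol per group


def text_scatter(records, width=30, height=10):
    """Return a crude text scatter plot: x runs left to right, y bottom to top."""
    xs = [x for x, _, _ in records]
    ys = [y for _, y, _ in records]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y, group in records:
        # Scale each coordinate onto the grid, then flip rows so that
        # larger y values appear higher up.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = SYMBOLS[group]
    return "\n".join("|" + "".join(line) for line in grid) + "\n+" + "-" * width


plot = text_scatter(records)
print(plot)
```

In this made-up data the points rise from lower left to upper right, which is what a positive association looks like on a scatter plot.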
Positive Association: as X increases, Y also increases.
Negative Association: as X increases, Y decreases.
Linear: points follow a straight line.
Curvilinear: points follow a curved line.
Clustered data: points form loose groups, so a trend is hard to identify.
Strong linear relationship: data points closely align with a linear trend.
Moderate linear relationship: data points are somewhat clustered around a trend line.
Weak relationship: data points are scattered with no clear trend.
Outliers: observations that deviate significantly from the overall pattern.
They can distort the apparent relationship and mislead its interpretation.
Correlation coefficient: measures the strength and direction of the linear relationship between two numerical variables.
Denoted by r (or R).
Ranges from -1 to 1:
r = 1: Perfect positive linear correlation.
r = -1: Perfect negative linear correlation.
r = 0: No linear correlation.
Calculated from the means and standard deviations of X and Y, so it is sensitive to outliers.
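One common way to write the calculation is r = (1 / (n - 1)) * sum of z_x * z_y, where z_x and z_y are each observation's distance from its mean in standard deviations. The sketch below assumes that sample formula (the notes do not state which version they use), and the data and the name `pearson_r` are hypothetical. It also shows how a single outlier can change r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Hypothetical data: five points lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
r_clean = pearson_r(xs, ys)

# The same data plus a single outlier far below the line.
r_outlier = pearson_r(xs + [6], ys + [0])

print(round(r_clean, 4), round(r_outlier, 4))
```

With the five collinear points r is 1; adding the one outlier pulls it below 0.1, even though the other five points still lie on a perfect line.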
r > 0: indicates positive association, where Y tends to increase as X increases.
Example: Years of schooling (X) and salary (Y) have r = 0.9941, indicating a strong positive linear relationship.
r < 0: indicates negative association, where Y tends to decrease as X increases.
Symmetrical: Switching X and Y does not change the r value.
Dimensionless: Has no units, so changing the units of X or Y does not change r.
Assess strength using the absolute value of r:
|r| close to 1 indicates a strong linear relationship; |r| close to 0 indicates a weak one.
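The properties above can be checked numerically. The sketch below uses hypothetical height and weight values and a hypothetical helper `pearson_r`; it compares r for the original data, for X and Y swapped, and for the same data converted to other units.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (cm) and weights (kg), made up for illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 88]

r = pearson_r(heights_cm, weights_kg)

# Symmetry: swapping the roles of X and Y.
r_swapped = pearson_r(weights_kg, heights_cm)

# No units: converting to inches and pounds.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_converted = pearson_r(heights_in, weights_lb)

print(round(r, 4), round(r_swapped, 4), round(r_converted, 4))
```

All three values agree (roughly 0.99 for this made-up data), and because |r| is close to 1 the linear relationship would be read as strong.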
Correlation does not imply causation:
Correlation can exist due to lurking variables.
Lurking Variables: Hidden influences that affect both x and y.
Example: The association between years of schooling and salary does not imply causation, because other factors such as experience and company size also affect salary.
Drawing a causal conclusion requires accounting for lurking variables.
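A small simulation can make the lurking-variable idea concrete. In the sketch below, which uses entirely simulated data, a hidden variable z drives both x and y, while x has no direct effect on y. The partial correlation formula is one standard way to hold z fixed; it is brought in here for illustration and is not a method taken from these notes.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: z is the lurking variable; x and y each depend on z
# plus their own noise, and x has no direct effect on y.
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, z)
r_yz = pearson_r(y, z)

# First-order partial correlation of x and y, holding z fixed.
r_xy_given_z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(f"r(x, y)     = {r_xy:.3f}")
print(f"r(x, y | z) = {r_xy_given_z:.3f}")
```

The raw correlation between x and y is large, yet it nearly vanishes once z is held fixed, so the association comes from the shared influence rather than from x acting on y.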
A scatter plot is needed to confirm linearity before relying on the correlation coefficient.
A strong positive correlation was observed between years of schooling and salary.
Causation should not be assumed based solely on correlation.
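To see why a scatter plot should come before the correlation coefficient, consider a relationship that is perfectly predictable but curved. The sketch below uses hypothetical data where y is exactly x squared; `pearson_r` is again a hypothetical helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical curvilinear data: y depends on x exactly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(round(r, 4))
```

Here r comes out at 0 even though y is completely determined by x, because r only measures linear association. A scatter plot of these points would show the curve immediately.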