Intro to Bivariate Data
Measures of Central Tendency, Variability, and Spread
Measures of central tendency, variability, and spread provide key insights about the distribution of a single variable.
In many studies, multiple variables are collected for each individual, such as:
Health Studies: Age, sex, height, weight, blood pressure, total cholesterol.
Economic Studies: Personal income, years of education.
This chapter focuses on bivariate data, which includes two quantitative variables for each individual.
Summarizing Bivariate Data
The primary concern is how to summarize bivariate data effectively, paralleling the methods used for single-variable data.
Example of Age and Marriage:
To investigate whether people marry others of similar age, couples' ages are analyzed.
A table of the ages of 10 married couples shows that husbands and wives tend to be of about the same age, with husbands tending to be slightly older, which agrees with common experience.
A larger dataset of 282 pairs of spousal ages, however, is too extensive to inspect directly as a simple table.
Visualization of Bivariate Data
To summarize the pairs of ages of husbands and wives effectively:
Each age variable can be summarized separately with a histogram, a mean, and a standard deviation.
However, these separate summaries do not reveal how the two ages are related, nor conditional means such as the mean age of husbands whose wives are 45.
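To see why separate summaries fall short, here is a minimal Python sketch using a small hypothetical set of seven couples (invented for illustration, not the 282-couple dataset). Re-pairing the same wives' ages leaves that variable's mean and standard deviation unchanged, yet changes the mean age of husbands whose wives are 45. The helper names `marginal_summary` and `conditional_mean` are illustrative, not from any library.

```python
from statistics import mean, stdev

# Hypothetical ages, invented for illustration; NOT the dataset from the text.
husbands = [25, 31, 38, 47, 49, 55, 62]
wives = [23, 30, 35, 45, 45, 53, 60]

def marginal_summary(values):
    """Mean and sample standard deviation of one variable on its own."""
    return round(mean(values), 2), round(stdev(values), 2)

def conditional_mean(xs, ys, y_value):
    """Mean of xs over the pairs whose y equals y_value (errors if none match)."""
    return mean([x for x, y in zip(xs, ys) if y == y_value])

# Pair the same wives' ages with the husbands in reverse order: the list of
# values is unchanged, so its marginal summary is unchanged, but the pairing differs.
shuffled_wives = list(reversed(wives))

print(marginal_summary(wives) == marginal_summary(shuffled_wives))  # True
print(conditional_mean(husbands, wives, 45))           # 48
print(conditional_mean(husbands, shuffled_wives, 45))  # 42.5
```

Only the pairing carries the information about how the two ages are related, which is exactly what the marginal summaries discard.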
A scatter plot is recommended for visualizing the paired ages:
X-Axis: Age of the husband
Y-Axis: Age of the wife
Each point on the scatter plot represents one married couple.
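A scatter plot needs nothing more than the list of (x, y) pairs. As a dependency-free sketch, the hypothetical function `ascii_scatter` below places one `*` per couple on a character grid, with the husband's age across and the wife's age up; the ages are invented for illustration. In practice a plotting library would normally be used; for example, with matplotlib installed, `matplotlib.pyplot.scatter(husbands, wives)` draws the same plot graphically.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return the rows of a text scatter plot: x across, y up, one '*' per pair."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top (largest y)
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]

# Hypothetical (husband, wife) ages, for illustration only.
husbands = [25, 31, 38, 47, 49, 55, 62]
wives = [23, 30, 35, 45, 45, 53, 60]
for line in ascii_scatter(husbands, wives):
    print(line)
```

Points that fall in the same grid cell are drawn as a single `*`, a limitation a real plotting library avoids.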
Interpretation of Scatter Plots
The scatter plot illustrates that there is a strong relationship between the ages of husbands and wives:
Positive Association: As the age of the husband increases, the age of the wife also tends to increase, so the two variables have a positive association.
Negative Association: If y tends to decrease as x increases, the two variables have a negative association.
Linear Relationship: When the points cluster along a straight line, the relationship between the two variables is linear.
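The direction of an association can also be checked numerically. One common approach, assumed here rather than taken from the text, uses the sign of the sum of cross-products of deviations from the means (the numerator of the sample covariance): it is positive when large x values tend to pair with large y values and negative when they tend to pair with small ones. This sketch reports direction only, not strength or linearity; `association_direction` is a hypothetical helper.

```python
from statistics import mean

def association_direction(xs, ys):
    """Classify direction by the sign of sum((x - x_bar) * (y - y_bar))."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if cross > 0:
        return "positive"
    if cross < 0:
        return "negative"
    return "none"

# Hypothetical (husband, wife) ages, for illustration only.
print(association_direction([25, 31, 38, 47, 62], [23, 30, 35, 45, 60]))  # positive
print(association_direction([1, 2, 3], [6, 4, 2]))                        # negative
```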
Example of a Linear Scatter Plot:
A second scatter plot shows arm strength against grip strength for 149 individuals in physically demanding jobs (e.g., electricians, maintenance workers, auto mechanics).
The points show a positive, roughly linear association: people with stronger grips tend to have stronger arms.
Characteristics of Scatter Plots
Points in a scatter plot may cluster tightly along a line or more loosely:
The points in the scatter plot of spousal ages cluster more tightly around a line than those in the plot of arm strength versus grip strength.
Not all scatter plots exhibit linear relationships:
Galileo's Experiment: The data from Galileo's study of projectile motion (balls rolled down an incline) do not cluster along a straight line.
In this case, every point between the lowest and highest observed points lies above the straight line connecting those two points, so the relationship is curved rather than linear.
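The chord check described for Galileo's data can be sketched in a few lines. The hypothetical helper below sorts the points by x, draws the line through the first and last points, and tests whether every point in between lies strictly above it. The example data follow y = the square root of x and are invented; they are not Galileo's measurements.

```python
def points_above_chord(xs, ys):
    """True if every point strictly between the smallest-x and largest-x
    points lies above the straight line (chord) joining those two points."""
    pts = sorted(zip(xs, ys))
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    if x1 == x0:
        raise ValueError("need at least two distinct x values")
    slope = (y1 - y0) / (x1 - x0)
    return all(y > y0 + slope * (x - x0) for x, y in pts[1:-1])

# Hypothetical curved data (y is the square root of x): every interior point is above the chord.
print(points_above_chord([1, 4, 9, 16, 25], [1, 2, 3, 4, 5]))  # True
# Perfectly linear data: interior points lie ON the chord, not above it.
print(points_above_chord([1, 2, 3], [2, 4, 6]))                # False
```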
Scatter plots that show linear relationships can differ in two ways:
The slope of the line about which the points cluster.
How closely the points cluster around that line.
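Each of these two features can be given a number. One sketch, which is an assumption and not the chapter's own measure, fits a least-squares line and reports its slope, then reports the root-mean-square residual as a gauge of how loosely the points cluster around that line (0 means every point lies exactly on it). The function name `slope_and_spread` and both small datasets are hypothetical.

```python
from math import sqrt
from statistics import mean

def slope_and_spread(xs, ys):
    """Return (slope, rms_residual) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx                      # slope of the fitted line
    a = y_bar - b * x_bar              # intercept of the fitted line
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    rms = sqrt(sum(r * r for r in residuals) / len(residuals))
    return b, rms

tight = ([1, 2, 3, 4], [2, 4, 6, 8])  # points exactly on a line
loose = ([1, 2, 3, 4], [3, 3, 7, 7])  # points scattered about a line
print(slope_and_spread(*tight))  # (2.0, 0.0)
print(slope_and_spread(*loose))  # slope 1.6, RMS residual about 0.89
```

The two numbers vary independently: two plots can share a slope while differing in spread, which is why both features matter when comparing scatter plots.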
Statistical Measures of Relationship Strength
A statistical measure of the strength of the relationship between two quantitative variables takes into account:
The direction of the line's slope (upward or downward).
How closely the points cluster around the line.
This measure is developed in subsequent sections.