Intro to Bivariate Data

Measures of Central Tendency, Variability, and Spread

  • Measures of central tendency, variability, and spread provide key insights about the distribution of a single variable.

  • In many studies, multiple variables are collected for each individual, such as:

    • Health Studies: Age, sex, height, weight, blood pressure, total cholesterol.

    • Economic Studies: Personal income, years of education.

  • This chapter focuses on bivariate data, which consists of two quantitative variables measured on each individual.

Summarizing Bivariate Data

  • The primary concern is how to summarize bivariate data effectively, paralleling the methods used for single-variable data.

  • Example of Age and Marriage:

    • To investigate whether people marry others of similar age, couples' ages are analyzed.

    • In a small dataset of the ages of 10 married couples, husbands are typically older than their wives, which matches everyday experience.

    • A larger dataset of 282 pairs of spousal ages is too large to inspect directly as a table, so it needs to be summarized.

Visualization of Bivariate Data

  • To summarize the pairs of ages of husbands and wives effectively:

    • Each age variable can be summarized on its own with a histogram, mean, and standard deviation.

    • However, these separate summaries do not reveal the relationship between the two variables, such as the mean age of husbands whose wives are 45 years old.

  • A scatter plot is recommended for visualizing the paired ages:

    • X-Axis: Age of the husband

    • Y-Axis: Age of the wife

    • Each point on the scatter plot represents one married couple.
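
The limitation of separate summaries noted above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the `couples` values are invented for the example and are not the spousal ages discussed in these notes, and the helper name `conditional_mean_husband_age` is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical (husband_age, wife_age) pairs, invented for illustration;
# these are NOT the spousal ages discussed in these notes.
couples = [
    (28, 26), (33, 30), (35, 36), (41, 38), (44, 45),
    (47, 45), (49, 45), (55, 52), (62, 58), (70, 66),
]

husband_ages = [h for h, _ in couples]
wife_ages = [w for _, w in couples]


def conditional_mean_husband_age(pairs, wife_age):
    """Mean husband age among couples whose wife is exactly `wife_age`.

    Returns None when no couple matches.
    """
    matches = [h for h, w in pairs if w == wife_age]
    return mean(matches) if matches else None


# Separate (marginal) summaries: one mean and SD per variable.
print("husbands: mean", mean(husband_ages), "sd", round(stdev(husband_ages), 2))
print("wives:    mean", mean(wife_ages), "sd", round(stdev(wife_ages), 2))

# A conditional summary needs the pairing, not just the two separate lists.
print("mean husband age | wife is 45:", conditional_mean_husband_age(couples, 45))
```

The two separate lists would give the same means and standard deviations even if the pairing were shuffled, whereas the conditional mean would change. Each `(husband_age, wife_age)` tuple is also exactly one point of the scatter plot described above.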

Interpretation of Scatter Plots

  • The scatter plot shows a strong relationship between the ages of husbands and wives:

    • Positive Association: As the husband's age increases, the wife's age also tends to increase.

  • Negative Association: If y tends to decrease as x increases, the two variables have a negative association.

  • Linear Relationship: When the points cluster along a straight line, the relationship between the two variables is linear.

  • Example of a Linear Scatter Plot:

    • A second scatter plot shows the relationship between arm strength and grip strength for 149 individuals in physically demanding jobs (e.g., electricians, maintenance workers, auto mechanics).

    • The association is positive: individuals with a stronger grip tend to have greater arm strength.
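
The direction of an association can also be checked numerically. The sketch below is one simple approach, assumed here for illustration and not a method named in these notes: it takes the sign of the summed products of deviations from the means (the numerator of the sample covariance). The function name `association_direction` and the strength values are hypothetical.

```python
from statistics import mean


def association_direction(xs, ys):
    """Classify the direction of association between paired values.

    Uses the sign of sum((x - mean_x) * (y - mean_y)), which is the
    numerator of the sample covariance.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x, mean_y = mean(xs), mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Hypothetical strength scores, invented for illustration only.
grip = [30, 40, 50, 60, 70]
arm = [20, 35, 30, 50, 55]

print(association_direction(grip, arm))
print(association_direction(grip, list(reversed(arm))))
```

The sign only gives the direction. It says nothing about whether the points follow a straight line or how tightly they cluster, so it complements the scatter plot and does not replace it.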

Characteristics of Scatter Plots

  • Points in a scatter plot may cluster tightly along a line or more loosely:

    • The scatter plot of spousal ages shows tighter clustering than the plot of arm strength and grip strength.

  • Not all scatter plots exhibit linear relationships:

    • Galileo's Experiment: In an experiment on projectile motion (balls rolled down an incline), the data do not cluster along a straight line.

    • In this case, all of the remaining points lie above the line connecting the lowest and highest observed points.

  • Scatter plots that show linear relationships differ in two ways:

    • The slope of the line about which the data points cluster.

    • How closely the points cluster around that line.
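
The check described for Galileo's experiment, comparing the points with the line through the lowest and highest observations, can be sketched as follows. This is an illustration under stated assumptions: "lowest" and "highest" are taken to mean the points with the smallest and largest x-values, the function names are hypothetical, and the data are invented to have a curved shape. They are not Galileo's measurements.

```python
def residuals_from_chord(points):
    """Vertical distance of each interior point from the chord.

    The chord is the straight line through the points with the smallest
    and largest x-values. Positive values mean the point is above it.
    """
    pts = sorted(points)
    if len(pts) < 3:
        raise ValueError("need at least three points")
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    if x0 == x1:
        raise ValueError("end points must have different x-values")
    slope = (y1 - y0) / (x1 - x0)
    return [y - (y0 + slope * (x - x0)) for x, y in pts[1:-1]]


def all_above_chord(points):
    """True when every interior point lies strictly above the chord."""
    return all(r > 0 for r in residuals_from_chord(points))


# Invented (x, y) values with a curved, levelling-off shape.
curved = [(1, 10), (2, 18), (3, 24), (4, 28), (5, 31), (6, 33)]
# Invented values that fall exactly on a straight line.
straight = [(1, 2), (2, 4), (3, 6), (4, 8)]

print(all_above_chord(curved))
print(all_above_chord(straight))
```

When every interior point falls on the same side of the chord, a straight line is a poor description of the data. For data that follow a line, the interior points scatter on both sides of it or sit on it.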

Statistical Measures of Relationship Strength

  • A single statistical measure summarizes the strength of the relationship between two quantitative variables, taking into account:

    • The slope of the line.

    • The clustering of points around the line.

  • This measure is covered in a later section.