STAB22 Midterm Review Notes

Major Topics Covered

  • Displaying and Describing Categorical Data
  • Displaying and Summarizing Quantitative Data
  • Empirical Rule; Understanding and Comparing Distributions
  • The Standard Deviation as a Ruler and the Normal Model
  • Scatterplots, Association, and Correlation; Linear Regression; Regression Wisdom

Displaying and Describing Categorical Data

  • Frequency Tables / Relative Frequency Tables: Used for organizing categorical data.
  • Bar Charts:
    • Visually display the distribution of a categorical variable.
    • Represent the count for each category by the height of its bar.
    • Best practice: Leave spaces between bars to show that the categories are separate.
    • Area Principle: The area of each bar should be proportional to the magnitude of the value it represents.
  • Pie Charts:
    • Represent the whole group as a circle, where sizes of slices are proportional to fractions of a whole.
  • Contingency Tables:
    • Show relationships between two categorical variables. Includes marginal, joint, and conditional distributions.
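The three distributions named above can be read off a contingency table with a few divisions. The following is a minimal sketch in Python; the table, its category names, and its counts are hypothetical and chosen only for illustration.

```python
# Hypothetical 2x2 contingency table: rows are class year, columns are a Yes/No answer.
table = {
    "Freshman": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 10, "No": 40},
}

grand_total = sum(sum(cells.values()) for cells in table.values())

# Marginal distribution of the row variable: each row total over the grand total.
row_marginal = {
    row: sum(cells.values()) / grand_total for row, cells in table.items()
}

# Joint distribution: each cell count over the grand total.
joint = {
    (row, col): count / grand_total
    for row, cells in table.items()
    for col, count in cells.items()
}

# Conditional distribution of the column variable given the row:
# each cell count over its own row total.
conditional = {
    row: {col: count / sum(cells.values()) for col, count in cells.items()}
    for row, cells in table.items()
}

print(row_marginal)
print(joint)
print(conditional)
```

In this made-up table, 60% of freshmen but only 20% of seniors answered Yes. Conditional distributions that differ this way across rows suggest the two variables are associated.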

Exam Questions on Categorical Data

  • Oil Spills Example:
    • Analyzed 50 major oil spills.
    • To calculate the percentage caused by collisions:
    • \( \text{Collisions} = 10 \)
    • \( \text{Percentage} = \frac{10}{50} \times 100 = 20\% \)
  • Blood Donors Example:
    • Distribution of blood types based on a sample of 25 donors.
    • \( \text{Type A Percentage} = 20\% \)
    • Calculate number of donors with blood type A:
    • \( 20\% \text{ of } 25 = 0.2 \times 25 = 5 \)
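Both exam calculations are the same relationship run in opposite directions: count to percentage, and percentage to count. A small Python sketch, using the numbers from the two examples (the function names are hypothetical helpers, not part of the course material):

```python
def percent_of_total(count, total):
    """Convert a category count into a percentage of the total."""
    return count * 100 / total


def count_from_percent(percent, total):
    """Convert a percentage of the total back into a count."""
    return percent * total / 100


# Oil spills: 10 of the 50 spills were caused by collisions.
collision_percent = percent_of_total(10, 50)

# Blood donors: 20% of the 25 donors have type A.
type_a_donors = count_from_percent(20, 25)

print(collision_percent)  # 20.0
print(type_a_donors)      # 5.0
```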

Displaying and Summarizing Quantitative Data

  • Distributions of Data: Use histograms to visualize the shape of a quantitative variable's distribution.
  • Central Tendency:
    • Measures include: Mean (average), Median (middle value), Mode (most frequent value).
  • Spread:
    • Includes Range (difference between maximum and minimum), Interquartile Range (IQR), Variance, Standard Deviation.
  • Five-Number Summary:
    • Consists of Minimum, Q1, Median, Q3, Maximum. Used for describing center and spread.
  • Boxplots: Visual displays of the five-number summary; highlight outliers.
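A five-number summary and the outliers a boxplot would flag can be sketched in Python as follows. Two assumptions are made here: quartiles are taken as the medians of the lower and upper halves of the sorted data (textbooks and software differ on this convention), and outliers are flagged with the common 1.5 × IQR fence rule, which the notes above do not spell out. The data values are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of values, this sketch leaves the median out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return min(data), median(lower), median(data), median(upper), max(data)


# Hypothetical data set with one unusually large value.
scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]

summary = five_number_summary(scores)
minimum, q1, med, q3, maximum = summary

# Assumed rule: values beyond 1.5 * IQR from the quartiles are flagged as outliers.
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in scores if v < low_fence or v > high_fence]

print(summary)   # (2, 4, 7.5, 10, 30)
print(outliers)  # [30]
```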

Exam Questions on Quantitative Data

  • Comparing Data Sets: Compare two or more data sets by their center (mean or median) and their spread (standard deviation or IQR).
  • Understanding Outliers: Outliers pull the mean toward them and inflate the variance and standard deviation. The median and IQR are resistant to outliers, so they are often preferred when outliers are present.
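The effect of a single outlier on the mean, median, and standard deviation can be checked directly. This is an illustrative Python sketch with hypothetical data, not an exam question:

```python
from statistics import mean, median, stdev

# Hypothetical data set, then the same data with one extreme value added.
typical = [10, 12, 13, 15, 16]
with_outlier = typical + [100]

mean_shift = mean(with_outlier) - mean(typical)
median_shift = median(with_outlier) - median(typical)

print("mean shift:  ", mean_shift)    # the mean moves by more than 14
print("median shift:", median_shift)  # the median moves by only 1
print("SD before:", stdev(typical), "SD after:", stdev(with_outlier))
```

One extreme value more than doubles the mean and makes the standard deviation many times larger, while the median barely moves.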

The Normal Distribution and the Empirical Rule

  • Z-scores: Standardize data for comparison across different scales.
    • Z-score formula: \( Z = \frac{X - \text{mean}}{\text{standard deviation}} \)
  • Empirical Rule (68-95-99.7 Rule): For data that follow a Normal model, approximately:
    • 68% of values fall within 1 standard deviation of the mean.
    • 95% of values fall within 2 standard deviations of the mean.
    • 99.7% of values fall within 3 standard deviations of the mean.
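The z-score formula and the 68-95-99.7 percentages can both be checked with Python's standard library. In this sketch the exam scores, means, and standard deviations are hypothetical; `NormalDist` is the standard Normal model from the `statistics` module.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical scores on two exams with different scales.
z_first = z_score(82, mean=75, sd=5)    # 1.4
z_second = z_score(88, mean=80, sd=10)  # 0.8

# The first score is relatively better, even though the raw score is lower.
print(z_first, z_second)

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a Normal model within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The three proportions come out near 0.68, 0.95, and 0.997, which is where the rule gets its name.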

Scatterplots, Correlation, and Linear Regression

  • Scatterplots: Visualize relationships between two quantitative variables.
    • Describe form (linear vs non-linear), direction (positive vs negative), strength (tight vs loose clustering).
  • Correlation Coefficient (r): Measures the strength and direction of a linear relationship. Ranges from \(-1\) to \(1\).
    • Sensitive to outliers; unit-free.
    • Note: correlation does not imply causation.
  • Linear Regression:
    • The line of best fit (least-squares line) predicts the response variable from the explanatory variable.
    • Key equations determine the slope and the intercept of that line.
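The notes do not list the slope and intercept equations. The sketch below assumes the standard least-squares results: slope = r × (s_y / s_x) and intercept = ȳ − slope × x̄, with r computed as the average product of z-scores (dividing by n − 1). The data are hypothetical.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    slope = correlation(xs, ys) * stdev(ys) / stdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
slope, intercept = least_squares_line(xs, ys)

# Predict y for a new x inside the range of the data.
predicted_at_3 = intercept + slope * 3

print(round(r, 4), round(slope, 4), round(intercept, 4), round(predicted_at_3, 4))
```

For these values the fitted line is approximately ŷ = 2.2 + 0.6x. Predictions are only trustworthy for x values within the range of the observed data.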

Exam Review Example Questions

  • Calculate Z-scores to compare performance across exams (e.g., a biology exam vs. a psychology exam).
  • Identify the R-squared value of a regression model: the fraction of the variation in the dependent variable that is explained by the independent variable(s).
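To make the R-squared interpretation concrete, the sketch below fits a least-squares line to hypothetical data and computes R-squared as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of y from its mean. This formula and the data are assumptions for illustration, not taken from an exam.

```python
from statistics import mean

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope and intercept from sums of squares and cross-products.
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * x for x in xs]

# SSE: variation left over after fitting the line; SST: total variation in y.
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(round(r_squared, 4))
```

Here R-squared is about 0.6, so the line accounts for roughly 60% of the variation in y. With a single explanatory variable, R-squared equals the square of the correlation coefficient r.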