Class Announcements

  • Multiple topics to cover today: quiz, procedures, assignments, etc.
  • The lecture comes first; the other topics will be discussed afterward, before the break.

Statistics Lecture Overview

  • Focus for the week: statistics and calculations.
  • Main topic: Descriptive Statistics
    • Explanation: summarizes and describes data through calculations, rather than using the data to make decisions or inferences.
    • Important components to be discussed:
      • Measures of central tendency (mean, median, mode)
      • Z scores
      • Empirical rule
      • Box plots

Importance of Understanding Samples and Populations

  • Definition of Population: Every observation, person, or event in the group being studied.
  • Definition of Sample: A subset of the population.
  • Importance: Identifying whether data is a sample or a population is crucial for analysis, because the notation and some formulas (for example, variance) differ between the two.
    • Clarity: There will be no trick questions on this; either the wording or the mathematical notation will indicate which one applies.

Measures of Central Tendency

  • Mean:
    • Sample mean notation: $\bar{x}$ (x-bar)
      • Definition: Average of the sample data.
      • Formula: $\bar{x} = \frac{\text{sum of all } x}{n}$
    • Population mean notation: $\mu$ (mu)
      • Definition: Average of the population data.
      • Formula: $\mu = \frac{\text{sum of all } x}{N}$
  • Median:
    • Definition: Middle value of the data once it is ordered from smallest to largest.
    • Formula for finding the median's location in the ordered data: $\text{Location} = \frac{n + 1}{2}$ (if the location falls halfway between two positions, average the two middle values)
  • Mode:
    • Definition: Most frequently occurring value in a data set.
    • Characteristics: A data set can have more than one mode or no mode at all.
    • Instructions: If there is no mode, write "no mode" or "none" instead of zero.
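The three measures of central tendency above can be sketched in Python as follows. This is a minimal illustration, not a course-mandated procedure: the function names and the sample data are hypothetical, the median uses the location formula $(n + 1)/2$ from the notes, and the mode function assumes the convention that a data set where no value repeats has "no mode".

```python
from collections import Counter


def mean(values):
    """Sum of all x divided by the number of values (n for a sample, N for a population)."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data, found at location (n + 1) / 2."""
    ordered = sorted(values)
    location = (len(ordered) + 1) / 2  # 1-based position; ends in .5 when n is even
    lower = ordered[int(location) - 1]
    upper = ordered[int(location + 0.5) - 1]  # same value as lower when n is odd
    return (lower + upper) / 2


def modes(values):
    """Every most-frequent value, or "no mode" if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return "no mode"
    return sorted(value for value, count in counts.items() if count == highest)


data = [3, 7, 7, 2, 9, 8]  # illustrative data, not from the lecture
print(mean(data))            # 6.0
print(median(data))          # 7.0
print(modes(data))           # [7]
print(modes([1, 2, 3]))      # no mode
```

Returning a list from `modes` keeps the "more than one mode" case explicit instead of silently picking one value.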

Measures of Dispersion

  • Definition: How spread out or dispersed a data set is.
  • Key measures:
    • Range:
      • Calculation: $\text{Range} = \text{Largest value} - \text{Smallest value}$
    • Variance:
      • Definition: Measure of how much the individual data points differ from the mean, based on their squared deviations from the mean.
    • Standard Deviation:
      • Definition: Square root of the variance, expressed in the same units as the data.
  • Notation:
    • Sample variance: $s^2$
    • Population variance: $\sigma^2$ (sigma squared)
    • Sample standard deviation: $s$
    • Population standard deviation: $\sigma$ (sigma)
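A short sketch of the range and of the relationship "standard deviation = square root of variance", using Python's standard `statistics` module. The data values are made up for illustration, and the data is assumed to be a whole population here, so `pvariance`/`pstdev` (which divide by N) are used.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 10]  # illustrative data, treated as a population

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Standard deviation is the square root of the variance.
pop_var = statistics.pvariance(data)
pop_sd = math.sqrt(pop_var)

print(data_range)                                      # 7
print(math.isclose(pop_sd, statistics.pstdev(data)))   # True
```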

Calculation Procedures

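  • Variance formulas:
    • For a population: $\sigma^2 = \frac{\text{sum of } (x - \mu)^2}{N}$
    • For a sample: $s^2 = \frac{\text{sum of } (x - \bar{x})^2}{n - 1}$
  • Explanation on degrees of freedom: The sample variance divides by $n - 1$ instead of $n$ because the deviations are measured from the sample mean $\bar{x}$ rather than the unknown population mean $\mu$; dividing by $n$ would tend to underestimate the population variance.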
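The two variance formulas above differ only in the denominator. The sketch below implements each one directly (the function names and data are hypothetical) and compares them with Python's `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n - 1).

```python
import math
import statistics


def population_variance(values):
    """sigma^2: sum of (x - mu)^2 divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2: sum of (x - x_bar)^2 divided by n - 1 (degrees of freedom)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [4, 8, 6, 5, 3, 10]  # illustrative data
print(population_variance(data), statistics.pvariance(data))
print(sample_variance(data), statistics.variance(data))
print(math.sqrt(sample_variance(data)))  # sample standard deviation s
```

With the same six values, the sum of squared deviations is 34, so the population version gives 34 / 6 while the sample version gives the slightly larger 34 / 5.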

Box Plots

  • Definition: Visual representation of the five-number summary (minimum, first quartile Q1, median Q2, third quartile Q3, maximum).
  • Quartiles:
    • Quartile 1 (Q1): 25th percentile
    • Quartile 2 (Q2): Median
    • Quartile 3 (Q3): 75th percentile
  • Interquartile Range (IQR): $\text{IQR} = Q3 - Q1$
  • Outlier Representation: Data points below $Q1 - 1.5 \times \text{IQR}$ or above $Q3 + 1.5 \times \text{IQR}$ are considered outliers.
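A sketch of the numbers behind a box plot: five-number summary, IQR, the 1.5 × IQR fences, and any outliers. It assumes Python's `statistics.quantiles` with `method="exclusive"`, which places Q2 at location $(n + 1)/2$ like the median formula in these notes; textbooks and software use several slightly different quartile rules, so Q1 and Q3 may not match a given calculator exactly. The function name and data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Five-number summary, IQR, 1.5 x IQR fences, and outliers."""
    ordered = sorted(values)
    # "exclusive" method: quartile k sits at 1-based location k * (n + 1) / 4.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "min": ordered[0], "Q1": q1, "median": q2, "Q3": q3, "max": ordered[-1],
        "IQR": iqr, "fences": (lower_fence, upper_fence), "outliers": outliers,
    }


summary = box_plot_summary([2, 4, 4, 5, 7, 9, 10, 12, 30])  # illustrative data
print(summary["Q1"], summary["median"], summary["Q3"])  # 4.0 7.0 11.0
print(summary["outliers"])                              # [30]
```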

Z Scores and Normal Distribution

  • Normal Distribution: Bell-shaped curve; the total area under the curve represents 100% of the data.
    • Properties:
      • Symmetrical, with the mean at the center.
      • Asymptotic: The curve approaches the horizontal axis but never touches it.
  • Z Score Calculation:
    • Formula: $Z = \frac{X - \text{mean}}{\text{standard deviation}}$, i.e., $Z = \frac{X - \mu}{\sigma}$ for a population or $Z = \frac{X - \bar{x}}{s}$ for a sample
    • Explanation: A Z score is the number of standard deviations a data point lies from the mean; positive values are above the mean and negative values are below it.
  • Empirical Rule (for normally distributed data):
    • Approximately 68% of values lie within one standard deviation of the mean, 95% within two, and 99.7% within three.
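A minimal sketch of the Z score formula, followed by a check of the empirical rule against the standard normal curve using Python's `statistics.NormalDist`. The exam-score numbers are hypothetical. The loop prints roughly 68.3%, 95.4%, and 99.7%, matching the rounded percentages in the rule.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (negative = below)."""
    return (x - mean) / sd


# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))  # 1.5

# Share of a normal distribution within k standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {share:.1%}")
```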

Assignments and Projects

  • Instructions on upcoming assignments and expectations.
  • Emphasis: projects should use quantitative data and focus on the mean and standard deviation; qualitative data should be avoided unless it is used for regression analysis.

Summary & Key Definitions

  • Sample vs. Population
  • Measures of Central Tendency: Mean, Median, Mode
  • Measures of Dispersion: Range, Variance, Standard Deviation
  • Normal Distribution Characteristics
  • Z Score Calculation