Notes on Sampling Methods and Display of Data (Stratified, Cluster, Systematic; Categorical & Quantitative; Titanic Example; Dot/Stem/Histogram/Scatter)

Sampling Methods

Stratified Random Sampling

  • Divide the population into non-overlapping groups (strata) whose members share a characteristic and are expected to respond similarly.
  • Take a random sample from each stratum and combine the sub-samples, so that every stratum is represented.
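
The two steps above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function name stratified_sample, the equal per-stratum sample size, and the made-up student roster are all assumptions for the example.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Draw a simple random sample of `per_stratum` units from every stratum."""
    rng = random.Random(seed)          # fixed seed only so the sketch is repeatable
    strata = {}
    for unit in population:            # step 1: split into non-overlapping strata
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for label in sorted(strata):       # step 2: sample within each stratum, then combine
        sample.extend(rng.sample(strata[label], per_stratum))
    return sample

# Hypothetical population: (student id, class year)
students = [(i, "freshman" if i < 60 else "senior") for i in range(100)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=5)
print(sample)
```

Sampling the same number from each stratum is only one option; sample sizes proportional to stratum size are also common.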

Cluster (Cluster Random) Sampling

  • Divide the population into non-overlapping groups (clusters), usually naturally occurring groups of individuals located near one another (e.g., city blocks or classrooms).
  • Randomly select whole clusters and survey every individual within the selected clusters.
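
As a rough sketch of the same idea in Python, the snippet below picks whole clusters at random and keeps everyone in them. The function name cluster_sample and the city-block data are hypothetical, chosen only for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly choose whole clusters and return every unit inside them."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: city blocks mapped to household ids
blocks = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}
chosen_blocks, households = cluster_sample(blocks, n_clusters=2)
print(chosen_blocks, households)
```

Unlike stratified sampling, the randomness here is in which groups are chosen, not in who is chosen within a group.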

Systematic Random Sampling

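  • Order the population (e.g., as a numbered list).
  • Choose a random starting point among the first k individuals, then select every k-th individual after it.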
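
A minimal Python sketch of systematic sampling follows. It assumes the starting position is picked at random among the first k individuals, which is the usual way the procedure is made random; the function name systematic_sample and the numbered roster are hypothetical.

```python
import random

def systematic_sample(ordered_population, k, seed=0):
    """Take every k-th unit, starting from a random position among the first k."""
    rng = random.Random(seed)       # fixed seed for repeatability
    start = rng.randrange(k)        # random index in 0 .. k-1
    return ordered_population[start::k]

# Hypothetical ordered population: a roster numbered 1..100
roster = list(range(1, 101))
picked = systematic_sample(roster, k=10)
print(picked)
```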

Chapter 2: Displaying Categorical Data

Pie Chart

  • Circle representing 100% of data; sectors sized by frequency/relative frequency for quick overview of proportions.

Bar Chart

  • X-axis for categorical variable; Y-axis for frequency/count. Used for visual comparison of frequencies.
Excel steps for charts
  • Select data, Insert → 2-D Pie/Bar Chart.

Frequency Table and Relative Frequency Table

  • Frequency table: Counts of responses per category.
  • Relative frequency table: Each category's count divided by total (proportion/percentage).
    • Formula: \text{relative frequency}_i = \frac{f_i}{N}, where f_i is the frequency of category i and N is the total number of observations.
Excel Tips for Frequency and Relative Frequency
  • Use SUM for totals; link cells for dynamic updates.
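
For readers working outside Excel, the frequency and relative frequency tables can be built in a few lines of Python. This is a sketch under assumed data: the color responses and the function name relative_frequency_table are made up for the example.

```python
from collections import Counter

def relative_frequency_table(responses):
    """Map each category to (frequency f_i, relative frequency f_i / N)."""
    counts = Counter(responses)
    total = sum(counts.values())    # N, the total number of observations
    return {category: (f, f / total) for category, f in counts.items()}

# Hypothetical survey responses
responses = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
table = relative_frequency_table(responses)
for category, (f, rel) in sorted(table.items()):
    print(f"{category:<6} {f:>3} {rel:.3f}")
```

The relative frequencies always sum to 1, which is a quick check on the table.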

Categorical vs Quantitative Variables

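  • Categorical (qualitative): Labels or categories; arithmetic on them is not meaningful even when they are coded as numbers (e.g., ZIP code, city name).
  • Quantitative: Numerical measurements or counts for which arithmetic such as averaging is meaningful (e.g., price, distance).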

Explanatory vs Response Variables

  • Explanatory (independent): The variable used to explain or predict the response.
  • Response (dependent): Outcome of interest.
Titanic Two-Way Table Example
  • Analyzed Class (explanatory) vs Survival (response).
  • Example calculation: of the 319 first-class passengers, 197 survived, so P(\text{survive} \mid \text{first class}) = \frac{197}{319} \approx 0.618.
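
The conditional probability above is a row proportion: the count in one cell divided by the row total. A small Python sketch, using only the first-class figures given in these notes (the "did not survive" count is derived as 319 - 197; the function name conditional_probability is hypothetical):

```python
def conditional_probability(table, given_row, outcome):
    """P(outcome | given_row) = cell count / row total in a two-way table."""
    row = table[given_row]
    return row[outcome] / sum(row.values())

# Only the first-class row is filled in; other classes are not listed in these notes.
titanic = {"first class": {"survived": 197, "did not survive": 319 - 197}}
p = conditional_probability(titanic, "first class", "survived")
print(round(p, 3))  # 0.618
```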

Displaying Quantitative Data

Dot Plot

  • Each data value is a dot above its value on the x-axis; multiple occurrences stack.
  • Shows the mode and whether the distribution is symmetric or skewed.
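
A text-only dot plot is easy to sketch in Python. The data (number of siblings) and the function name dot_plot_rows are assumptions for illustration; the plot is turned on its side, with one row per value and one mark per occurrence.

```python
from collections import Counter

def dot_plot_rows(values):
    """Return (value, marks) pairs covering every integer from min to max."""
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    # Values that never occur get an empty row so gaps stay visible.
    return [(x, "*" * counts[x]) for x in range(lo, hi + 1)]

# Hypothetical data: number of siblings reported by eight students
siblings = [0, 1, 1, 2, 2, 2, 3, 5]
rows = dot_plot_rows(siblings)
for value, marks in rows:
    print(f"{value:>2} | {marks}")
```

The longest row marks the mode (2 here), and the isolated mark at 5 hints at right skew.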

Stem-and-Leaf Plot (Stem Plot)

  • Organizes data by place value: the leading digit(s) of each value form the stem and the final digit is the leaf.
  • Reveals distribution, peak, and potential outliers.
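
A short Python sketch of a stem plot, assuming non-negative whole numbers with the tens (and above) as stems and the ones digit as the leaf. The test scores and the function name stem_and_leaf are made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers (stem = value // 10)."""
    plot = defaultdict(list)
    for v in sorted(values):            # sorting keeps leaves in increasing order
        plot[v // 10].append(v % 10)
    lo, hi = min(plot), max(plot)
    # Keep empty stems so gaps before potential outliers remain visible.
    return {stem: plot.get(stem, []) for stem in range(lo, hi + 1)}

# Hypothetical test scores
scores = [67, 72, 75, 78, 81, 83, 83, 89, 94, 112]
stems = stem_and_leaf(scores)
for stem, leaves in stems.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

In this made-up data the peak is at stem 8, and the empty stem 10 separates 112 from the rest, flagging it as a potential outlier.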

Histogram

  • Displays distribution using contiguous intervals (bins) on x-axis and counts/frequencies on y-axis.
  • Software such as Excel is practical for large datasets; the choice of bin width and starting point can change the apparent shape of the distribution.
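
To see how bin choices change the picture, the sketch below counts the same data with two different bin widths. The ages are hypothetical, the function name histogram_counts is an assumption, and each bin is taken to include its left edge and exclude its right edge.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in contiguous bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:           # values outside the bin range are ignored
            counts[idx] += 1
    return counts

# Hypothetical ages
ages = [21, 22, 22, 25, 27, 30, 31, 31, 34, 38, 41, 45]
wide = histogram_counts(ages, start=20, width=10, n_bins=3)
narrow = histogram_counts(ages, start=20, width=5, n_bins=6)
print(wide)    # [5, 5, 2]
print(narrow)  # [3, 2, 4, 1, 1, 1]
```

With bins of width 10 the data look flat and then drop off; with width 5 a peak in the low 30s appears.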

Scatter Plot (Numerical vs Numerical)

  • Plots two quantitative variables as points, with the explanatory variable on the x-axis and the response variable on the y-axis.
  • Shows relationship: direction (positive/negative), form (linear/curved), strength, and outliers.

Practical Takeaways and Connections

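  • The data collection method (sampling) determines how well the data represent the population, and therefore how far patterns seen in a display can be generalized.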
  • Choose the display based on the data type: pie or bar charts for categorical data; dot plots, stem plots, or histograms for a single quantitative variable; scatter plots for two quantitative variables.
  • Key formulas include relative frequency and conditional probabilities (e.g., Titanic survival rates).