Notes on Sampling Methods and Display of Data (Stratified, Cluster, Systematic; Categorical & Quantitative; Titanic Example; Dot/Stem/Histogram/Scatter)
Sampling Methods
Stratified Random Sampling
- Divide population into non-overlapping groups (strata) whose members are expected to respond similarly.
- Randomly sample from each stratum and combine sub-samples to ensure representation.
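The two steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `stratified_sample`, the `stratum_of` callback, the equal sample size per stratum, and the student roster are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Draw a simple random sample of up to `per_stratum` units from each stratum."""
    rng = random.Random(seed)

    # Step 1: divide the population into non-overlapping strata.
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)

    # Step 2: sample randomly within each stratum and combine the sub-samples.
    sample = []
    for label in sorted(strata):
        members = strata[label]
        size = min(per_stratum, len(members))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: (student_id, class_year) pairs, 10 students per year.
years = ["freshman", "sophomore", "junior", "senior"]
students = [(i, years[i % 4]) for i in range(40)]

picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=3, seed=1)
print(picked)
```

Every class year contributes exactly three students here, which is the "ensure representation" property the notes describe.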
Cluster (Cluster Random) Sampling
- Divide population into non-overlapping groups (clusters), typically made up of individuals who are near one another.
- Randomly select whole clusters and survey every individual within the selected clusters.
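A short Python sketch of the same idea follows. The function `cluster_sample` and the city-block data are hypothetical, chosen only to show that whole clusters are drawn and every member of a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick `n_clusters` whole clusters at random and return all of their members."""
    rng = random.Random(seed)
    # Randomly select cluster names, not individuals.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Survey every individual inside each selected cluster.
    members = [unit for name in chosen for unit in clusters[name]]
    return chosen, members


# Hypothetical clusters: city blocks mapped to household ids.
blocks = {
    "block_A": [1, 2, 3],
    "block_B": [4, 5],
    "block_C": [6, 7, 8, 9],
    "block_D": [10],
}

chosen_blocks, households = cluster_sample(blocks, n_clusters=2, seed=7)
print(chosen_blocks, households)
```

Unlike the stratified approach, some groups contribute nobody to the sample, while the selected groups contribute everyone.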
Systematic Random Sampling
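- Order the population (for example, by position on a list).
- Choose a random starting point among the first k individuals, then select every k-th individual after it.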
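As a rough sketch, systematic sampling reduces to list slicing once the population is ordered. The random starting point among the first k positions is an assumption of this example (it is what makes the method "random"), and the roster of 100 numbered individuals is hypothetical.

```python
import random


def systematic_sample(ordered_population, k, seed=None):
    """Take every k-th individual, starting from a random position among the first k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index in 0 .. k-1
    return ordered_population[start::k]


# Hypothetical ordered roster of 100 individuals, numbered 1 to 100.
roster = list(range(1, 101))
sample = systematic_sample(roster, k=10, seed=3)
print(sample)
```

With 100 individuals and k = 10, the sample always has 10 members spaced exactly 10 positions apart.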
Chapter 2: Displaying Categorical Data
Pie Chart
- Circle representing 100% of data; sectors sized by frequency/relative frequency for quick overview of proportions.
Bar Chart
- X-axis for categorical variable; Y-axis for frequency/count. Used for visual comparison of frequencies.
Excel steps for charts
- Select data, Insert → 2-D Pie/Bar Chart.
Frequency Table and Relative Frequency Table
- Frequency table: Counts of responses per category.
- Relative frequency table: Each category's count divided by total (proportion/percentage).
- Formula: \text{Relative frequency}_i = \frac{f_i}{N} (where f_i is category frequency, N is total).
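The formula can be checked with a few lines of Python. The survey responses below are made up for illustration, and `relative_frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def relative_frequency_table(responses):
    """Map each category to (frequency f_i, relative frequency f_i / N)."""
    counts = Counter(responses)
    total = sum(counts.values())  # N, the total number of responses
    return {category: (f, f / total) for category, f in counts.items()}


# Hypothetical survey responses (favorite color).
responses = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]
table = relative_frequency_table(responses)

for category, (f, rel) in sorted(table.items()):
    print(f"{category:<6} {f:>3} {rel:.3f}")
```

The relative frequencies always sum to 1 (or 100% when expressed as percentages), which is a quick check on any table built by hand.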
Excel Tips for Frequency and Relative Frequency
- Use SUM for totals; link cells for dynamic updates.
Categorical vs Quantitative Variables
- Categorical (qualitative): Labels without mathematical meaning (e.g., ZIP code, city name).
- Quantitative: Measurements that can be averaged (e.g., price, distance).
Explanatory vs Response Variables
- Explanatory (independent): Predicts the response.
- Response (dependent): Outcome of interest.
Titanic Two-Way Table Example
- Analyzed Class (explanatory) vs Survival (response).
- Example calculation: P(\text{survive } | \text{ first class}) = \frac{197}{319} \approx 0.618 .
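The calculation above is a conditional relative frequency: restrict attention to first-class passengers, then take the share who survived. A small Python check, using only the two counts given in the notes (the helper name `conditional_probability` is an assumption for this sketch):

```python
def conditional_probability(joint_count, condition_count):
    """Estimate P(A | B) as count(A and B) / count(B)."""
    if condition_count <= 0:
        raise ValueError("the conditioning group must be non-empty")
    return joint_count / condition_count


first_class_survived = 197  # count from the notes: first class AND survived
first_class_total = 319     # count from the notes: all first-class passengers

p_survive_given_first = conditional_probability(first_class_survived, first_class_total)
print(round(p_survive_given_first, 3))  # 0.618
```

The denominator is the first-class total, not the total number of passengers, because the condition narrows the group being considered.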
Displaying Quantitative Data
Dot Plot
- Each data value is a dot above its value on the x-axis; multiple occurrences stack.
- Shows mode, symmetry, or skewness.
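A dot plot can be drawn as plain text, which makes the stacking rule concrete. This is a minimal sketch that assumes small single-digit integer data so the columns line up; the values are hypothetical and `dot_plot` is a made-up helper name.

```python
from collections import Counter


def dot_plot(values):
    """Return text rows: stacked '*' marks above an integer axis."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    height = max(counts.values())

    rows = []
    # Build from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        marks = ["*" if counts[x] >= level else " " for x in axis]
        rows.append(" ".join(marks).rstrip())
    rows.append(" ".join(str(x) for x in axis))
    return rows


# Hypothetical data set.
data = [1, 2, 2, 3, 3, 3, 4, 6]
for row in dot_plot(data):
    print(row)
```

In this example the tallest stack sits above 3 (the mode), and the lone dot at 6 is separated from the rest by an empty column at 5.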
Stem-and-Leaf Plot (Stem Plot)
- Organizes data by place value: the leading digit(s) of each value form the stem and the final digit forms the leaf.
- Reveals distribution, peak, and potential outliers.
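The construction can be sketched in Python for non-negative whole numbers, assuming the tens digit(s) as the stem and the ones digit as the leaf. The test scores are hypothetical, and `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return text rows 'stem | leaves' with stems = tens and leaves = ones."""
    if not values:
        return []
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)

    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Hypothetical test scores.
scores = [67, 72, 75, 78, 81, 83, 83, 88, 90, 95, 41]
for line in stem_and_leaf(scores):
    print(line)
```

Keeping the empty stem 5 in the output is a deliberate choice: the gap is what makes the score of 41 stand out as a potential outlier.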
Histogram
- Displays distribution using contiguous intervals (bins) on x-axis and counts/frequencies on y-axis.
- Excel is useful for large datasets; bin choices affect visual results.
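To see how bin choices change the picture, the sketch below counts the same values under two different bin widths. The data are hypothetical, `histogram_counts` is an illustrative helper, and the convention that each bin includes its left edge but not its right edge is an assumption of this example.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in contiguous bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the binned range
            counts[index] += 1
    return counts


# Hypothetical data set.
data = [2, 3, 5, 7, 8, 8, 9, 12, 13, 18]

narrow = histogram_counts(data, start=0, width=5, n_bins=4)   # bins of width 5
wide = histogram_counts(data, start=0, width=10, n_bins=2)    # bins of width 10
print(narrow)  # [2, 5, 2, 1]
print(wide)    # [7, 3]
```

With width 5 the peak in the 5 to 10 range is visible; with width 10 that detail is merged into a single tall bar.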
Scatter Plot (Numerical vs Numerical)
- Plots two quantitative variables (explanatory vs response) as points.
- Shows relationship: direction (positive/negative), form (linear/curved), strength, and outliers.
Practical Takeaways and Connections
- The sampling method determines which data are collected, and therefore what any display of those data can represent.
- Choose display method based on data type: pie/bar charts for categorical data; dot plots, stem plots, and histograms for quantitative data; scatter plots for two quantitative variables.
- Key formulas include relative frequency and conditional probabilities (e.g., Titanic survival rates).