Descriptive Statistics Notes
Chapter Overview
- Descriptive Statistics: Focuses on numerical and graphical methods to describe and display data. Calculating and interpreting measurements and graphs is key.
2.1 Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs
- Stem-and-Leaf Graphs: Useful for small datasets; allows for easy visualization of data distributions.
- Construction: Split observations into a stem (leading digits) and leaf (last digit).
- Example: For the number 23, stem = 2, leaf = 3.
Example Data:
Pre-Calculus Exam Scores: 33, 42, 49, 49, 53, 55, 55, 61, …, 100 (sorted smallest to largest)
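To construct a stem-and-leaf graph, write each stem once in a left column and list its leaves in increasing order to the right. The display below illustrates the layout with a different set of two-digit values (stems 0 to 3), not the exam scores above:
Stem | Leaves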
0 | 2 4 4 4 4 4 6
1 | 3 5 5 6 7 8 9
2 | 3 4 8 8
3 | 3 5 5 6 7 8
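The stem/leaf split described above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole numbers and a one-digit leaf; the function name `stem_and_leaf` is made up for this example, and only the eight exam scores written out in the notes are used (the elided values are not guessed).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers into stems (leading digits) and leaves (last digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 23 -> stem 2, leaf 3
        groups[stem].append(leaf)
    return dict(groups)

# Only the scores explicitly listed in the notes; the rest of the dataset is elided there.
scores = [33, 42, 49, 49, 53, 55, 55, 61]

plot = stem_and_leaf(scores)
for stem in range(min(plot), max(plot) + 1):
    # Stems with no data still get a row, so gaps in the distribution stay visible.
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Sorting before splitting keeps the leaves in increasing order within each stem, which is what makes the display readable as a sideways histogram.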
Line Graphs: Used to represent data over time or other continuous variables.
- Example: How often teenagers must be reminded to do their chores, with the number of reminders on the x-axis and the frequency on the y-axis.
Bar Graphs: Represent categorical data with bars spaced apart.
- Example: Facebook users by age group: 13-25: 45%, 26-44: 36%, 45-64: 19%.
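As a rough sketch, the age-group percentages above can be drawn as a text bar graph. The helper `text_bar_chart` and the choice of one `#` per 3 percentage points are assumptions made for this illustration, not part of the notes.

```python
# Percentages as given in the notes.
facebook_users = {"13-25": 45, "26-44": 36, "45-64": 19}

def text_bar_chart(shares, points_per_mark=3):
    """Return one text line per category; each '#' stands for `points_per_mark` points."""
    lines = []
    for label, pct in shares.items():
        bar = "#" * (pct // points_per_mark)
        lines.append(f"{label:>5} | {bar} {pct}%")
    return lines

lines = text_bar_chart(facebook_users)
for line in lines:
    print(line)  # bars appear on separate rows, matching the "spaced apart" convention
```

Because the categories are separate groups rather than a continuous scale, each bar stands alone; this is the main visual difference from a histogram.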
2.2 Histograms, Frequency Polygons, and Time Series Graphs
Histograms: Display the frequency of data in contiguous (touching) bars, which makes the shape, center, and spread of the data visible.
- Usage: As a rule of thumb, preferable for larger datasets (100 or more values).
- Example: Heights of players represented through intervals.
Frequency Polygon: Similar to a histogram, but plots a point above the midpoint of each interval at the height of its frequency and connects the points with line segments.
Time Series Graphs: Used to plot data points at successive time intervals.
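The counting behind a histogram and the midpoints used by a frequency polygon can be sketched as follows. The heights are hypothetical values invented for this example (and far fewer than 100, to keep it readable); `bin_counts` and `midpoints` are illustrative names, and the half-open interval convention is one common choice.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in contiguous intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the covered range are ignored
            counts[i] += 1
    return counts

def midpoints(start, width, n_bins):
    """Midpoint of each interval: the x-coordinates of a frequency polygon."""
    return [start + (i + 0.5) * width for i in range(n_bins)]

# Hypothetical player heights in inches, for illustration only.
heights = [66, 67, 67, 68, 69, 70, 70, 70, 71, 72, 73, 74]

counts = bin_counts(heights, start=66, width=3, n_bins=3)
mids = midpoints(start=66, width=3, n_bins=3)

for mid, count in zip(mids, counts):
    print(f"midpoint {mid}: frequency {count}")
```

A histogram would draw a bar of height `count` over each interval; a frequency polygon would instead join the `(mid, count)` points with straight lines.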
- Example: CO2 emissions over several years plotted over time.
2.3 Measures of Location of Data
Percentiles and Quartiles:
- Calculation: Data must be ordered from smallest to largest before percentiles are found.
- Percentiles: A score at the 90th percentile means 90% of scores are the same as or lower than it.
- Example: Calculate quartiles and percentiles from datasets of various ages or prices.
Interpretation of Quartiles: Q1 (25th percentile), Q2 (median), Q3 (75th percentile).
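A minimal sketch of these calculations, assuming the median-of-halves convention for quartiles (textbooks and software use several slightly different conventions, so results can differ on small datasets). The ages are hypothetical, and `quartiles` and `percent_at_or_below` are names made up for this example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)           # data must be ordered first
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + n % 2:]     # values above it (middle value skipped if n is odd)
    return median(lower), median(data), median(upper)

def percent_at_or_below(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

# Hypothetical ages, for illustration only.
ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37]

q1, q2, q3 = quartiles(ages)
print("Q1, Q2, Q3:", q1, q2, q3)
print("Percent aged 27 or younger:", percent_at_or_below(ages, 27))
```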
Interpreting Context: Context of data affects how percentiles are judged as good or bad.
2.4 Box Plots
- Box Plots: Offer a graphic view of data concentration and extremes; display minimum, Q1, median, Q3, and maximum.
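- Interpretation: The box spans Q1 to Q3 and holds the middle 50% of the data; its length is the interquartile range, IQR = Q3 - Q1. The whiskers extend to the minimum and maximum.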
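The five values a box plot displays can be computed directly. This sketch again assumes the median-of-halves quartile convention and uses hypothetical prices; `five_number_summary` is an illustrative name.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])
    return data[0], q1, median(data), q3, data[-1]

# Hypothetical prices, for illustration only.
prices = [10, 12, 15, 18, 20, 24, 30, 35, 50]

summary = five_number_summary(prices)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1  # length of the box

print("min, Q1, median, Q3, max:", summary)
print("IQR:", iqr)
```

In this sample the upper whisker (Q3 to maximum) is much longer than the lower one, which is how a box plot signals that the larger values are more spread out.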
2.5 Measures of Center of Data
- Mean and Median:
- Mean Calculation: Sum of values divided by number of values.
- Median Calculation: Middle value when the data are ordered; with an even number of values, the average of the two middle values.
- Example: Dataset interpretation through calculation of mean and median.
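As a small worked example, Python's standard `statistics` module computes both measures. The values are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
from statistics import mean, median

# Hypothetical values, for illustration only.
values = [1, 2, 2, 3, 4, 7, 9]

manual_mean = sum(values) / len(values)  # sum of values divided by number of values
print("mean:", mean(values), "(by hand:", manual_mean, ")")
print("median:", median(values))         # middle value of the ordered data

# With an even count, the median is the average of the two middle values.
even_values = [1, 2, 3, 4]
print("median of even-sized data:", median(even_values))
```

Here the mean is pulled above the median by the two larger values, 7 and 9, which illustrates why the median is often preferred when a dataset has extreme values.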
2.6 Skewness and the Mean, Median, and Mode
- Skewness: Affects the relationship between mean, median, and mode.
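- Right Skewed Data: Typically mean > median > mode, because the long right tail pulls the mean upward.
- Left Skewed Data: Typically mean < median < mode, because the long left tail pulls the mean downward.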
- Examples illustrate these patterns in various datasets.
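The pattern can be checked on a small example. Both datasets below are hypothetical and were constructed to show the typical ordering; real skewed data does not always follow it exactly.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, with a long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

# Mirror image of the same data, which is left-skewed.
left_skewed = [15 - x for x in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(f"{name}-skewed: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

Mirroring the data flips the direction of the tail, and the ordering of the three measures flips with it.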
2.7 Measures of Spread of Data
- Standard Deviation and Variance: Measures of how data values deviate from the mean.
- Understanding Variation: Calculating deviations and their squares to compute variance.
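- Sample vs Population: The formulas differ in the divisor. Population variance divides the sum of squared deviations by N; sample variance divides by n-1, which gives a better estimate of the population variance. The standard deviation is the square root of the variance.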
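The steps above (deviations, squares, then a divisor of N or n-1) can be sketched by hand and compared with Python's standard `statistics` module. The data are hypothetical, and `sum_squared_deviations` is a name made up for this example.

```python
import math
from statistics import pvariance, variance

def sum_squared_deviations(values):
    """Sum of squared deviations of each value from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

# Hypothetical data, for illustration only; the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

ss = sum_squared_deviations(data)
population_variance = ss / len(data)    # divide by N
sample_variance = ss / (len(data) - 1)  # divide by n - 1

print("population variance:", population_variance, "library:", pvariance(data))
print("sample variance:", sample_variance, "library:", variance(data))
print("population standard deviation:", math.sqrt(population_variance))
```

The sample variance is always slightly larger than the population variance computed from the same numbers, and the gap shrinks as the number of values grows.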
Conclusion
- Understand how graphical displays and measures of location, center, and spread work together to draw insight from datasets.
- For each graphical representation, focus on interpreting context and relevance in statistical analysis to support decision-making and conclusions.