STAT Ch 2.1-2.3
Descriptive Statistics
Histograms
Definition: A histogram is a graphical representation of data that groups data into intervals (bins).
Properties:
Each rectangle represents a bin and its height corresponds to the count of data values in that bin.
The first value (lower boundary) of each bin is marked on the horizontal axis.
Consecutive bins touch each other.
Vertical Axis: Can show either frequency (the count in each bin) or relative frequency (the proportion of all values in each bin).
Bin Width: Changing the bin width affects the shape of the histogram.
Smaller bins lead to a spikier histogram.
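A minimal Python sketch of how bin width changes the counts behind a histogram. The data values and the helper name bin_counts are made up for illustration, and the helper assumes every value is at least as large as the starting point of the first bin.

```python
from collections import Counter

def bin_counts(data, start, width):
    """Count values per bin, where bin k covers [start + k*width, start + (k+1)*width)."""
    # Assumes every value is >= start (hypothetical helper, not from the notes).
    index_counts = Counter(int((x - start) // width) for x in data)
    n_bins = max(index_counts) + 1
    return [index_counts[k] for k in range(n_bins)]

# Hypothetical data, chosen only for illustration
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 8, 8, 9]

wide = bin_counts(data, start=0, width=5)    # bins 0-4 and 5-9
narrow = bin_counts(data, start=0, width=1)  # one bin per integer

# Relative frequency = count in the bin / total number of values
relative = [round(c / len(data), 2) for c in wide]

print(wide)      # [8, 5]
print(narrow)    # [0, 1, 2, 3, 2, 1, 0, 1, 2, 1]
print(relative)  # [0.62, 0.38]
```

With the wide bins the data looks like one block; the narrow bins show two separate mounds and empty gaps, which is the "spikier" appearance described above.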
Stemplots
Also known as stem-and-leaf plots.
Use Cases: Useful for small datasets and when technology is unavailable.
Construction:
Each observation is divided into a "stem" (all but the last digit) and a "leaf" (the last digit).
Example of Stemplot:
For the data set: 1, 1, 1…
Stemplot would show values split by stems and leaves.
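A rough Python sketch of the stem/leaf split. It uses its own hypothetical data (not the data set from the notes) and assumes non-negative whole numbers; the function name stemplot is an illustration only.

```python
from collections import defaultdict

def stemplot(values):
    """Build stemplot rows: stem = all but the last digit, leaf = the last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 23 -> stem 2, leaf 3
        leaves[stem].append(leaf)

    rows = []
    # Include stems with no leaves so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {leaf_text}".rstrip())
    return rows

# Hypothetical data, chosen only for illustration
scores = [12, 15, 21, 23, 23, 28, 40, 41]
rows = stemplot(scores)
for row in rows:
    print(row)
# 1 | 2 5
# 2 | 1 3 3 8
# 3 |
# 4 | 0 1
```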
Features of Numerical Distribution
Important Features:
Shape: Visual appearance (symmetric, skewed, etc.).
Center: Typical value (mean or median).
Spread: Variability or range of data.
Outliers: Data points that differ significantly from others.
Analyzing the Shape of Distribution
Characteristics to consider:
Is the distribution symmetric or skewed?
How many mounds are present? (Unimodal, Bimodal, Multimodal)
Are there unusual values (outliers)?
Symmetric Distribution: Roughly equal on both sides.
Skewed Distribution: Most data on one side with a tail on the other (right skewed if the tail is on the right, left skewed if the tail is on the left).
Uniform Distribution: All bars in a histogram have similar height, indicating equal frequency.
Measures of Center and Spread
For Symmetric Distributions:
Measure for Center: Mean (balancing point).
Measure for Spread: Standard Deviation.
For Skewed Distributions:
Measure for Center: Median (middle value).
Measure for Spread: Interquartile Range (IQR).
Mean
Sample Mean: \bar{x} = \frac{\Sigma x}{n}
Population Mean: \mu = \frac{\Sigma x}{N}
Interpretation of mean: The balancing point of the data; the sum of all values divided by the number of values.
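A small Python sketch of the mean formula, using hypothetical data. It also checks the "balancing point" idea: the deviations from the mean add up to zero.

```python
def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Hypothetical sample, chosen only for illustration
sample = [2, 4, 4, 5, 10]

x_bar = mean(sample)                      # 25 / 5
deviations = [x - x_bar for x in sample]  # distances above/below the mean

print(x_bar)            # 5.0
print(deviations)       # [-3.0, -1.0, -1.0, 0.0, 5.0]
print(sum(deviations))  # 0.0
```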
Standard Deviation
Measures how far data points typically are from the mean.
About 68% of data falls within one standard deviation of the mean in a normal distribution.
Standard Deviation Formula:
Find the deviation (distance) of each observation from the mean: x - \bar{x}
Square each deviation.
Sum the squared deviations.
Divide by (n-1) (sample) or N (population).
Take the square root of the result.
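Written as one formula, the steps above give s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n-1}} for a sample (divide by N instead for a population). The Python sketch below follows the same steps one at a time; the data and the function name standard_deviation are hypothetical.

```python
import math

def standard_deviation(values, sample=True):
    """Follow the listed steps; divide by n-1 for a sample, N for a population."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]  # step 1: deviation from the mean
    squared = [d ** 2 for d in deviations]   # step 2: square each deviation
    total = sum(squared)                     # step 3: sum the squared deviations
    variance = total / (n - 1 if sample else n)  # step 4: divide
    return math.sqrt(variance)               # step 5: square root

# Hypothetical data, chosen only for illustration (mean is 5)
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = standard_deviation(data, sample=True)       # sqrt(32 / 7)
sigma = standard_deviation(data, sample=False)  # sqrt(32 / 8)

print(round(s, 3))  # 2.138
print(sigma)        # 2.0
```

The sample version is slightly larger because it divides by n-1 rather than N.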
Interquartile Range (IQR)
Measures the spread of the middle 50% of the data.
Calculation:
Find the first (Q1) and third quartiles (Q3).
Calculate IQR: IQR = Q3 - Q1
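A Python sketch of the IQR calculation with hypothetical data. It assumes the "median of each half" convention for quartiles (the overall median is left out of both halves when n is odd); software and other textbooks may use slightly different quartile rules and give slightly different answers.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles_and_iqr(values):
    """Return (Q1, Q3, IQR) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    q1 = median(lower_half)
    q3 = median(upper_half)
    return q1, q3, q3 - q1

# Hypothetical data, chosen only for illustration
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

q1, q3, iqr = quartiles_and_iqr(data)
print(q1, q3, iqr)  # 3.5 11.0 7.5
```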
Effect of Outliers
Outliers can significantly distort the mean.
The median is preferred as a measure of center when outliers are present.
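A quick Python illustration with hypothetical data: adding one extreme value moves the mean a lot but barely moves the median.

```python
import statistics

# Hypothetical data, chosen only for illustration
values = [10, 12, 13, 14, 15]
with_outlier = values + [100]  # add one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print(round(mean_before, 2), round(mean_after, 2))  # 12.8 27.33
print(median_before, median_after)                  # 13 13.5
```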
Comparing Measures of Center
Symmetric Distributions: Use mean and standard deviation.
Skewed Distributions: Use median and IQR.