STAT Ch 2.1-2.3

Definition: A histogram is a graphical representation of data that groups data into intervals (bins).
Properties:
- Each rectangle represents a bin and its height corresponds to the count of data values in that bin.
- The first (lowest) value of each bin is marked on the horizontal axis.
- Consecutive bins touch each other.
Vertical Axis: The vertical axis of a histogram can show either frequency (counts) or relative frequency (proportions of the total).
Bin Width: Changing the bin width affects the shape of the histogram.
- Smaller bins lead to a spikier histogram.
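
A minimal sketch of how bin width changes the counts behind a histogram. The data set `scores`, the helper `bin_counts`, and the chosen bin starts and widths are all made up for illustration; they are not from the chapter.

```python
def bin_counts(data, start, width, n_bins):
    """Count values in n_bins consecutive bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)  # index of the bin that x falls into
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical data, invented for this example only.
scores = [52, 55, 61, 63, 64, 68, 70, 71, 71, 75, 78, 83, 85, 92]

wide = bin_counts(scores, start=50, width=10, n_bins=5)    # 5 bins of width 10
narrow = bin_counts(scores, start=50, width=5, n_bins=10)  # 10 bins of width 5

# Relative frequency = count / total number of observations.
relative = [c / len(scores) for c in wide]

# Text version of each histogram: one row per bin, one '#' per observation.
for label, counts in (("width 10", wide), ("width 5", narrow)):
    print(label)
    for i, c in enumerate(counts):
        print(f"  bin {i}: {'#' * c}")
```

With width 10 the counts form a single mound; with width 5 the same data show more ups and downs between neighboring bins.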

Stemplots: Also known as stem-and-leaf plots.
Use Cases: Useful for small datasets and when technology is unavailable.
Construction:
- Each observation is divided into a "stem" (all but the last digit) and a "leaf" (the last digit).
Example of Stemplot:
- For the data set: 1, 1, 1…
- Stemplot would show values split by stems and leaves.
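
A rough sketch of the stem/leaf split described above, assuming non-negative whole numbers. The data set `values` and the helper `stemplot` are hypothetical and are not the data set from the notes.

```python
from collections import defaultdict

def stemplot(data):
    """Build stemplot rows: stem = all but the last digit, leaf = the last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 24 -> stem 2, leaf 4
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | {' '.join(str(d) for d in leaves.get(stem, []))}"
        rows.append(row.rstrip())
    return rows

# Hypothetical data, invented for this example only.
values = [12, 15, 21, 24, 24, 40, 47]

rows = stemplot(values)
for row in rows:
    print(row)
# 1 | 2 5
# 2 | 1 4 4
# 3 |
# 4 | 0 7
```

Stems with no leaves are kept (here stem 3) so the shape of the distribution is not distorted.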

Important Features:
- Shape: Visual appearance (symmetric, skewed, etc.).
- Center: Typical value (mean or median).
- Spread: Variability or range of data.
- Outliers: Data points that differ significantly from others.

Characteristics to consider:
- Is the distribution symmetric or skewed?
- How many mounds are present? (Unimodal, Bimodal, Multimodal)
- Are there unusual values (outliers)?
Symmetric Distribution: The left and right sides are roughly mirror images of each other.
Skewed Distribution: Most data on one side with a tail on the other; the skew is named for the side with the tail (right or left skewed).
Uniform Distribution: All bars in a histogram have similar height, indicating equal frequency.

For Symmetric Distributions:
- Measure for Center: Mean (balancing point).
- Measure for Spread: Standard Deviation.
For Skewed Distributions:
- Measure for Center: Median (middle value).
- Measure for Spread: Interquartile Range (IQR).
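
A small illustration of why the median is preferred for skewed data: one extreme value in the tail pulls the mean toward it, while the median barely moves. Both data sets are made up for this example.

```python
import statistics

# Hypothetical data, invented for this example only.
symmetric = [2, 3, 4, 5, 6]
right_skewed = [2, 3, 4, 5, 36]  # same values, but the largest is stretched into a long right tail

sym_mean = statistics.mean(symmetric)
sym_median = statistics.median(symmetric)
skew_mean = statistics.mean(right_skewed)
skew_median = statistics.median(right_skewed)

print("symmetric:    mean =", sym_mean, " median =", sym_median)
print("right skewed: mean =", skew_mean, " median =", skew_median)
```

In the symmetric set the mean and median agree (both 4). In the skewed set the median is still 4, but the mean rises to 10, which is larger than four of the five values.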

Standard Deviation: Measures the typical distance of data points from the mean.
About 68% of data falls within one standard deviation of the mean in a normal distribution.
Standard Deviation Formula:
1. Find the deviation (distance) of each observation from the mean: x - \bar{x}.
2. Square each deviation.
3. Sum the squared deviations.
4. Divide by (n-1) (sample) or N (population).
5. Take the square root of the result.
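
The five steps above can be sketched as follows. The function name `std_dev`, its `population` flag, and the data set are assumptions made for illustration; Python's built-in `statistics.stdev` and `statistics.pstdev` are used only as a cross-check.

```python
import math
import statistics

def std_dev(data, population=False):
    """Standard deviation, following the five steps one at a time."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]    # 1. deviation of each observation
    squared = [d ** 2 for d in deviations]   # 2. square each deviation
    total = sum(squared)                     # 3. sum the squared deviations
    divisor = n if population else n - 1     # 4. N (population) or n - 1 (sample)
    variance = total / divisor
    return math.sqrt(variance)               # 5. square root

# Hypothetical data, invented for this example only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = std_dev(data)                        # divides by n - 1
population_sd = std_dev(data, population=True)   # divides by N

print("sample SD:    ", sample_sd)
print("population SD:", population_sd)
```

For this data the mean is 5 and the squared deviations sum to 32, so the population value is sqrt(32 / 8) = 2, while the sample value sqrt(32 / 7) is slightly larger.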

Interquartile Range (IQR): Measures the spread of the middle 50% of the data.
Calculation:
- Find the first quartile (Q1) and the third quartile (Q3).
- Calculate IQR: IQR = Q3 - Q1
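
A minimal sketch of the IQR calculation. Textbooks and software use slightly different quartile conventions; this sketch assumes the "median of each half" method (Q1 is the median of the lower half, Q3 the median of the upper half, with the overall median left out when n is odd), which may not match the chapter's convention exactly. The helper `quartiles` and the data are hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption)."""
    values = sorted(data)
    half = len(values) // 2   # size of each half; the middle value is skipped if n is odd
    lower = values[:half]
    upper = values[-half:]
    return statistics.median(lower), statistics.median(upper)

# Hypothetical data, invented for this example only.
data = [1, 3, 4, 6, 7, 9, 12, 15]

q1, q3 = quartiles(data)
iqr = q3 - q1   # IQR = Q3 - Q1

print("Q1 =", q1, " Q3 =", q3, " IQR =", iqr)
```

Here the lower half is 1, 3, 4, 6 (Q1 = 3.5) and the upper half is 7, 9, 12, 15 (Q3 = 10.5), so IQR = 7.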