Understanding Distributions and Graphs

Focus on Chapter Two material, which encompasses:
- Distributions
- Graphs
- Analyzing and interpreting data through visualization

Introduction to the sample and population concepts:
- Example used: Toasters
- Population: All toasters
- Sample: Specific toasters found online (e.g., through Google Shopping)
Variable of interest: Price of toasters
- Example toaster prices collected:
- $69.99
- $299.99 (not a joke, specifically a KitchenAid toaster)
- $36.46
- $39.99
- $299.95

Importance of organizing data:
- Raw data is hard to interpret until it is organized.
- Motivating question: What is the average price of a toaster in 2020?

Frequency Distribution: A way to describe the structure of a dataset by breaking values into categories or classes and counting occurrences within each class.
- Frequency: Number of occurrences within each class.
- Examples of Classes for Toaster Prices:
- $0 - $49.99
- $50 - $99.99
Counting frequency:
- Example findings:
- Nine toasters priced between $0 and $49.99
- Five toasters priced between $50 and $99.99
- Two toasters priced between $250 and $299.99
Conclusion drawn: Majority of toasters priced below $50.
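
The counting step above can be sketched in Python. This is a minimal illustration, not the method used in the lecture: it uses only the five prices listed earlier in these notes (the full sample behind the 9 / 5 / 2 counts is not given, so the counts here differ), and it assumes every class is $50 wide. The names `class_label` and `frequency_distribution` are hypothetical helpers.

```python
from collections import Counter

# The five sample prices listed in the notes (the full sample is not given).
prices = [69.99, 299.99, 36.46, 39.99, 299.95]

CLASS_WIDTH = 50  # assumption: every class spans $50, e.g. $0 - $49.99


def class_label(price, width=CLASS_WIDTH):
    """Return the class a price falls in, e.g. 69.99 -> '$50 - $99.99'."""
    lower = int(price // width) * width
    upper = lower + width - 0.01
    return f"${lower} - ${upper:.2f}"


def frequency_distribution(values, width=CLASS_WIDTH):
    """Count how many values fall in each class, ordered by lower class limit."""
    counts = Counter(int(v // width) * width for v in values)
    return {class_label(lower, width): counts[lower] for lower in sorted(counts)}


freq = frequency_distribution(prices)
for label, count in freq.items():
    print(f"{label}: {count}")
```

Classes with no observations are simply omitted from the result in this sketch.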

Lower Class Limit: The smallest number in a class.
- Example: For the class $50 - $99.99, the lower class limit is $50.
Upper Class Limit: The largest number in a class.
- For the class $50 - $99.99, the upper limit is $99.99.
Class Width: The range of values within a class.
- Calculated as the difference between two consecutive lower class limits (e.g., $50 - $0 = $50).
Class Boundary: The value halfway between the upper limit of one class and the lower limit of the next class.
- Example: Class boundary between $49.99 and $50 is $49.995.
Midpoint: Average of the lower and upper class limits.
- Example for class $0 - $49.99:
- \text{Midpoint} = \frac{0 + 49.99}{2} = 24.995
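
The definitions above reduce to a few lines of arithmetic. The following sketch recomputes the class width, class boundary, and midpoint for the two classes given in the notes; the variable names are illustrative only.

```python
# Two consecutive classes from the notes, as (lower limit, upper limit) pairs.
classes = [(0.00, 49.99), (50.00, 99.99)]

(lower_1, upper_1), (lower_2, upper_2) = classes

# Class width: difference between two consecutive lower class limits.
class_width = lower_2 - lower_1

# Class boundary: halfway between one class's upper limit
# and the next class's lower limit.
class_boundary = (upper_1 + lower_2) / 2

# Midpoint: average of a class's own lower and upper limits.
midpoint_1 = (lower_1 + upper_1) / 2

print(f"width={class_width:.2f} boundary={class_boundary:.3f} midpoint={midpoint_1:.3f}")
```

The printed values are formatted to a fixed number of decimals because floating-point arithmetic on prices is only approximate.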

Introduction to Graphs: Valuable for understanding data sets visually, making them easier to interpret than raw data or frequency distributions.

Histogram: A graph displaying the frequency of each class using bar heights.
- X-axis: Represents class boundaries (e.g., prices for toasters).
- Y-axis: Represents frequency.
Characteristics:
- Bars in a histogram touch because they represent continuous data ranges.
- Example interpretation of a histogram for toaster prices.
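
As a rough illustration, a histogram can be approximated in plain text by drawing one row of `#` characters per class. The sketch below uses the five prices listed in these notes, assumes $50-wide classes from $0 to $299.99, and draws the bars horizontally; a plotting library would normally draw touching vertical bars instead.

```python
# The five sample prices listed in the notes.
prices = [69.99, 299.99, 36.46, 39.99, 299.95]

CLASS_WIDTH = 50  # assumption: $50-wide classes
NUM_CLASSES = 6   # assumption: classes cover $0 up to $299.99

# Count how many prices fall in each class, keeping empty classes.
counts = [0] * NUM_CLASSES
for price in prices:
    counts[int(price // CLASS_WIDTH)] += 1

# One text row per class; the bar length is the class frequency.
lines = []
for i, count in enumerate(counts):
    lower = i * CLASS_WIDTH
    upper = lower + CLASS_WIDTH - 0.01
    label = f"${lower} - ${upper:.2f}"
    lines.append(f"{label:<16}| {'#' * count}")

print("\n".join(lines))
```

Keeping the empty classes in the middle makes the gap between the cheap toasters and the expensive ones visible, which a table of non-empty classes would hide.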

Pie Chart: Represents qualitative data.
- Size of pie slices correlates with the percentage of the total.
- Warning: pie charts can misrepresent the data and be visually misleading.
- Example with pumpkin (36%) vs. strawberry (2%).
Bar Graph: Similar to histograms, but for categorical data (not necessarily continuous).
- Height of bars indicates frequencies or amounts of different categories (e.g., candy bar chocolate content).
- Important to ensure correct representation of values (e.g., weights, dimensions).

Stem-and-Leaf Plot: Displays individual data points while maintaining the frequency structure.
- Each stem represents the leading digit(s), while each leaf represents the trailing digit (e.g., stem 5, leaf 6 = 56).
Advantage: Retains every individual data value in the dataset.
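
A stem-and-leaf plot can be built by splitting each value into its leading digits and its last digit. The sketch below is one possible way to do it, assuming whole-number data; to reuse the toaster prices from these notes it first rounds them to whole dollars, which is a simplification (the cents are lost). `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Toaster prices from the notes, rounded to whole dollars for this sketch.
prices = [69.99, 299.99, 36.46, 39.99, 299.95]
whole_dollars = sorted(round(p) for p in prices)


def stem_and_leaf(values):
    """Group whole numbers by stem (leading digits) with leaves (last digit)."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 56 -> stem 5, leaf 6
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(whole_dollars)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

Stems with no values are skipped here; a hand-drawn plot would usually list them with an empty row so the gaps stay visible.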

Three types of distribution shapes:
- Symmetric Distribution: Data is evenly distributed around the mean.
- Skewed Left (Left-Skewed): Data is concentrated on the right; has a longer tail on the left.
- Skewed Right (Right-Skewed): Data is concentrated on the left; has a longer tail on the right (e.g., toaster prices).
Example of a right-skewed distribution: Annual household income in the U.S.
- Heavy concentration of households earning less than $90,000, with a significant tail towards higher incomes.
Importance of understanding skewness for interpretation of data sets.
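
One common rule of thumb for spotting skew is to compare the mean with the median: in a right-skewed data set the long right tail tends to pull the mean above the median. The sketch below applies that heuristic to the five toaster prices listed in these notes. It is only a rough check, not a formal measure of skewness, and five values are far too few to draw firm conclusions.

```python
import statistics

# The five sample prices listed in the notes.
prices = [69.99, 299.99, 36.46, 39.99, 299.95]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

# Heuristic only: mean above median suggests a longer right tail,
# mean below median suggests a longer left tail.
if mean_price > median_price:
    shape_hint = "right-skewed"
elif mean_price < median_price:
    shape_hint = "left-skewed"
else:
    shape_hint = "roughly symmetric"

print(f"mean={mean_price:.2f} median={median_price:.2f} -> {shape_hint}")
```

Here the two toasters near $300 raise the mean well above the median, which matches the description of toaster prices as right-skewed.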