Understanding Distributions and Graphs
Chapter Overview
Focus on Chapter Two material, which encompasses:
Distributions
Graphs
Analyzing and interpreting data through visualization
Data Collection and Population
Introduction to the sample and population concepts:
Example used: Toasters
Population: All toasters
Sample: Specific toasters found online (e.g., through Google Shopping)
Variable of interest: Price of toasters
Example toaster prices collected:
$69.99
$299.99 (not a joke, specifically a KitchenAid toaster)
$36.46
$39.99
$299.95
Data Organization
Importance of organizing data:
Presentation of raw data can be confusing without organization.
Actual question: What is the average price for toasters in 2020?
Frequency Distribution
Definition: A way to describe the structure of a dataset by breaking values into categories or classes and counting occurrences within each class.
Frequency: Number of occurrences within each class.
Examples of Classes for Toaster Prices:
$0 - $49.99
$50 - $99.99
Counting frequency:
Example findings:
Nine toasters priced between $0 and $49.99
Five toasters priced between $50 and $99.99
Two toasters priced between $250 and $299.99
Conclusion drawn: Most toasters in the sample are priced below $50.
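The counting step can be sketched in Python. This is a minimal illustration, not the method used in the lecture: it uses only the five sample prices listed earlier in these notes (not the full data set behind the 9/5/2 counts), and the function name frequency_distribution is a hypothetical helper introduced here.

```python
from collections import Counter

# The five sample prices listed earlier in these notes (not the full data set).
prices = [69.99, 299.99, 36.46, 39.99, 299.95]

CLASS_WIDTH = 50  # dollars; matches the $0 - $49.99 and $50 - $99.99 classes


def frequency_distribution(values, width):
    """Count how many values fall in each class of the given width.

    Hypothetical helper: each class is identified by its lower class limit.
    """
    counts = Counter(int(v // width) * width for v in values)
    top = max(counts)
    # Include empty classes so gaps in the data stay visible.
    return {lower: counts.get(lower, 0) for lower in range(0, top + width, width)}


table = frequency_distribution(prices, CLASS_WIDTH)
for lower, freq in table.items():
    print(f"${lower} - ${lower + CLASS_WIDTH - 0.01:.2f}: {freq}")
```

Empty classes are kept in the table on purpose, so that the gap between the cheap toasters and the expensive ones remains visible.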
Vocabulary Related to Classes
Lower Class Limit: The smallest number in a class.
Example: For the class $50 - $99.99, the lower class limit is $50.
Upper Class Limit: The largest number in a class.
For the class $50 - $99.99, the upper limit is $99.99.
Class Width: The range of values within a class.
Calculated as the difference between two consecutive lower class limits (e.g., $50 - $0 = $50).
Class Boundary: The value halfway between the upper class limit of one class and the lower class limit of the next class.
Example: Class boundary between $49.99 and $50 is $49.995.
Midpoint: Average of the lower and upper class limits.
Example for class $0 - $49.99:
\text{Midpoint} = \frac{0 + 49.99}{2} = 24.995
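The class vocabulary above can be checked with a short Python sketch. The function name describe_class and its parameters are hypothetical, introduced only for this illustration. Decimal is used instead of float so that the dollar amounts come out exactly.

```python
from decimal import Decimal


def describe_class(lower, upper, next_lower):
    """Return the width, upper class boundary, and midpoint of one class.

    lower and upper are the class limits; next_lower is the lower class
    limit of the following class. (Hypothetical helper for illustration.)
    """
    lower, upper, next_lower = Decimal(lower), Decimal(upper), Decimal(next_lower)
    return {
        # Difference between two consecutive lower class limits.
        "width": next_lower - lower,
        # Halfway between this class's upper limit and the next lower limit.
        "upper_boundary": (upper + next_lower) / 2,
        # Average of the lower and upper class limits.
        "midpoint": (lower + upper) / 2,
    }


info = describe_class("0", "49.99", "50")
for name, value in info.items():
    print(f"{name}: ${value}")
```

For the class $0 - $49.99 this reproduces the values worked out above: a width of $50, a boundary of $49.995, and a midpoint of $24.995.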
Graphical Representations
Introduction to Graphs: Valuable for understanding data sets visually, making them easier to interpret than raw data or frequency distributions.
Histogram
Definition: A graph displaying the frequency of classes using bar heights.
X-axis: Represents class boundaries (e.g., prices for toasters).
Y-axis: Represents frequency.
Characteristics:
Bars in a histogram touch because they represent continuous data ranges.
Example interpretation of a histogram for toaster prices.
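A rough text version of such a histogram can be sketched in Python. The counts 9, 5, and 2 come from the frequency example in these notes. The notes do not report the classes between $100 and $249.99, so treating them as empty is an assumption made here. The bars are drawn sideways (length instead of height) to fit plain text, and text_histogram is a hypothetical helper name.

```python
def text_histogram(frequencies, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    label_width = max(len(label) for label in frequencies)
    return [
        f"{label.ljust(label_width)} | {symbol * count} ({count})"
        for label, count in frequencies.items()
    ]


frequencies = {
    "$0 - $49.99": 9,      # from the notes
    "$50 - $99.99": 5,     # from the notes
    "$100 - $149.99": 0,   # ASSUMPTION: not reported in the notes
    "$150 - $199.99": 0,   # ASSUMPTION: not reported in the notes
    "$200 - $249.99": 0,   # ASSUMPTION: not reported in the notes
    "$250 - $299.99": 2,   # from the notes
}

lines = text_histogram(frequencies)
print("\n".join(lines))
```

Read this way, the tallest bar is the cheapest class, and the two toasters near $300 sit far to the right of everything else.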
Other Types of Graphic Displays
Pie Chart: Represents qualitative data.
Size of pie slices correlates with the percentage of the total.
Warning: A pie chart misleads when slices are drawn out of proportion to the percentages they represent.
Example with pumpkin (36%) vs. strawberry (2%).
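One way to check whether a pie chart is drawn honestly is to convert each percentage into the angle its slice should cover, since a full circle is 360 degrees. The sketch below uses only the two percentages mentioned in the notes (the remaining categories are not listed, so they are left out), and slice_angles is a hypothetical helper name.

```python
def slice_angles(percentages):
    """Convert each percentage of the total into a slice angle in degrees."""
    return {label: pct * 360 / 100 for label, pct in percentages.items()}


# Only the two categories named in the notes; the others are not listed.
angles = slice_angles({"pumpkin": 36, "strawberry": 2})
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The pumpkin slice should be 18 times as wide as the strawberry slice (36 / 2 = 18). A chart that draws them at similar sizes misrepresents the data.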
Bar Graph: Similar to histograms, but for categorical data (not necessarily continuous).
Height of bars indicates frequencies or amounts of different categories (e.g., candy bar chocolate content).
Important to ensure correct representation of values (e.g., weights, dimensions).
Stem-and-Leaf Plot
Definition: Displays individual data points while maintaining the frequency structure.
Each stem represents the leading digit(s), while each leaf represents the trailing digit (e.g., stem 5, leaf 6 = 56).
Advantage: Retains every individual data value, which a frequency distribution or histogram does not.
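A stem-and-leaf plot can be built with a few lines of Python. In this sketch the data values are made up for illustration (only 56 appears in the notes), and stem_and_leaf is a hypothetical helper that assumes non-negative whole numbers with a single-digit leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (leading digits) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 56 -> stem 5, leaf 6
        plot[stem].append(leaf)
    return dict(plot)


# HYPOTHETICAL values for illustration only; 56 is the example from the notes.
values = [56, 52, 61, 48, 67, 63, 56]

plot = stem_and_leaf(values)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each row shows how many values share a stem, as a frequency count would, while the leaves still let every original value be read back.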
Distribution Shape
Three types of distribution shapes:
Symmetric Distribution: Data is spread evenly on both sides of the center, so the left and right halves roughly mirror each other.
Skewed Left (Left-Skewed): Data is concentrated on the right; has a longer tail on the left.
Skewed Right (Right-Skewed): Data is concentrated on the left; has a longer tail on the right (e.g., toaster prices).
Example: Annual household income in the U.S.
Heavy concentration of households earning less than $90,000, with a significant tail towards higher incomes.
Importance of understanding skewness for interpretation of data sets.
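A quick numerical hint about skewness is to compare the mean with the median. As a common rule of thumb (not a strict rule, and not stated in the lecture itself), a mean well above the median suggests a right skew, because a few large values pull the mean upward. The sketch below applies this to the five sample toaster prices from these notes; likely_skew is a hypothetical helper name, and five values are far too few for a firm conclusion.

```python
from statistics import mean, median


def likely_skew(values):
    """Rule-of-thumb guess at skew direction from mean vs. median."""
    avg, mid = mean(values), median(values)
    if avg > mid:
        return "right"
    if avg < mid:
        return "left"
    return "roughly symmetric"


# The five sample prices listed earlier in these notes.
prices = [69.99, 299.99, 36.46, 39.99, 299.95]

avg = mean(prices)
mid = median(prices)
print(f"mean:   ${avg:.2f}")
print(f"median: ${mid:.2f}")
print(f"likely skew: {likely_skew(prices)}")
```

The two toasters near $300 pull the mean far above the median, which is consistent with the right-skewed shape described for toaster prices.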