Exploring Data with Tables and Graphs
Frequency Distributions for Organizing and Summarizing Data
Frequency Distribution: A method to organize data into categories, showing the number of occurrences in each category.
Frequency Table: Lists categories and the frequency of data values in each.
Example: Commute Time in Los Angeles
Commute times example data provided.
Frequency Distribution:
Classes: 0-14, 15-29, 30-44, 45-59, 60-74, 75-89, 90-104
Frequencies:
0-14: 6
15-29: 18
30-44: 14
45-59: 5
60-74: 5
75-89: 1
90-104: 1
Steps to Construct Frequency Distribution:
Select the number of classes: 7 in this example.
Calculate class width:
(Max - Min) / Number of classes
In this case: (90 - 5) / 7 ≈ 12.1, rounded up to a convenient 15.
Determine class limits: Starting at 0, adding class width of 15.
Identify frequencies for each class from the data.
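The steps above can be sketched in Python. This is a minimal illustration, not the method used to build the table in these notes: the helper names class_width and frequency_distribution are made up for this sketch, the sample list is hypothetical (the original commute data is not reproduced here), and whole-number values are assumed.

```python
def class_width(minimum, maximum, num_classes):
    """Raw class width, before rounding up to a convenient value."""
    return (maximum - minimum) / num_classes


def frequency_distribution(values, start, width, num_classes):
    """Count the values in each class [lower, lower + width - 1].

    Assumes whole-number data; values outside all classes are ignored.
    """
    counts = [0] * num_classes
    for v in values:
        index = (v - start) // width
        if 0 <= index < num_classes:
            counts[index] += 1
    classes = [(start + i * width, start + (i + 1) * width - 1)
               for i in range(num_classes)]
    return list(zip(classes, counts))


raw = class_width(5, 90, 7)  # about 12.1, as in the notes
width = 15                   # rounded up to a convenient value

# Hypothetical commute times in minutes (NOT the original data set).
sample = [5, 12, 15, 20, 29, 30, 44, 45, 60, 90]

table = frequency_distribution(sample, start=0, width=width, num_classes=7)
for (lower, upper), count in table:
    print(f"{lower}-{upper}: {count}")
```

Starting at 0 with a width of 15 reproduces the class limits 0-14, 15-29, and so on up to 90-104.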
Relative Frequency Distributions
Relative Frequency Distribution: Each class frequency is replaced by a relative frequency (a proportion) or a percentage of the total number of values.
Calculation:
Relative frequency for a class = Frequency for the class / Total Frequency
Example percentages for commuting times:
0-14: 12%
15-29: 36%
30-44: 28%
45-59: 10%
60-74: 10%
75-89: 2%
90-104: 2%
The percentages should sum to 100%; any small discrepancy comes from rounding.
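As a quick check, the percentages above can be recomputed from the frequency table. The sketch below uses the class frequencies listed in these notes; the variable names are illustrative only.

```python
# Class frequencies from the commute-time table in these notes.
frequencies = {
    "0-14": 6,
    "15-29": 18,
    "30-44": 14,
    "45-59": 5,
    "60-74": 5,
    "75-89": 1,
    "90-104": 1,
}

total = sum(frequencies.values())

# Relative frequency as a percentage: 100 * class frequency / total frequency.
percentages = {label: 100 * count / total
               for label, count in frequencies.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.0f}%")

print("Total frequency:", total)
print("Sum of percentages:", sum(percentages.values()))
```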
Important Features of Histograms
Histogram: A graph of adjacent bars whose heights show the frequency of each class.
Each bar represents a class.
Uses equal width bars.
Y-axis represents frequencies.
Uses:
Shows shape of data distribution.
Identifies center and spread of data.
Helps to visualize outliers.
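To get a rough feel for the shape without plotting software, the commute-time frequencies can be drawn as a text histogram. This is only a sketch: the bars run horizontally rather than vertically, and text_histogram is a hypothetical helper written for this example.

```python
# Class frequencies from the commute-time table in these notes.
rows = [
    ("0-14", 6),
    ("15-29", 18),
    ("30-44", 14),
    ("45-59", 5),
    ("60-74", 5),
    ("75-89", 1),
    ("90-104", 1),
]


def text_histogram(rows, symbol="*"):
    """Return one line per class; bar length equals the class frequency."""
    return [f"{label:>6} | {symbol * count} {count}" for label, count in rows]


lines = text_histogram(rows)
print("\n".join(lines))
```

The longest bars sit in the lower classes and a thin tail stretches toward the longer commute times, which is the kind of shape information a histogram is meant to reveal.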
Critical Thinking with Histograms
Analyze the histogram based on:
Center: Where the middle of the data lies (for example, the location of the median).
Variation: How widely the data values are spread.
Shape: Bell-shaped, uniform, skewed, etc.
Outliers: Values that stand out.
Understanding Skewness
Skewed Distribution: A distribution that is not symmetric and extends more to one side than the other.
Right (Positively) Skewed: Longer tail on right.
Left (Negatively) Skewed: Longer tail on left.
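The direction of skew can also be checked numerically. The sketch below uses the moment coefficient of skewness, one common measure that these notes do not themselves define; the data lists are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.

    Positive for a longer right tail, negative for a longer left tail,
    and zero for perfectly symmetric data.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets, chosen only to illustrate the sign.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
left_tailed = [-x for x in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
print(skewness(symmetric))     # zero
```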
Quantitative Data and Normal Distribution
Normal Distribution: A symmetric distribution whose graph is a bell-shaped curve.
Checking for normality:
Histogram: The shape should be roughly symmetric and bell-shaped.
Normal quantile plot: The points should fall close to a straight line if the data match a normal distribution.
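The points of a normal quantile plot can be computed by pairing each sorted data value with the z score expected at its position. This is a sketch under stated assumptions: the plotting positions (i + 0.5) / n are one common convention among several, normal_quantile_points is a hypothetical helper, and the sample values are made up.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Pair each sorted value with the standard normal z score for its rank.

    Uses plotting positions (i + 0.5) / n for i = 0, ..., n - 1,
    which is an assumed convention, not the only one in use.
    """
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [(standard_normal.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(ordered)]


# Hypothetical sample, for illustration only.
sample = [12, 15, 9, 14, 11, 13, 10]
points = normal_quantile_points(sample)
for z, x in points:
    print(f"z = {z:6.3f}   x = {x}")
```

Plotting x against z and seeing a roughly straight line suggests the data are approximately normal; a systematic curve suggests they are not.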
Measures of Center
Mean: Average value of data set = (Sum of data values) / (Number of values).
Median: Middle value of the ordered data. With an odd count, take the middle value; with an even count, average the two middle values.
Mode: Most frequent value(s). Can have no mode, one mode, or multiple modes.
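The three measures of center can be computed with Python's standard statistics module. The data values below are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data values, for illustration only.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)        # sum of values / number of values
median = statistics.median(values)    # even count: average of the two middle values
modes = statistics.multimode(values)  # list of every most frequent value

print("Mean:", mean)
print("Median:", median)
print("Mode(s):", modes)
```

Note that multimode returns every value when all values occur equally often, so a data set with "no mode" shows up as a list containing all of its distinct values.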
Measures of Variation
Range: Difference between max and min values. Sensitive to outliers.
Standard Deviation (SD): Measures how far data values typically lie from the mean. It is computed from each value's deviation from the mean, so it is also sensitive to outliers.
Variance: The square of the standard deviation.
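A short sketch of these measures using the standard statistics module follows. The notes do not say whether the sample or the population formula is intended, so both are shown; the data values are hypothetical.

```python
import statistics

# Hypothetical data values, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)

# Sample versions divide by n - 1; population versions divide by n.
sample_sd = statistics.stdev(values)
sample_variance = statistics.variance(values)
population_sd = statistics.pstdev(values)
population_variance = statistics.pvariance(values)

print("Range:", data_range)
print("Sample SD:", sample_sd, "Sample variance:", sample_variance)
print("Population SD:", population_sd, "Population variance:", population_variance)
```

In both versions the variance is the square of the corresponding standard deviation.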
Percentiles and Quartiles
Percentiles: Values that divide the sorted data into 100 groups, each holding about 1% of the values.
Quartiles: Values that divide the sorted data into four equal parts, denoted Q1, Q2 (the median), and Q3.
IQR (Interquartile Range): Q3 - Q1, the spread of the middle 50% of values.
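Quartiles and the IQR can be sketched with statistics.quantiles. Textbooks and software use slightly different quartile conventions, so the results below reflect that function's default "exclusive" method and may differ a little from hand calculations; the data values are hypothetical.

```python
import statistics

# Hypothetical commute times in minutes, for illustration only.
values = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 90]

# n=4 requests the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print("Q1:", q1)
print("Q2 (median):", q2)
print("Q3:", q3)
print("IQR:", iqr)
```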
Boxplots (Box-and-whisker Plots)
A graphical representation of the 5-number summary (minimum, Q1, median, Q3, maximum).
Useful for visualizing the spread and identifying outliers.
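The numbers behind a boxplot can be computed directly. The sketch below also flags outliers with the 1.5 × IQR rule, a common convention that these notes do not state, so treat it as an assumption. Quartiles come from statistics.quantiles with its default method, and the data values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values, factor=1.5):
    """Values beyond Q1 - factor*IQR or Q3 + factor*IQR (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - factor * iqr
    high_fence = q3 + factor * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical commute times in minutes, for illustration only.
values = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 120]

summary = five_number_summary(values)
outliers = iqr_outliers(values)

print("5-number summary:", summary)
print("Flagged outliers:", outliers)
```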
Outliers
Data points that stand out from the rest. They can distort the mean and SD, so investigate them carefully before deciding whether to remove them or keep and describe them.
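The effect of a single outlier on the mean and SD, compared with the median, can be seen in a small sketch. The data values are hypothetical and summarize is a helper written only for this example.

```python
import statistics


def summarize(values):
    """Mean, median, and sample standard deviation of the values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


# Hypothetical data, with and without one extreme value.
typical = [10, 12, 14, 16, 18]
with_outlier = typical + [100]

before = summarize(typical)
after = summarize(with_outlier)

print("Without outlier:", before)
print("With outlier:   ", after)
```

Here the mean roughly doubles and the SD grows sharply when the extreme value is added, while the median barely moves.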