Exploring Data with Tables and Graphs

Frequency Distributions for Organizing and Summarizing Data

Frequency Distribution: A way to organize data into categories (classes), showing the number of occurrences (the frequency) in each class.
- Frequency Table: Lists the classes alongside the number of data values that fall in each.

Example: Commute Time in Los Angeles

The example uses a provided set of commute times, summarized in the following frequency distribution.
- Frequency Distribution:
- Classes: 0-14, 15-29, 30-44, 45-59, 60-74, 75-89, 90-104
- Frequencies:
  - 0-14: 6
  - 15-29: 18
  - 30-44: 14
  - 45-59: 5
  - 60-74: 5
  - 75-89: 1
  - 90-104: 1
Steps to Construct a Frequency Distribution:
1. Select the number of classes: 7 in this example.
2. Calculate the class width: (Max - Min) / Number of classes, rounded up to a convenient number.
  - In this case: (90 - 5) / 7 ≈ 12.1, rounded up to 15.
3. Determine the class limits: Start the first class at 0 and add the class width of 15 repeatedly to get each lower class limit (0, 15, 30, ...).
4. Count the data values that fall in each class to get the frequencies.
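
The four steps above can be sketched in Python. This is a minimal illustration, not a standard routine: the function names build_classes and tally are made up for this example, the data are assumed to be whole numbers, and the short sample list is hypothetical because the raw commute times are not reproduced in these notes.

```python
import math

def build_classes(minimum, maximum, num_classes, start, width=None):
    """Return the raw class width and a list of (lower, upper) class limits.

    Assumes integer-valued data, so each upper limit is one less than
    the next lower limit.
    """
    raw_width = (maximum - minimum) / num_classes
    if width is None:
        # Default assumption: round the raw width up to the next integer.
        width = math.ceil(raw_width)
    limits = [(start + i * width, start + (i + 1) * width - 1)
              for i in range(num_classes)]
    return raw_width, limits

def tally(values, limits):
    """Count how many values fall inside each (lower, upper) class."""
    return [sum(lower <= v <= upper for v in values) for lower, upper in limits]

# Numbers from the commute example: min 5, max 90, 7 classes, start 0, width 15.
raw_width, limits = build_classes(5, 90, 7, start=0, width=15)

# Hypothetical values, used only to show the tally step.
sample = [5, 12, 20, 33, 90]
counts = tally(sample, limits)
```

Here limits reproduces the seven classes 0-14 through 90-104. The width of 15 is passed in explicitly because choosing a "convenient" rounded value is a judgment call rather than a formula.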

Relative Frequency Distributions

Relative Frequency Distribution: Each class frequency is expressed as a proportion or percentage of the total number of data values.
Calculation:
- Relative frequency for a class = Frequency for the class / Total frequency
- Example percentages for commuting times (total frequency = 50, so the first class is 6 / 50 = 12%):
  - 0-14: 12%
  - 15-29: 36%
  - 30-44: 28%
  - 45-59: 10%
  - 60-74: 10%
  - 75-89: 2%
  - 90-104: 2%
The percentages should sum to 100%; small differences can appear because of rounding.
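
As a quick check of the calculation, the percentages can be recomputed from the class frequencies listed earlier. This is a small sketch; the variable names are illustrative only.

```python
# Class frequencies from the commute time example.
frequencies = {
    "0-14": 6, "15-29": 18, "30-44": 14, "45-59": 5,
    "60-74": 5, "75-89": 1, "90-104": 1,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency, shown as a percentage.
percentages = {label: 100 * count / total for label, count in frequencies.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.0f}%")
```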

Important Features of Histograms

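Histogram: A bar graph of a frequency distribution for quantitative data.
- Each bar represents a class; its height gives the class frequency.
- Bars have equal width and are drawn next to each other without gaps.
- X-axis represents the classes; Y-axis represents frequencies (or relative frequencies).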
Uses:
- Shows shape of data distribution.
- Identifies center and spread of data.
- Helps to visualize outliers.
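
A rough text version of a histogram can be produced from the commute frequencies. This is only a sketch: the bars run sideways instead of upward, and text_histogram is a made-up helper name, not a library function.

```python
def text_histogram(frequencies, symbol="#"):
    """Return one text line per class, with bar length equal to the frequency."""
    label_width = max(len(label) for label in frequencies)
    return [f"{label:>{label_width}} | {symbol * count} {count}"
            for label, count in frequencies.items()]

# Class frequencies from the commute time example.
frequencies = {
    "0-14": 6, "15-29": 18, "30-44": 14, "45-59": 5,
    "60-74": 5, "75-89": 1, "90-104": 1,
}

lines = text_histogram(frequencies)
print("\n".join(lines))
```

Even in this crude form, the peak in the 15-29 class and the thin tail toward the longer commute times are visible.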

Critical Thinking with Histograms

Analyze the histogram for:
- Center: Where the middle of the data lies (for example, the location of the median).
- Variation: How widely the data are spread out.
- Shape: Bell-shaped, uniform, skewed, etc.
- Outliers: Values that lie far from the rest of the data.

Understanding Skewness

Skewness: A distribution is skewed if it is not symmetric and extends farther to one side than the other.
- Right (Positively) Skewed: Longer tail on the right.
- Left (Negatively) Skewed: Longer tail on the left.

Quantitative Data and Normal Distribution

Normal Distribution: A distribution whose graph is a symmetric, bell-shaped curve.
Checking for normality:
- Histogram: The shape should be roughly bell-shaped.
- Normal quantile plot: The points should lie close to a straight line if the data are approximately normal.
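
The points of a normal quantile plot can be computed as follows. This is a sketch under stated assumptions: the plotting position (i - 0.5) / n is one common convention among several, normal_quantile_points is a made-up helper name, and the sample values are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_points(values):
    """Pair each sorted data value with the z score expected for its rank."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [(standard_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical sample, used only to show the calculation.
sample = [48, 52, 50, 55, 45]
points = normal_quantile_points(sample)

for z, x in points:
    print(f"z = {z:+.2f}, value = {x}")
```

Plotting the z scores against the data values and checking whether the points fall near a straight line is the visual test described above.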

Measures of Center

Mean: Average value of data set = (Sum of data values) / (Number of values).
Median: Middle value of the data when sorted in order. With an odd number of values, take the middle value; with an even number, average the two middle values.
Mode: Most frequent value(s). Can have no mode, one mode, or multiple modes.
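
The three measures can be computed with Python's standard library. This is a minimal sketch on hypothetical data; the helper modes is written here (it is not a library function) so that a data set with no repeated value reports no mode, matching the definition above.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]

# Hypothetical data with an even number of values.
data = [2, 3, 3, 5, 7, 10]

data_mean = mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
data_median = median(data)  # average of the two middle values, 3 and 5
data_modes = modes(data)
```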

Measures of Variation

Range: Difference between the maximum and minimum values. Sensitive to outliers because it uses only the two extreme values.
Standard Deviation (SD): Measures how far data values typically lie from the mean. Its formula is based on each data value's deviation from the mean.
Variance: The square of the standard deviation.
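
A short sketch of the three measures on hypothetical data follows. It assumes the sample versions of the standard deviation and variance (dividing by n - 1), which is what statistics.stdev and statistics.variance compute; these notes do not say whether the sample or population form is intended.

```python
from statistics import stdev, variance

# Hypothetical data; the mean is 5.
data = [2, 3, 3, 5, 7, 10]

data_range = max(data) - min(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
data_variance = variance(data)

# Sample standard deviation: square root of the sample variance.
data_sd = stdev(data)

print(data_range, round(data_variance, 2), round(data_sd, 2))
```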

Percentiles and Quartiles

Percentiles: The 99 values that divide sorted data into 100 groups, each containing about 1% of the values.
Quartiles: The three values that divide sorted data into four parts of about equal size, denoted Q1, Q2 (median), and Q3.
- IQR (Interquartile Range): Q3 - Q1, measures the spread of the middle 50% of values.
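
Quartiles and the IQR can be sketched with statistics.quantiles. Conventions for locating quartiles differ between textbooks and software, so the values from this function (using its default "exclusive" method) may not match a hand calculation exactly. The data are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical commute times, already in order.
times = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 90]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(times, n=4)

iqr = q3 - q1

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```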

Boxplots (Box-and-whisker Plots)

A graphical representation of the 5-number summary (minimum, Q1, median, Q3, maximum).
Useful for visualizing the spread and identifying outliers.
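
The five values a boxplot is drawn from can be collected as shown here. This is a sketch: five_number_summary is a made-up helper name, the quartiles come from statistics.quantiles with its default method (other conventions give slightly different values), and the data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))

# Hypothetical commute times.
times = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 90]
summary = five_number_summary(times)
```

In a boxplot of these values the box would span Q1 to Q3 with a line at the median, and the long stretch from Q3 up to the maximum would hint that 90 stands apart from the rest.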

Outliers

Data points that stand out from the rest. They can distort the mean and SD, so careful investigation is needed to decide whether they should be removed or kept and described.
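
The distortion can be seen by computing the same statistics with and without a single extreme value. This is an illustrative sketch on hypothetical data, using the sample standard deviation from statistics.stdev.

```python
from statistics import mean, median, stdev

# Hypothetical data, then the same data with one extreme value added.
base = [20, 25, 30, 35, 40]
with_outlier = base + [180]

# The mean and SD move a lot; the median moves only a little.
mean_shift = mean(with_outlier) - mean(base)
median_shift = median(with_outlier) - median(base)
sd_ratio = stdev(with_outlier) / stdev(base)

print(f"mean shift: {mean_shift}, median shift: {median_shift}, SD ratio: {sd_ratio:.1f}")
```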