Exploring Data with Tables and Graphs

Frequency Distributions for Organizing and Summarizing Data
  • Frequency Distribution: A method to organize data into categories, showing the number of occurrences in each category.

    • Frequency Table: Lists categories and the frequency of data values in each.

Example: Commute Time in Los Angeles
  • A sample of commute times is summarized below (the raw data values are not reproduced in these notes).

    • Frequency Distribution:

    • Classes: 0-14, 15-29, 30-44, 45-59, 60-74, 75-89, 90-104

    • Frequencies:

      • 0-14: 6

      • 15-29: 18

      • 30-44: 14

      • 45-59: 5

      • 60-74: 5

      • 75-89: 1

      • 90-104: 1

  • Steps to Construct Frequency Distribution:

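    1. Select the number of classes: here, 7.

    2. Calculate class width: (Max - Min) / Number of classes, then round up to a convenient number.

      • In this case: (90 - 5) / 7 ≈ 12.1, rounded up to a convenient width of 15.

    3. Determine class limits: Start the first class at 0 and add the class width of 15 to get each following lower class limit (0, 15, 30, ...).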

    4. Identify frequencies for each class from the data.
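
The four steps above can be sketched in a few lines of Python. This is only an illustration: the function names `class_limits` and `frequency_table` are made up for this sketch, and because the raw commute data is not listed in these notes, `sample_times` is a small invented list rather than the real data set.

```python
def class_limits(start, width, num_classes):
    """Return (lower, upper) class limits for whole-number data."""
    return [(start + i * width, start + (i + 1) * width - 1)
            for i in range(num_classes)]


def frequency_table(values, limits):
    """Count how many values fall inside each (lower, upper) class."""
    return {f"{lo}-{hi}": sum(lo <= v <= hi for v in values)
            for lo, hi in limits}


# Step 2: raw class width from the figures in the notes (max 90, min 5, 7 classes).
raw_width = (90 - 5) / 7   # about 12.1
width = 15                 # rounded up to a convenient value, as in the notes

# Step 3: seven classes starting at 0.
limits = class_limits(start=0, width=width, num_classes=7)

# Step 4: tally the frequencies. These values are invented for illustration.
sample_times = [5, 12, 20, 25, 25, 31, 40, 47, 62, 90]
table = frequency_table(sample_times, limits)

for label, count in table.items():
    print(label, count)
```

The generated limits reproduce the classes used in the example (0-14, 15-29, and so on up to 90-104); only the counts differ, since the input list is invented.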

Relative Frequency Distributions
  • Relative Frequency Distribution: Each class frequency is replaced by a relative frequency (a proportion) or a percentage of the total number of data values.

  • Calculation:

    • Relative frequency for a class = Frequency for the class / Total Frequency

    • Example percentages for commuting times:

      • 0-14: 12%

      • 15-29: 36%

      • 30-44: 28%

      • 45-59: 10%

      • 60-74: 10%

      • 75-89: 2%

      • 90-104: 2%

  • The percentages should sum to 100%; small differences can appear when the individual percentages are rounded.
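
As a quick check of the calculation, the sketch below applies the relative frequency formula to the commute time frequencies listed earlier in these notes. The variable names are arbitrary.

```python
# Class frequencies from the commute time example in these notes.
frequencies = {
    "0-14": 6, "15-29": 18, "30-44": 14, "45-59": 5,
    "60-74": 5, "75-89": 1, "90-104": 1,
}

total = sum(frequencies.values())  # total number of data values

# Relative frequency for a class = frequency for the class / total frequency.
relative = {label: count / total for label, count in frequencies.items()}

for label, share in relative.items():
    print(f"{label}: {share:.0%}")

print(f"Sum: {sum(relative.values()):.0%}")
```

With a total of 50 values, the first class works out to 6 / 50 = 12%, matching the percentage listed above.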

Important Features of Histograms
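  • Histogram: A graph of adjacent bars in which the horizontal axis shows the classes and the vertical axis shows the frequencies.

    • Each bar represents a class.

    • Bars have equal width and touch each other.

    • The vertical axis (y-axis) represents frequencies.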

  • Uses:

    • Shows shape of data distribution.

    • Identifies center and spread of data.

    • Helps to visualize outliers.
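
To get a feel for what a histogram shows, here is a rough text-only sketch that draws one row of `#` symbols per class, using the commute time frequencies from these notes. It is not a proper plot: the bars run horizontally instead of vertically, and the helper name `text_histogram` is made up for this example.

```python
# Class frequencies from the commute time example in these notes.
frequencies = {
    "0-14": 6, "15-29": 18, "30-44": 14, "45-59": 5,
    "60-74": 5, "75-89": 1, "90-104": 1,
}


def text_histogram(freqs, symbol="#"):
    """Return one text line per class; bar length equals the frequency."""
    label_width = max(len(label) for label in freqs)
    return [f"{label:>{label_width}} | {symbol * count} ({count})"
            for label, count in freqs.items()]


for line in text_histogram(frequencies):
    print(line)
```

Even in this crude form, the bars peak at the 15-29 class and trail off toward the longer commute times, which is the kind of shape information a histogram is meant to reveal.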

Critical Thinking with Histograms
  • Analyze the histogram based on:

    • Center: Where the middle of the data lies (for example, the approximate location of the median).

    • Variation: How widely the data values are spread out.

    • Shape: Bell-shaped, uniform, skewed, etc.

    • Outliers: Values that stand out.

Understanding Skewness
  • Skewness: A distribution is skewed if it is not symmetric and extends farther to one side than to the other.

    • Right (Positively) Skewed: Longer tail on right.

    • Left (Negatively) Skewed: Longer tail on left.

Quantitative Data and Normal Distribution
  • Normal Distribution: A distribution whose graph is a symmetric, bell-shaped curve.

  • Checking for normality:

    • Histogram shape: Should be roughly bell-shaped.

    • Normal quantile plot: If the data are approximately normal, the points lie close to a straight line.
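
A normal quantile plot pairs each sorted data value with the z score a standard normal distribution would be expected to produce at that position. The sketch below computes those pairs without drawing anything. It assumes the plotting position (i + 0.5) / n, which is one common convention among several, and the sample values are invented for illustration.

```python
from statistics import NormalDist


def normal_quantile_pairs(values):
    """Pair each sorted value with its expected standard normal z score."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i + 0.5) / n is an assumed plotting position; other conventions exist.
    return [(std_normal.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(sorted(values))]


# Hypothetical sample, made up for this sketch.
sample = [61, 64, 66, 68, 69, 70, 71, 72, 74, 76, 79]
pairs = normal_quantile_pairs(sample)

for z, x in pairs:
    print(f"{z:6.2f}  {x}")
```

Plotting the z scores against the data values and checking how closely the points follow a straight line is the visual test described above.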

Measures of Center
  • Mean: Average value of data set = (Sum of data values) / (Number of values).

  • Median: Middle value of the data when sorted in order. With an odd number of values, take the middle value; with an even number, average the two middle values.

  • Mode: Most frequent value(s). A data set can have no mode, one mode, or multiple modes.
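
The three measures can be tried out with Python's standard library. The data lists below are invented for illustration, and `modes` is a small helper written for this sketch so that a data set with no repeated value reports no mode, as in the definition above.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Most frequent value(s); an empty list when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return [value for value, count in counts.items() if count == top]


# Hypothetical data sets, made up for this sketch.
odd_count = [3, 7, 7, 10, 18]       # 5 values: median is the middle one
even_count = [2, 4, 4, 9, 9, 12]    # 6 values: median averages the middle two

print(mean(odd_count), median(odd_count), modes(odd_count))
print(median(even_count), modes(even_count))
print(modes([1, 2, 3]))
```

Here `odd_count` has mean 45 / 5 = 9, median 7, and a single mode of 7, while `even_count` has median (4 + 9) / 2 = 6.5 and two modes, 4 and 9.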

Measures of Variation
  • Range: Difference between max and min values. Sensitive to outliers.

  • Standard Deviation (SD): Measures how far data values typically deviate from the mean. Its formula uses every data value's deviation from the mean.

  • Variance: The square of the standard deviation.
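
The sketch below works through the three measures step by step. The notes do not say whether the sample or population version of the standard deviation is meant, so this example assumes the sample version (dividing by n - 1), which is what `statistics.stdev` computes. The data values are invented.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical data, made up for this sketch.
data = [2, 4, 4, 6, 9]

# Range: difference between the maximum and minimum values.
data_range = max(data) - min(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
center = mean(data)
squared_deviations = [(x - center) ** 2 for x in data]
sample_variance = sum(squared_deviations) / (len(data) - 1)

# Standard deviation: square root of the variance.
sample_sd = math.sqrt(sample_variance)

print("range:", data_range)
print("variance:", sample_variance, "library:", variance(data))
print("SD:", round(sample_sd, 3), "library:", round(stdev(data), 3))
```

For this list the mean is 5, the squared deviations sum to 28, and the sample variance is 28 / 4 = 7, so the standard deviation is the square root of 7.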

Percentiles and Quartiles
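  • Percentiles: The 99 values (P1, P2, ..., P99) that divide sorted data into 100 groups with about 1% of the values in each.

  • Quartiles: The three values that divide sorted data into four groups with about 25% of the values in each, denoted Q1, Q2 (the median), and Q3.

    • IQR (Interquartile Range): Q3 - Q1, measures the spread of the middle 50% of the values.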
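
Quartiles and the IQR can be computed with `statistics.quantiles` from Python's standard library. Textbooks and software use slightly different rules for locating quartiles, so the results below reflect that function's default method and may not match a hand calculation done with another rule. The data values are invented.

```python
from statistics import median, quantiles

# Hypothetical data, made up for this sketch.
values = [12, 15, 18, 20, 22, 25, 28, 30, 35, 41, 60]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr)
print("Q2 equals the median:", q2 == median(values))

# n=100 returns the 99 percentile cut points P1 through P99.
percentiles = quantiles(values, n=100)
print("number of percentile cut points:", len(percentiles))
```

The same function with `n=100` returns 99 cut points, which is the "100 groups" idea behind percentiles.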

Boxplots (Box-and-whisker Plots)
  • A graphical representation of the 5-number summary (minimum, Q1, median, Q3, maximum).

  • Useful for visualizing the spread and identifying outliers.
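
A boxplot is drawn from five numbers, so a small helper that returns them is often the first step. The sketch below is one way to do it, assuming the default quartile method of `statistics.quantiles` (other quartile conventions give slightly different Q1 and Q3). The function name `five_number_summary` and the data are made up for this example.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4)  # library default quartile method
    return (min(values), q1, q2, q3, max(values))


# Hypothetical data, made up for this sketch.
times = [12, 15, 18, 20, 22, 25, 28, 30, 35, 41, 60]
summary = five_number_summary(times)

for name, value in zip(["min", "Q1", "median", "Q3", "max"], summary):
    print(f"{name}: {value}")
```

In the plot, the box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.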

Outliers
  • Data values that lie far from the rest. They can distort the mean and SD, so each one should be investigated carefully before deciding whether to remove it or to keep and report it.
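
To see how strongly a single outlier can pull the mean and SD, the sketch below compares the same small data set with and without one extreme value. All values are invented for illustration, and the SD shown is the sample version computed by `statistics.stdev`.

```python
from statistics import mean, median, stdev

# Hypothetical data, made up for this sketch.
base = [20, 22, 25, 27, 31]
with_outlier = base + [115]   # one extreme value added

for label, data in [("without outlier", base), ("with outlier", with_outlier)]:
    print(f"{label}: mean={mean(data):.1f}, "
          f"median={median(data):.1f}, SD={stdev(data):.1f}")
```

The mean jumps from 25 to 40 and the SD grows several times over, while the median only moves from 25 to 26. This is why outliers deserve a closer look before the mean and SD are reported.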