Data Summarization and Descriptive Statistics

Data Summarization: Illustrative Examples

  • Summarizing data using illustrative examples.

Variables and Measurement Scales

  • Variables and measurement scales are fundamental in statistics.

What is Data?

  • Data are facts or figures, numerical or otherwise, collected with a defined purpose.

Types of Data

  • Qualitative Data:

    • Nominal Data:

      • Examples include gender (women, men), hair color (blonde, brown), and ethnicity (Hispanic, Asian).

    • Ordinal Data:

      • Examples include first, second, and third positions, letter grades (A, B, C), and economic status (low, medium).

  • Quantitative Data:

    • Discrete Data:

      • Examples include the number of students in a class, the number of workers in a company, and the number of home runs in a baseball game.

    • Continuous Data:

      • Examples include the height of children, the square footage of a two-bedroom house, and the speed of cars.
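
The scale of a variable limits which summaries make sense. As a minimal sketch (the samples below are hypothetical, not taken from the slides), nominal data can be summarized by its most frequent category, while ordinal data also supports a median once each category is mapped to its rank:

```python
from collections import Counter
from statistics import median_low

# Nominal data: categories have no order, so the mode is the natural summary.
hair_colors = ["brown", "blonde", "brown", "brown", "blonde"]  # hypothetical sample
most_common_color = Counter(hair_colors).most_common(1)[0][0]

# Ordinal data: categories are ordered, so a median is meaningful
# once each category is mapped to its rank.
grade_rank = {"C": 0, "B": 1, "A": 2}  # lowest to highest
grades = ["B", "A", "C", "B", "A"]  # hypothetical sample
ranks = [grade_rank[g] for g in grades]
rank_to_grade = {rank: grade for grade, rank in grade_rank.items()}
median_grade = rank_to_grade[median_low(ranks)]

print(most_common_color)  # brown
print(median_grade)       # B
```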

Data and Variables

  • Process:

    • Expectation → Transcribe → Organize → Check → Validate → Summarize → Reality

  • Variables include:

    • Object

    • Pollutants

    • Organisms

    • Populations

    • Temperature

    • Rainfall/Precipitation

    • Relative Humidity (RH)

    • Color

    • Sweetness

    • Weight

What We Measure

  • Examples:

    • Observational Unit: Butterfly

      • Variable: Cases

      • Data: 6 cases

    • Observational Unit: Quadrat

      • Variable: Cases

    • Observational Unit: Nest

      • Variable: Number of colonies

      • Data: 12 cases with varying numbers (e.g., 4, 10, 6, 8, 4, 14, 3, 9, 11, 10).

Getting Information from Our Data: Descriptive Statistics

  • Descriptive statistics are methods used to describe, show, and summarize data in a meaningful way.

The Grouped Frequency Table

  • Variable: Height of hedgehogs in cm

    • Example data points: 25.40, 28.99, 24.64, 24.86, 27.86, …, 18.73, 21.55

    • Total: 196 cases

The Grouped Frequency Table (Continued)

  • Minimum value: 0

  • Maximum value lies somewhere between 35 and 40

  • Most values are between 10 and 23 cm.

The Grouped Frequency Table: Interval and Frequency

  • Inclusive (the upper limit is counted in the class): 2 hedgehogs with a height of 5.0 cm to 9.0 cm

  • Exclusive (the upper limit is not counted in the class): 1 hedgehog with a height of 5.0 cm to 8.9 cm

  • Categorical vs. Continuous variables

The Grouped Frequency Table: Class Interval and Frequency

  • Class Interval

    • 5-9 cm: 2

    • 9-13 cm: 6

    • 13-17 cm: 15
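
Assigning values to class intervals can be sketched as follows. This is an illustration only: it uses the seven heights listed in the example rather than all 196 cases, and it assumes exclusive upper limits with classes of width 4 cm starting at 5 cm. The helper `class_interval` is a hypothetical name.

```python
import math
from collections import Counter

heights = [25.40, 28.99, 24.64, 24.86, 27.86, 18.73, 21.55]  # values listed in the example
start, width = 5, 4  # classes 5-9, 9-13, 13-17, ... (assumed from the table)

def class_interval(value):
    """Return the (lower, upper) class limits, with the upper limit exclusive."""
    index = math.floor((value - start) / width)
    lower = start + index * width
    return (lower, lower + width)

# Count how many values fall in each class.
frequency = Counter(class_interval(h) for h in heights)
for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper} cm: {count}")
```

With exclusive upper limits, a value of exactly 9.0 cm is counted in the 9-13 class, not in 5-9.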

Size Class, Frequency, and Relative Frequency

  • Size Class, Frequency, and Relative Frequency

    • 5-9: Frequency 8, Relative Frequency 0.041 (4.1%)

    • 9-13: Frequency 25, Relative Frequency 0.128 (12.8%)

    • 13-17: Frequency 37, Relative Frequency 0.189 (18.9%)

    • 17-21: Frequency 52, Relative Frequency 0.265 (26.5%)

    • 21-25: Frequency 38, Relative Frequency 0.194 (19.4%)

    • 25-29: Frequency 28, Relative Frequency 0.143 (14.3%)

    • 29-33: Frequency 8, Relative Frequency 0.041 (4.1%)

  • The area of each bin represents the proportion (or percentage) of values that fall within that bin's limits.
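
Each relative frequency is the class frequency divided by the total number of cases. A short sketch using the frequencies from the table above (variable names are illustrative):

```python
frequencies = {
    "5-9": 8, "9-13": 25, "13-17": 37, "17-21": 52,
    "21-25": 38, "25-29": 28, "29-33": 8,
}
total = sum(frequencies.values())  # 196 cases

# Relative frequency = class frequency / total number of cases.
relative = {size_class: count / total for size_class, count in frequencies.items()}

for size_class, proportion in relative.items():
    print(f"{size_class}: {proportion:.3f} ({proportion:.1%})")
```

The relative frequencies sum to 1 before rounding; rounded values may sum to slightly more or less.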

The Histogram

  • Most hedgehogs are between 15 and 25 cm in height.

  • The traps catch hedgehogs as small as 5 cm.

  • Measures of Location and Spread are essential.

Measures of Location

  • Provide information about where the center of the set of data is and indirectly about its shape.

  • 1. The Mean:

    • The sum of all values divided by the number of values

    • Example calculation: \frac{25.40 + 28.99 + 24.64 + 24.86 + 27.86}{5} = \frac{131.75}{5} = 26.35

    • Mean value represents a typical height for a hedgehog.
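
The same calculation as a short sketch, using the first five hedgehog heights listed earlier:

```python
heights = [25.40, 28.99, 24.64, 24.86, 27.86]

# Mean = sum of the values divided by the number of values.
mean_height = sum(heights) / len(heights)
print(round(mean_height, 2))  # 26.35
```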

Measures of Location: Median

  • 2. The Median:

    • The middle value once the data are sorted; 50% of the values are below it and 50% are above.

    • Example: 24.64 | 24.86 | 25.40 | 27.86 | 28.99. The median is 25.40.
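
A minimal sketch of the median. The slide shows only an odd number of values; the even case below follows the common convention of averaging the two middle values, which is an assumption beyond the slide.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([25.40, 28.99, 24.64, 24.86, 27.86]))  # 25.4
print(median([24.64, 24.86, 25.40, 27.86]))  # average of 24.86 and 25.40
```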

Measures of Location: Mode

  • 3. The Mode:

    • The most frequent value.

    • Example: 24.64 | 24.86 | 25.4 | 27.86 | 24.64. The mode is 24.64.
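
A data set can have more than one mode, or none that stands out. As a sketch, Python's `statistics.multimode` returns every value tied for the highest count:

```python
from statistics import multimode

heights = [24.64, 24.86, 25.4, 27.86, 24.64]
print(multimode(heights))  # [24.64]

# When several values tie for the highest count, all of them are returned.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```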

Measures of Spread

  • Provide information about the variability of our data.

  • 1. The Range:

    • Difference between the maximum and minimum values.

    • Example: 33.66 – 5.70 = 27.96

  • 2. Quartiles:

    • Q1 (25th Percentile)

    • Q2 (50th Percentile, Median)

    • Q3 (75th Percentile)

    • IQR (Inter-Quartile Range) = Q3 − Q1
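
The range and quartiles can be sketched with the ten colony counts listed in the nest example. Quartiles have several competing definitions; the `method="inclusive"` choice below is an assumption, and other software may report slightly different values.

```python
from statistics import quantiles

colonies = [4, 10, 6, 8, 4, 14, 3, 9, 11, 10]  # counts listed in the nest example

# Range: maximum minus minimum.
data_range = max(colonies) - min(colonies)

# Quartiles split the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = quantiles(colonies, n=4, method="inclusive")
iqr = q3 - q1

print(data_range)  # 11
print(q1, q2, q3)  # 4.5 8.5 10.0
print(iqr)         # 5.5
```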

Measures of Spread: Variance and Standard Deviation

  • 3. The Variance and Standard Deviation:

    • Summarize how far the values lie from the mean on average; the standard deviation is the square root of the variance.
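
A step-by-step sketch using the first five hedgehog heights. The slides do not say whether the sample (n − 1) or population (n) formula is intended; the sample version is assumed here.

```python
import math
import statistics

heights = [25.40, 28.99, 24.64, 24.86, 27.86]
mean_height = statistics.mean(heights)

# Squared distance of each value from the mean.
squared_deviations = [(h - mean_height) ** 2 for h in heights]

# Sample variance divides by n - 1 (assumption: sample, not population).
sample_variance = sum(squared_deviations) / (len(heights) - 1)

# Standard deviation is the square root of the variance,
# which puts it back in the original units (cm).
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 4))  # 3.8241
print(round(sample_sd, 2))        # 1.96
```

`statistics.variance` and `statistics.stdev` give the same sample-based results directly.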

Descriptive Statistics

  • Statistics allow us to use a reduced number of values – or even one – to describe the features of a variable.

  • Location:

    • Mean: 19.18

    • Median: 18.98

    • Mode: 18.19

  • Spread:

    • Variance: 37.88

    • SD (Standard Deviation): 6.11

    • IQR (Inter-Quartile Range): 8.63

Descriptive Statistics: Reporting Measures

  • A measure of location should always be reported along with one measure of spread.

    • Example 1: Mean = 19.80, Standard Deviation = 6.11 (19.80 ± 6.11)

    • Example 2: Median = 16.05, IQR = 8.90 (16.05 ± 8.90)

  • Recommended size of hole calculations:

    • Based on mean: Upper limit = 19.8 + 6.11 ≈ 26 cm

    • Based on median: Upper limit = 16.05 + 8.90 ≈ 25 cm

    • Recommended size of hole with median = 18 cm
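
The reporting format and the upper-limit arithmetic can be sketched as below. The helpers `report` and `upper_limit` are hypothetical names, and treating the upper limit as location plus spread is an assumption based on the mean-based line above.

```python
def report(location, spread):
    """Format a measure of location together with its measure of spread."""
    return f"{location:.2f} ± {spread:.2f}"

def upper_limit(location, spread):
    """Upper limit taken as location plus spread (assumed from the mean-based example)."""
    return location + spread

print(report(19.80, 6.11))              # 19.80 ± 6.11
print(round(upper_limit(19.80, 6.11)))  # 26
print(round(upper_limit(16.05, 8.90)))  # 25
```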

Descriptive Statistics: Shape

  • Skewness: Asymmetry relative to the center of a distribution.

    • Right Skewed: Mode < Median < Mean

    • Symmetric: Mode = Median = Mean

    • Left Skewed: Mean < Median < Mode

  • Kurtosis: Describes the “peak” of the distribution (how peaked or flat it is).

  • Skewness describes the asymmetry of the tails; kurtosis describes the height (peakedness) of the distribution curve.
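
Both shape measures can be computed from central moments. The sketch below uses the simple population-moment versions (one of several conventions) on two small hypothetical data sets. Excess kurtosis subtracts 3 so that a normal distribution scores 0.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Third central moment scaled by the variance to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]         # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical data with a long right tail

print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```

A positive skewness indicates a longer right tail and a negative value a longer left tail, matching the ordering of mode, median, and mean listed above.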

Summary

  • Statistics is concerned with collecting, describing, and using data.

  • Descriptive statistics encompass a set of procedures to summarize data from a variable of interest.

  • The method to summarize and describe data depends on the scale of measurement of that variable of interest.

  • The measures of location provide information about the values of the majority of the cases (what is a typical value?).

  • The measures of spread provide information about the variability or how much our values differ from each other and from the mean (how typical is typical?).

  • We always report a measure of location accompanied by the appropriate measure of spread or variability.