Data Summarization and Descriptive Statistics
Data Summarization: Illustrative Examples
Summarizing data using illustrative examples.
Variables and Measurement Scales
Variables and measurement scales are fundamental in statistics.
What is Data?
Data are facts or figures, numerical or otherwise, collected with a defined purpose.
Types of Data
Qualitative Data:
Nominal Data:
Examples include gender (women, men), hair color (blonde, brown), and ethnicity (Hispanic, Asian).
Ordinal Data:
Examples include first, second, and third positions, letter grades (A, B, C), and economic status (low, medium).
Quantitative Data:
Discrete Data:
Examples include the number of students in a class, the number of workers in a company, and the number of home runs in a baseball game.
Continuous Data:
Examples include the height of children, the square footage of a two-bedroom house, and the speed of cars.
Data and Variables
Process:
Expectation → Transcribe → Organize → Check → Validate → Summarize → Reality
Variables include:
Object
Pollutants
Organisms
Populations
Temperature
Rainfall/Precipitation
Relative Humidity (RH)
Color
Sweetness
Weight
What We Measure
Examples:
Observational Unit: Butterfly
Variable: Cases
Data: 6 cases
Observational Unit: Quadrat
Variable: Cases
Observational Unit: Nest
Variable: Number of colonies
Data: 12 cases with varying numbers (e.g., 4, 10, 6, 8, 4, 14, 3, 9, 11, 10).
Getting Information from Our Data: Descriptive Statistics
Descriptive statistics are methods used to describe, show, and summarize data in a meaningful way.
The Grouped Frequency Table
Variable: Height of hedgehogs in cm
Example data points: 25.40, 28.99, 24.64, 24.86, 27.86, …, 18.73, 21.55
Total: 196 cases
The Grouped Frequency Table (Continued)
Minimum value: 0
Maximum value lies somewhere between 35 and 40
Most values lie between 10 and 23 cm.
The Grouped Frequency Table: Interval and Frequency
Inclusive: 2 hedgehogs with a height of 5.0 cm to 9.0 cm
Exclusive: 1 hedgehog with a height of 5.00 cm to 8.9 cm
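How the interval limits are treated changes the counts, as the inclusive and exclusive examples above show. A minimal Python sketch of the idea, using three made-up heights (not values from the hedgehog data) and a hypothetical helper named count_in_interval:

```python
def count_in_interval(values, lower, upper, include_upper=True):
    """Count values in [lower, upper], or in [lower, upper) if include_upper is False."""
    if include_upper:
        return sum(1 for v in values if lower <= v <= upper)
    return sum(1 for v in values if lower <= v < upper)


# Made-up heights in cm (an assumption for illustration only),
# chosen so that one value sits exactly on the upper limit.
heights = [5.7, 9.0, 12.3]

inclusive_count = count_in_interval(heights, 5.0, 9.0, include_upper=True)
exclusive_count = count_in_interval(heights, 5.0, 9.0, include_upper=False)

print(inclusive_count, exclusive_count)
```

With an exclusive upper limit, a value of exactly 9.0 cm is counted in the next class instead, so every case still falls in exactly one class.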
Categorical vs. Continuous variables
The Grouped Frequency Table: Class Interval and Frequency
Class Interval
5-9 cm: 2
9-13 cm: 6
13-17 cm: 15
Size Class, Frequency, and Relative Frequency
5-9: Frequency 8, Relative Frequency 0.041 (4.1%)
9-13: Frequency 25, Relative Frequency 0.128 (12.8%)
13-17: Frequency 37, Relative Frequency 0.189 (18.9%)
17-21: Frequency 52, Relative Frequency 0.265 (26.5%)
21-25: Frequency 38, Relative Frequency 0.194 (19.4%)
25-29: Frequency 28, Relative Frequency 0.143 (14.3%)
29-33: Frequency 8, Relative Frequency 0.041 (4.1%)
The area of each bin represents the proportion (or percentage) of values that fall within the limits of that bin.
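Each relative frequency in the table above is the class frequency divided by the total number of cases. A short Python sketch of that calculation, with the frequencies copied from the table (the variable names are illustrative):

```python
# Class frequencies copied from the table above (size class in cm -> count).
frequencies = {
    "5-9": 8,
    "9-13": 25,
    "13-17": 37,
    "17-21": 52,
    "21-25": 38,
    "25-29": 28,
    "29-33": 8,
}

total = sum(frequencies.values())  # total number of cases

# Relative frequency = class frequency / total number of cases.
relative = {size_class: count / total for size_class, count in frequencies.items()}

for size_class, count in frequencies.items():
    share = relative[size_class]
    print(f"{size_class:>6}  {count:>3}  {share:.3f}  ({share:.1%})")
```

The relative frequencies always add up to 1 (100%), which is a quick check that no class was left out.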
The Histogram
Most hedgehogs are between 15 and 25 cm in height.
The traps catch hedgehogs as small as 5 cm.
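A histogram can be sketched directly from the class frequencies. The Python example below is only a text approximation, assuming the seven size classes from the frequency table; because all classes have the same width, the length of each bar is proportional to its area. The function name text_histogram is hypothetical.

```python
# (size class in cm, frequency) pairs copied from the frequency table.
class_frequencies = [
    ("5-9", 8),
    ("9-13", 25),
    ("13-17", 37),
    ("17-21", 52),
    ("21-25", 38),
    ("25-29", 28),
    ("29-33", 8),
]


def text_histogram(rows):
    """Return one text line per class, drawing one '#' per case."""
    return [f"{label:>6} | {'#' * count} {count}" for label, count in rows]


lines = text_histogram(class_frequencies)
print("\n".join(lines))

# The class with the longest bar is the most common size class.
modal_class = max(class_frequencies, key=lambda row: row[1])[0]
print("Most common class:", modal_class)
```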
Measures of Location and Spread are essential.
Measures of Location
Provide information about where the center of the set of data is and indirectly about its shape.
1. The Mean:
The sum of all values divided by the number of values.
Example calculation: \frac{25.40 + 28.99 + 24.64 + 24.86 + 27.86}{5} = \frac{131.75}{5} = 26.35
Mean value represents a typical height for a hedgehog.
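The same calculation can be checked in Python. This sketch uses the first five hedgehog heights listed earlier (25.40, 28.99, 24.64, 24.86, 27.86) and compares the calculation by hand with statistics.mean from the standard library:

```python
import statistics

# First five hedgehog heights (cm) from the example data points.
heights = [25.40, 28.99, 24.64, 24.86, 27.86]

# Mean = sum of the values divided by the number of values.
mean_by_hand = sum(heights) / len(heights)

# The standard library gives the same result.
mean_stdlib = statistics.mean(heights)

print(round(mean_by_hand, 2), round(mean_stdlib, 2))
```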
Measures of Location: Median
2. The Median:
The value at the middle; 50% of the values are below and 50% are above.
Example: 24.64 | 24.86 | 25.40 | 27.86 | 28.99. The median is 25.40.
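Finding the median means sorting the values and taking the middle one. A minimal Python sketch, applied to five of the listed hedgehog heights; the averaging of the two middle values for an even number of cases is the usual convention and is not shown in the example above:

```python
def median(values):
    """Return the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of cases: a single middle value exists.
        return ordered[middle]
    # Even number of cases: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


heights = [25.40, 28.99, 24.64, 24.86, 27.86]  # unsorted, in cm
print(median(heights))

# Even number of cases (illustrative values only).
print(median([4, 10, 6, 8]))
```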
Measures of Location: Mode
3. The Mode:
The most frequent value.
Example: 24.64 | 24.86 | 25.4 | 27.86 | 24.64. The mode is 24.64.
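The mode can be found by counting how often each value occurs. The Python sketch below uses the five values from the example above; it returns a list because a data set can have more than one mode.

```python
from collections import Counter

values = [24.64, 24.86, 25.4, 27.86, 24.64]

# Count the occurrences of each distinct value.
counts = Counter(values)
highest = max(counts.values())

# Every value that reaches the highest count is a mode.
modes = [value for value, count in counts.items() if count == highest]

print(modes)
```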
Measures of Spread
Provide information about the variability of our data.
1. The Range:
Difference between the maximum and minimum values.
Example: 33.66 – 5.70 = 27.96
2. Quartiles:
Q1 (25th Percentile)
Q2 (50th Percentile, Median)
Q3 (75th Percentile)
IQR (Inter-Quartile Range): Q3 − Q1
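The range and the quartiles can be computed with the Python standard library. This sketch uses the ten colony counts listed in the nest example earlier. Note that several conventions exist for interpolating quartiles; statistics.quantiles with its default settings is only one of them, so other software may report slightly different values for Q1 and Q3.

```python
import statistics

# Number of colonies per nest, as listed in the nest example.
colonies = [4, 10, 6, 8, 4, 14, 3, 9, 11, 10]

# Range = maximum minus minimum.
data_range = max(colonies) - min(colonies)

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(colonies, n=4)

# Inter-quartile range = Q3 minus Q1.
iqr = q3 - q1

print("Range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```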
Measures of Spread: Variance and Standard Deviation
3. The Variance and Standard Deviation:
Summarize how close the values are to the mean; the standard deviation is the square root of the variance.
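A step-by-step Python sketch of both measures, using the first five hedgehog heights listed earlier. It assumes the sample version of the variance (dividing by n − 1); the slides do not say which version is intended, and dividing by n instead gives the population variance.

```python
import math
import statistics

# First five hedgehog heights (cm) from the example data points.
heights = [25.40, 28.99, 24.64, 24.86, 27.86]

n = len(heights)
mean = sum(heights) / n

# Squared distance of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in heights]

# Sample variance: sum of squared deviations divided by n - 1 (assumed convention).
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance, in the same units as the data.
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 2), round(sample_sd, 2))
```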
Descriptive Statistics
Statistics allow us to use a reduced number of values – or even one – to describe the features of a variable.
Location:
Mean: 19.18
Median: 18.98
Mode: 18.19
Spread:
Variance: 37.88
SD (Standard Deviation): 6.11
IQR (Inter-Quartile Range): 8.63
Descriptive Statistics: Reporting Measures
A measure of location should always be reported along with one measure of spread.
Example 1: Mean = 19.80, Standard Deviation = 6.11 (19.80 ± 6.11)
Example 2: Median = 16.05, IQR = 8.90 (16.05 ± 8.90)
Recommended size of hole calculations:
Based on mean: Upper limit = 19.8 + 6.11 ≈ 26 cm
Based on median: Upper limit = 16.05 + 8.90 ≈ 25 cm
Recommended size of hole with median = 18 cm
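The upper limits of about 26 cm and 25 cm follow from adding the measure of spread to the measure of location. A small Python sketch of that arithmetic, using the values from Example 1 and Example 2 (the helper name limits is hypothetical):

```python
def limits(location, spread):
    """Return (lower, upper) as location minus spread and location plus spread."""
    return location - spread, location + spread


# Example 1: mean and standard deviation.
mean_lower, mean_upper = limits(19.80, 6.11)

# Example 2: median and inter-quartile range.
median_lower, median_upper = limits(16.05, 8.90)

# Rounded to whole centimetres, as on the slide.
print(round(mean_upper), round(median_upper))
```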
Descriptive Statistics: Shape
Skewness: Asymmetry relative to the center of a distribution.
Kurtosis: Describes the “peak” of the distribution (how peaked or flat it is).
Right Skewed: Mode < Median < Mean
Symmetric: Mode = Median = Mean
Left Skewed: Mean < Median < Mode
Skewness focuses on the asymmetry of the tails of a distribution, whereas kurtosis focuses more on the height of the distribution curve.
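Both shape measures can be computed from the central moments of the data. The Python sketch below assumes the simple moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 − 3); the slides do not give a formula, and statistical packages often apply small-sample corrections. The two data sets are made up for illustration.

```python
import statistics


def central_moment(values, k):
    """Average of the k-th power of the deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive if right skewed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # made-up symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]  # made-up data with a long right tail

print(skewness(symmetric), skewness(right_skewed))
print(statistics.median(right_skewed), statistics.mean(right_skewed))
print(excess_kurtosis(symmetric))
```

In the right-skewed sample the mean is pulled above the median by the single large value, which matches the ordering given above for right-skewed distributions.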
Summary
Statistics is concerned with collecting, describing, and using data.
Descriptive statistics encompass a set of procedures to summarize data from a variable of interest.
The method to summarize and describe data depends on the scale of measurement of that variable of interest.
The measures of location provide information about the values of the majority of the cases (what is a typical value?).
The measures of spread provide information about the variability or how much our values differ from each other and from the mean (how typical is typical?).
We always report a measure of location accompanied by the appropriate measure of spread or variability.