SE Unit 0 Topic 2 Statistics
Introduction to Statistics in AP Biology
Statistics is crucial for analyzing data in scientific studies.
Scientists typically collect data from a sample of a population and use it to draw inferences about the population as a whole.
Graphing Data and Distribution
The first step in data analysis involves graphing the data and examining its distribution.
Many biological data sets are approximately normally distributed, which appears as a bell-shaped curve when graphed.
Central Tendencies
Measures of Central Tendency
Descriptive statistics allow researchers to describe and quantify differences between data sets.
The center of a distribution can be summarized using three measures: mean, median, and mode.
Mean
The mean is defined as the average of a data set.
To calculate the mean, sum all data points and divide by the total number of points.
Formula: Mean = (Sum of all data points) / (Total number of data points)
Example: Mean Calculation
In a biology class, students planted five tomato seeds and measured the heights of the resulting plants in mm: 65, 52, 71, 56, 61.
To find the mean, calculate:
Mean = (65 + 52 + 71 + 56 + 61)/5 = 61.
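The same calculation can be sketched in Python. This is a minimal illustration using the tomato heights above; the variable names are chosen here for the example and are not part of the original notes.

```python
from statistics import mean

# Tomato plant heights in mm, taken from the example above
heights_mm = [65, 52, 71, 56, 61]

# Mean = (sum of all data points) / (total number of data points)
manual_mean = sum(heights_mm) / len(heights_mm)

# The standard library's mean() should give the same value
library_mean = mean(heights_mm)

print(manual_mean)  # 61.0
```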
Median
The median is the middle number in a sorted list of data points.
To find the median, arrange the data in order and identify the middle value.
If the number of data points is even, average the two middle numbers.
The median is helpful in data sets with extreme values since it isn’t skewed by extreme measurements.
Example: Median Calculation
For nine labrador retrievers, the times are measured as: 4, 5, 2, 1, 4, 8, 4, 7, 1.
Arranging in order: 1, 1, 2, 4, 4, 4, 5, 7, 8.
The median is 4, the fifth of the nine sorted values.
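The sort-then-pick-the-middle procedure can be sketched as follows. The helper name median_of is hypothetical, introduced only for this illustration; it also covers the even-count rule of averaging the two middle values.

```python
def median_of(values):
    """Return the median: the middle value of the sorted data,
    or the average of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# The nine measured times from the example above
times = [4, 5, 2, 1, 4, 8, 4, 7, 1]

print(sorted(times))     # [1, 1, 2, 4, 4, 4, 5, 7, 8]
print(median_of(times))  # 4
```

With an even number of points, for instance median_of([1, 2, 3, 4]), the two middle values 2 and 3 are averaged to give 2.5.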
Mode
The mode is the value that occurs most frequently in a data set.
The mode is used less often than the mean or median to describe central tendency, but it is useful for certain distributions, such as bimodal distributions.
Example: Mode Calculation
For the dataset of 10 high school students’ TikTok usage: 10, 5, 5, 8, 5, 2, 5, 4, 4, 3.
The mode is 5 since it appears most frequently.
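One way to count occurrences and pick the most frequent value is sketched below, using the TikTok usage data above. Note that this simple approach reports only one value, so it would not reveal a tie between two equally common values.

```python
from collections import Counter

# TikTok usage values for the 10 students in the example above
usage = [10, 5, 5, 8, 5, 2, 5, 4, 4, 3]

# Count how many times each value occurs
counts = Counter(usage)

# most_common(1) returns a list holding the single most frequent (value, count) pair
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 5 4
```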
Variability
Measures of Variability
Variability indicates how spread out a data set is from the central tendency, measured by range and standard deviation.
Range
Range is calculated as the difference between the largest and smallest values in a dataset.
A larger range signifies greater variability, while a smaller range indicates less variability.
Example: Range Calculation
For tomato plants with heights: 65, 52, 71, 56, 61.
Range = 71 - 52 = 19.
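As a quick check, the range can be computed directly from the largest and smallest values. This is a minimal sketch; the variable names are illustrative.

```python
# Tomato plant heights in mm, from the example above
heights_mm = [65, 52, 71, 56, 61]

# Range = largest value - smallest value
data_range = max(heights_mm) - min(heights_mm)

print(data_range)  # 19
```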
Standard Deviation
Standard deviation measures how far, on average, data points lie from the mean.
A low standard deviation indicates data points are close to the mean, while a high standard deviation implies a wide spread.
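To calculate the standard deviation of a sample, determine the mean, calculate each data point's deviation from the mean, square those deviations, sum the squares, divide by the number of data points minus one (n - 1), and take the square root of the result.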
Example: Standard Deviation
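From the tomato plant heights (65, 52, 71, 56, 61), the mean is 61 and the deviations from the mean are 4, -9, 10, -5, 0.
The squared deviations are 16, 81, 100, 25, 0, which sum to 222.
Dividing by n - 1 = 4 gives 55.5, and the square root of 55.5 gives a standard deviation of approximately 7.45.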
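The steps can be traced in Python as a rough sketch. It assumes the sample standard deviation (dividing by n - 1), which is the version that reproduces the 7.45 reported for the tomato data; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Tomato plant heights in mm, from the example above
heights_mm = [65, 52, 71, 56, 61]
n = len(heights_mm)

# Step 1: determine the mean
mean_height = sum(heights_mm) / n

# Step 2: deviation of each point from the mean
deviations = [h - mean_height for h in heights_mm]

# Step 3: square each deviation
squared_deviations = [d ** 2 for d in deviations]

# Step 4: sum the squares and divide by n - 1 (sample variance)
sample_variance = sum(squared_deviations) / (n - 1)

# Step 5: take the square root
sample_sd = sqrt(sample_variance)

print(round(sample_sd, 2))  # 7.45
```

The standard library function stdev(heights_mm) also divides by n - 1 and should agree with the step-by-step result.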
Interpretation of Standard Deviation
For normally distributed data, about 68% of the values fall within 1 standard deviation of the mean.
About 95% fall within 2 standard deviations, and about 99.7% fall within 3 standard deviations.
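These percentages apply to a normal distribution and can be checked numerically. The sketch below uses NormalDist from Python's statistics module (available in Python 3.8 and later); the helper name fraction_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{fraction_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```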
Standard Error of the Mean (SEM)
Standard error of the mean (SEM) provides an estimate of how well the sample mean represents the population mean.
Lower SEM indicates higher confidence in the mean estimate.
Formula: SEM = Standard Deviation / Square Root of Sample Size.
Practice with SEM
Using the tomato plant data with standard deviation of 7.45 and a sample size of 5 yields:
SEM = 7.45 / √5 ≈ 3.33.
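The SEM formula can be sketched in Python using the same tomato data. This assumes the sample standard deviation (n - 1 in the denominator), consistent with the 7.45 used above.

```python
from math import sqrt
from statistics import stdev

# Tomato plant heights in mm, from the example above
heights_mm = [65, 52, 71, 56, 61]

# Sample standard deviation (divides by n - 1)
sample_sd = stdev(heights_mm)

# SEM = standard deviation / square root of sample size
sem = sample_sd / sqrt(len(heights_mm))

print(round(sem, 2))  # 3.33
```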
Graphing SEM
SEM is commonly represented with error bars in graphs, indicating the range of variability in the mean estimate.
If the error bars of two means overlap, the observed difference between those means may not be statistically significant.
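An overlap check can be sketched as below. Two things here are assumptions rather than part of the notes: the error bars are taken as the mean plus or minus 1 SEM (some courses use 2 SEM instead), and the second group of heights is hypothetical data invented purely to have something to compare against. The function names are also illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_sem(values):
    """Return (mean, SEM) for a sample, using the sample standard deviation."""
    return mean(values), stdev(values) / sqrt(len(values))

def error_bars_overlap(group_a, group_b):
    """True if the intervals mean +/- 1 SEM of the two groups overlap (assumed bar width)."""
    mean_a, sem_a = mean_and_sem(group_a)
    mean_b, sem_b = mean_and_sem(group_b)
    low_a, high_a = mean_a - sem_a, mean_a + sem_a
    low_b, high_b = mean_b - sem_b, mean_b + sem_b
    return low_a <= high_b and low_b <= high_a

# Tomato heights from the notes, plus a HYPOTHETICAL second group for illustration only
tomato_heights_mm = [65, 52, 71, 56, 61]
hypothetical_group_mm = [70, 75, 68, 80, 72]

print(error_bars_overlap(tomato_heights_mm, hypothetical_group_mm))  # False
```

Non-overlapping bars suggest, but do not prove, a significant difference; a formal statistical test is still needed to confirm it.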
Conclusion
Analyzing measures of central tendency and variability is crucial for interpreting data effectively in AP Biology.