Statistics Handbook
Copyright Information
Copyright: 2014, Earl Whitney, Reno NV. All Rights Reserved
Prepared By: Earl L. Whitney, FSA, MAAA
Version: 1.0
Date: April 27, 2014
Table of Contents
Page 6: AP Statistics Formula Sheet
Page 8: Part 1: Exploring Data
Variable Types
5-Number Summary
Other Terms to Know
Frequency Distribution
Cumulative Frequency Distribution
Center, Shape and Spread
Types of Plots
Marginal Distributions
Normal Distribution
Page 15: Part 2: Exploring Bivariate Data
Definitions
Formulas Relating to the Coefficient of Correlation of a Sample
Linear Combinations of Parameters and Statistics
Types of Regression Models
Pages 20-30: Various Chapters covering Data Re-expression, Sample Surveys, and Experimental Studies
Page 34: Part 4: Probability
Key Definitions
Geometric Probability Model
Binomial Probability Model
Normal Probability Model
Key Formulas
Pages 38-44: Chapters covering Sampling Distributions, Confidence Intervals, Hypothesis Testing, and Comparisons
Pages 47-60: Chapters covering Inferences about Means, Paired Data, and Goodness-of-Fit Test
Pages 61-68: Testing Homogeneity, Independence, and Regression Analysis
Key Sections Overview
Part 1: Exploring Data
Variable Types
Quantitative Variables: Numerical values (e.g., age, GPA)
Categorical Variables: Non-numerical categories (e.g., hair color, political affiliation)
5-Number Summary
Minimum: Lowest value
Q1: Median of lower half
Median: Middle value of dataset
Q3: Median of upper half
Maximum: Highest value
Interquartile Range (IQR): IQR = Q3 - Q1
Range: Range = Maximum - Minimum
Outliers: Values more than 1.5 × IQR below Q1 or above Q3
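The summary above can be sketched in a few lines of Python. This is a minimal illustration, not the handbook's own method: the function names and the sample data are made up, and it assumes the convention that the middle value is left out of both halves when the count is odd (textbooks and software differ on this).

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + n % 2:]     # values above it; middle value skipped when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Return values more than 1.5 IQRs below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]   # hypothetical data for illustration
summary = five_number_summary(sample)
flagged = outliers(sample)
data_range = summary[4] - summary[0]     # Range = Maximum - Minimum
```

With this sample, Q1 is 4 and Q3 is 10, so the IQR is 6 and the upper fence is 19; the value 30 is the only one flagged.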
Other Terms
Mean: Arithmetic average
Mode: Most frequently occurring value
Cluster: Group of data values bunched closely together
Gap: Interval within the range of the data that contains no values
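As a brief illustration of the mean and mode, Python's standard statistics module can compute both. The data below is invented for the example; multimode is used because it returns every tied mode rather than just one.

```python
from statistics import mean, multimode

values = [2, 3, 3, 5, 7]       # hypothetical data for illustration
average = mean(values)         # arithmetic average: sum divided by count
modes = multimode(values)      # list of the most frequent value(s), ties included
```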
Part 2: Exploring Bivariate Data
Definitions
Association: Relationship between two variables
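Correlation: Measure of the strength and direction of a linear relationship
Lurking Variable: Unmeasured variable that influences both variables, producing an association without a direct causal link between them
Confounding: Effects of two or more variables on the response cannot be separated, leaving the causal relationship uncertain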
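To make the correlation definition concrete, here is a minimal sketch of the sample correlation coefficient r, computed from deviations about the means. The function name and the data are assumptions chosen for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]     # hypothetical explanatory variable
scores = [2, 4, 5, 4, 5]    # hypothetical response variable
r = correlation(hours, scores)
```

Here r comes out near 0.77, a fairly strong positive linear association. A value near 0 would only rule out a linear pattern, not every kind of relationship.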
Key Statistical Models and Conditions
Key Formulas and Conditions for Regression Analysis
Linear Regression: Fit the line that minimizes the sum of squared residuals
Residuals: Observed values minus predicted values; for a good fit they scatter randomly around zero
Extrapolation: Predicting beyond the range of the observed data; risky because the fitted relationship may not hold there
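A least-squares fit and its residuals can be sketched as follows. The function names and data are hypothetical; the slope and intercept formulas are the standard least-squares ones for a single explanatory variable.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed minus predicted value for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]    # hypothetical data for illustration
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_line(xs, ys)
resid = residuals(xs, ys, intercept, slope)
```

For this data the fitted line is roughly y = 2.2 + 0.6x, and the residuals sum to zero up to rounding, as they always do for a least-squares line with an intercept. Predicting at x = 50 would be extrapolation, since the data only covers x from 1 to 5.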
Inferences in Hypothesis Testing
Steps
Hypotheses: State the null (H0) and alternative (H1) hypotheses
Construction of Sampling Distribution: Requires checking conditions such as independence and approximate normality
Calculation of p-values: The p-value is the probability, assuming H0 is true, of a result at least as extreme as the one observed
Conclusion: Reject H0 if the p-value is less than α; otherwise fail to reject H0
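The steps above can be sketched for one common case, a two-sided one-proportion z-test. This is an illustrative sketch only: the function name, the counts, and the choice of α = 0.05 are assumptions, and the independence and success/failure conditions still need to be checked separately.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 against H1: p != p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)      # standard deviation of p_hat assuming H0 is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5
z, p_value = one_proportion_z_test(60, 100, 0.5)
alpha = 0.05
reject_h0 = p_value < alpha
```

With these numbers z is about 2 and the p-value is about 0.046, so H0 would be rejected at α = 0.05 but not at α = 0.01.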
Type I and Type II Errors
Type I Error: Incorrectly reject a true null hypothesis
Type II Error: Fail to reject a false null hypothesis
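To see how the two error types relate, the sketch below computes the probability of a Type II error (often written β) for a one-sided z-test of a mean with known σ. The scenario, function name, and numbers are assumptions for illustration; the Type I error probability is simply the chosen α.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_probability(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a one-sided z-test of H0: mu = mu0 against H1: mu > mu0."""
    se = sigma / sqrt(n)
    # The Type I error rate alpha fixes the rejection cutoff for the sample mean.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Type II error: the sample mean falls below the cutoff even though mu = mu_true.
    return NormalDist(mu_true, se).cdf(cutoff)

# Hypothetical scenario: H0 says mu = 100, but the true mean is 105
beta = type_ii_error_probability(mu0=100, mu_true=105, sigma=15, n=36)
power = 1 - beta    # probability of correctly rejecting the false H0
```

In this scenario β is roughly 0.36. Lowering α moves the cutoff higher and increases β, which is the usual trade-off between the two error types for a fixed sample size.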
Summary of Formulas
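Sampling Distribution Models: Normal approximation is appropriate when the sample is large enough; for proportions, check the success/failure condition (np ≥ 10 and n(1 - p) ≥ 10)
Confidence Intervals: General form is estimate ± (critical value) × (standard error); the specific formula depends on which parameters are known
Comparing Two Proportions: Test statistic is the difference in sample proportions divided by its standard error; requires independent samples and the success/failure condition in each group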
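Two of these formulas can be sketched as follows: a one-proportion z-interval and a pooled two-proportion z statistic. The function names and counts are hypothetical, and the sketch assumes the conditions for a Normal approximation have already been checked.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """One-proportion z-interval: p_hat +/- z* times the standard error."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts for illustration
low, high = proportion_confidence_interval(60, 100)
z_two = two_proportion_z(60, 100, 45, 100)
```

For 60 successes in 100 trials the 95% interval is roughly 0.504 to 0.696, and comparing 60/100 with 45/100 gives a z statistic of about 2.12.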
Regression Analysis Considerations
Assumptions: Examine linearity, independence, normality, and equal variance conditions
Residual Analysis: Check for randomness and normal distribution of residuals to validate the model