Statistics Handbook

Copyright Information

  • Copyright: 2014, Earl Whitney, Reno NV. All Rights Reserved

  • Prepared By: Earl L. Whitney, FSA, MAAA

  • Version: 1.0

  • Date: April 27, 2014

Table of Contents

  • AP Statistics Formula Sheet

  • Part 1: Exploring Data

    • Variable Types

    • 5-Number Summary

    • Other Terms to Know

    • Frequency Distribution

    • Cumulative Frequency Distribution

    • Center, Shape and Spread

    • Types of Plots

    • Marginal Distributions

    • Normal Distribution

  • Part 2: Exploring Bivariate Data

    • Definitions

    • Formulas Relating to the Coefficient of Correlation of a Sample

    • Linear Combinations of Parameters and Statistics

    • Types of Regression Models

  • Various chapters covering Data Re-expression, Sample Surveys, and Experimental Studies

  • Part 4: Probability

    • Key Definitions

    • Geometric Probability Model

    • Binomial Probability Model

    • Normal Probability Model

    • Key Formulas

  • Chapters covering Sampling Distributions, Confidence Intervals, Hypothesis Testing, and Comparisons

  • Chapters covering Inferences about Means, Paired Data, and Goodness-of-Fit Test

  • Testing Homogeneity, Independence, and Regression Analysis

Key Sections Overview

Part 1: Exploring Data

Variable Types

  • Quantitative Variables: Numerical values (e.g., age, GPA)

  • Categorical Variables: Non-numerical categories (e.g., hair color, political affiliation)

5-Number Summary

  • Minimum: Lowest value

  • Q1: Median of lower half

  • Median: Middle value of dataset

  • Q3: Median of upper half

  • Maximum: Highest value

  • Interquartile Range (IQR): IQR = Q3 - Q1

  • Range: Range = Maximum - Minimum

  • Outliers: Values more than 1.5 × IQR below Q1 or above Q3
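
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the "median of each half" rule for quartiles (the middle value is left out of both halves when the count is odd); the function names and the sample data are made up for the example, and other software may use a different quartile rule.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above it (skips the middle value when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]   # hypothetical data for illustration
summary = five_number_summary(sample)
flagged = outliers(sample)
print(summary)   # (2, 4.0, 7, 10.0, 30)
print(flagged)   # [30]
```

Here IQR = 10 - 4 = 6, so the fences sit at -5 and 19, and only the value 30 is flagged.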

Other Terms to Know

  • Mean: Arithmetic average

  • Mode: Most frequently occurring value

  • Cluster: Subgroup of closely related data

  • Gap: Interval within the range of the data that contains no observations
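
As a quick illustration of the mean and the mode, the sketch below uses Python's standard `statistics` module on a made-up list of scores (the data are hypothetical).

```python
from statistics import mean, multimode

scores = [70, 85, 85, 90, 100]   # hypothetical data for illustration

average = mean(scores)           # arithmetic average: sum of values / number of values
modes = multimode(scores)        # list of the most frequently occurring value(s)

print(average)   # 86
print(modes)     # [85]
```

`multimode` returns a list because a dataset can have more than one mode.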

Part 2: Exploring Bivariate Data

Definitions

  • Association: Relationship between two variables

  • Correlation: Measure of the strength and direction of the linear relationship between two quantitative variables

  • Lurking Variable: A variable not included in the analysis that influences both variables, producing a correlation without a direct causal link between them

  • Confounding: Occurs when the effects of two or more variables on the response cannot be separated, leaving the causal relationship uncertain
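
To make the correlation definition concrete, here is a minimal sketch of the sample correlation coefficient r, computed as the sum of products of deviations divided by the square root of the product of the sums of squared deviations. The function name and the data are assumptions for illustration only.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]     # hypothetical explanatory variable
grades = [2, 4, 5, 4, 5]    # hypothetical response variable
r = correlation(hours, grades)
print(round(r, 4))   # 0.7746
```

A value of r near +1 or -1 indicates a strong linear relationship; it says nothing about causation, which is where lurking variables and confounding come in.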

Key Statistical Models and Conditions

Key Formulas and Conditions for Regression Analysis

  • Linear Regression: Fit a line that minimizes the sum of squared residuals (least squares)

  • Residuals: Differences between observed and predicted values (observed minus predicted); they should scatter randomly around zero

  • Extrapolation: Predicting beyond the range of the observed data; risky because the fitted relationship may not hold there
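
The least-squares line and its residuals can be sketched as follows. This is an illustrative example with made-up data and a hypothetical `fit_line` helper, using the standard closed-form slope and intercept for simple linear regression.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return intercept, slope

xs = [1, 2, 3, 4, 5]    # hypothetical data for illustration
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_line(xs, ys)

# Residual = observed value minus predicted value
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(round(intercept, 2), round(slope, 2))   # 2.2 0.6
print([round(e, 2) for e in residuals])       # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

The residuals of a least-squares line with an intercept always sum to zero, so what matters in a residual plot is the pattern, not the total. Predicting at x = 50 with this line would be extrapolation, since the data only cover x from 1 to 5.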

Inferences in Hypothesis Testing

Steps

  1. Hypotheses: Establish null (H0) and alternative (H1)

  2. Construction of Sampling Distribution: Requires conditions such as independence and normality

  3. Calculation of p-values: Assess statistical significance

  4. Conclusion: Reject H0 if the p-value is less than α; otherwise, fail to reject H0
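
The four steps can be sketched for one common case, a two-sided one-proportion z-test. The function names, the data, and the "at least 10 expected successes and failures" threshold are assumptions chosen for illustration (the threshold is a common rule of thumb, not a value stated in this summary).

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided test of H0: p = p0 against H1: p != p0."""
    # Step 2: check the success/failure condition (assumed threshold of 10)
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation is questionable for this sample")
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses p0 because H0 is assumed true
    z = (p_hat - p0) / standard_error
    # Step 3: two-sided p-value
    p_value = 2 * (1 - normal_cdf(abs(z)))
    # Step 4: compare the p-value to alpha
    return z, p_value, p_value < alpha

# Step 1: H0: p = 0.5, H1: p != 0.5; hypothetical sample of 60 successes in 100 trials
z, p_value, reject = one_proportion_z_test(60, 100, 0.5)
print(round(z, 2), round(p_value, 4), reject)   # 2.0 0.0455 True
```

With α = 0.05 the p-value of about 0.0455 leads to rejecting H0; with α = 0.01 the same data would not.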

Type I and Type II Errors

  • Type I Error: Incorrectly reject a true null hypothesis

  • Type II Error: Fail to reject a false null hypothesis
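
One way to see both error types is by simulation. The sketch below repeatedly draws samples and runs a two-sided one-proportion z-test, first with H0 true and then with H0 false. Everything here (function names, sample size, the alternative value 0.6, the number of trials, the seed) is a hypothetical setup for illustration, and the resulting rates are estimates that vary with those choices.

```python
import random
from math import sqrt, erf

def rejects_null(successes, n, p0, alpha):
    """Two-sided one-proportion z-test; True when H0: p = p0 is rejected."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_value < alpha

def rejection_rate(true_p, p0=0.5, n=100, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated samples (drawn with success probability true_p) that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))
        rejections += rejects_null(successes, n, p0, alpha)
    return rejections / trials

# H0 is true (p really is 0.5): every rejection is a Type I error
type_1_rate = rejection_rate(true_p=0.5)

# H0 is false (p is really 0.6): every failure to reject is a Type II error
type_2_rate = 1 - rejection_rate(true_p=0.6)

print(type_1_rate, type_2_rate)
```

The estimated Type I rate should land near α, while the Type II rate depends on how far the true proportion is from p0 and on the sample size.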

Summary of Formulas

  • Sampling Distribution Models: Normal approximation based on sample size and success/failure conditions

  • Confidence Intervals: Estimate ± critical value × standard error; the specific formula depends on which parameters are known

  • Comparing Two Proportions: Formulas and conditions for valid tests
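
As a hedged illustration of two of these formulas, the sketch below computes a confidence interval for a single proportion and a pooled two-proportion z-test. The function names and data are hypothetical, the critical value 1.96 corresponds to 95% confidence, and the conditions for each procedure (independence, enough successes and failures) are assumed to have been checked separately.

```python
from math import sqrt, erf

def proportion_confidence_interval(successes, n, z_star=1.96):
    """Interval p_hat +/- z_star * SE for one proportion (z_star = 1.96 gives 95%)."""
    p_hat = successes / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2 using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2
low, high = proportion_confidence_interval(60, 100)
z, p_value = two_proportion_z_test(60, 100, 45, 100)

print(round(low, 3), round(high, 3))    # 0.504 0.696
print(round(z, 2), round(p_value, 3))   # 2.12 0.034
```

The interval uses the sample proportion in its standard error, while the test pools the two samples because H0 assumes the proportions are equal.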

Regression Analysis Considerations

  • Assumptions: Examine linearity, independence, normality, and equal variance conditions

  • Residual Analysis: Check for randomness and normal distribution of residuals to validate the model