Last Minute AP Statistics Cheat Sheet (WITH FORMULAS)
Be Specific:
Always include numerical values and context.
Example: Instead of saying "a large number of students," specify "25 out of 100 students." This clarifies the scale and ensures a precise understanding.
Define and Show Work:
Clearly define variables and show all calculations step-by-step. This not only demonstrates your understanding but can also earn partial credit for correct methods, even if the final answer is wrong.
Example: If X is the number of successes in a binomial distribution, define it before starting your calculation.
Read Carefully:
Understand the question requirements before answering. Highlight key points or terms to focus your response.
Check Your Answers:
After solving, review your solution. Ensure it fits the problem context and check for calculation errors.
Center: Use the mean or median depending on the data's skewness.
Example: The median is better for skewed data, while the mean is appropriate for symmetrical data.
Unusual Features: Identify outliers or gaps and discuss their impact.
Example: An outlier might significantly affect the mean but have little effect on the median.
Spread: Discuss variability using range, interquartile range (IQR), or standard deviation.
Example: Standard deviation measures how spread out the data is from the mean.
Shape: Describe the distribution's shape:
Symmetrical: Mean ≈ Median.
Skewed Right: Tail on the right; mean > median.
Skewed Left: Tail on the left; mean < median.
Bimodal: Two peaks in the data.
Example Scenario:
For a histogram of test scores:
Center: Mean = 75.
Spread: Standard deviation = 10.
Shape: Skewed right, indicating that most scores cluster at the lower end with a tail of higher scores.
Shape: Identify overall pattern and shape of the data distribution.
Example: Symmetrical, skewed left/right, or bimodal.
Outliers: Detect any points that fall far outside the general pattern.
Example: In a dataset with test scores, a score of 100 might be an outlier if most scores are between 50 and 70.
Center: Determine the typical value, using mean or median.
Example: Median is better when there are outliers.
Spread: Measure variability using range, IQR, or standard deviation.
Example: An IQR of 20 indicates that the middle 50% of data points are spread across 20 units.
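As a quick illustration of computing these summaries, here is a minimal Python sketch (assuming NumPy is available; the scores are made up for illustration):

```python
import numpy as np

scores = np.array([55, 60, 62, 65, 68, 70, 72, 75, 80, 95])  # hypothetical test scores

mean = scores.mean()                     # center, sensitive to outliers
median = np.median(scores)               # center, resistant to outliers
std = scores.std(ddof=1)                 # sample standard deviation
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1                            # spread of the middle 50% of the data

print(mean, median, std, iqr)
```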
Direction: Determine if the relationship is positive, negative, or neither.
Example: A positive direction means that as x increases, y increases.
Unusual Features: Look for outliers or clusters that do not fit the general pattern.
Example: A point far from the rest of the data in a scatterplot of height vs. weight.
Form: Identify the form of the relationship (linear or nonlinear).
Example: A linear form suggests a straight-line relationship, while a curved pattern indicates nonlinearity.
Strength: Assess how tightly the points follow the form.
Example: A strong correlation means points closely follow a line.
Strength: Strong, moderate, or weak.
Example: r=0.9 suggests a strong linear relationship.
Trend: Identify if the relationship is linear or nonlinear.
Example: A curved scatterplot indicates a nonlinear trend.
Direction: Positive or negative.
Example: A positive correlation means that as one variable increases, so does the other.
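To put a number on direction and strength, here is a minimal sketch using NumPy's correlation coefficient (the height and weight values are hypothetical):

```python
import numpy as np

height = np.array([60, 62, 65, 67, 70, 72])         # hypothetical heights (inches)
weight = np.array([115, 125, 140, 150, 165, 180])   # hypothetical weights (pounds)

r = np.corrcoef(height, weight)[0, 1]  # Pearson correlation coefficient
print(r)  # a value near +1 indicates a strong positive linear relationship
```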
B: Binary (either success or failure)
I: Independent trials
N: Number of trials fixed
S: Success probability stays the same
Properties: Fixed number of trials (n), binary outcomes (success/failure), independent trials, and constant probability of success (p).
Formula: P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)
Where C(n, k) = n! / [k!(n − k)!] is the number of ways to choose k successes from n trials.
Example: The probability of flipping exactly 3 heads in 5 coin tosses.
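A minimal Python check of the coin-toss example, applying the binomial formula directly (math.comb gives the binomial coefficient):

```python
from math import comb

n, k, p = 5, 3, 0.5  # 5 tosses, exactly 3 heads, fair coin

# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(prob)  # 0.3125
```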
B: Binary (either success or failure)
I: Independent trials
T: Trials until success
S: Success probability stays the same
Properties: Trials continue until the first success, with constant probability p.
Formula: P(X = k) = (1 − p)^(k − 1) · p
Example: Probability that the first six occurs on the third roll of a die.
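A minimal Python check of the die-roll example, applying the geometric formula directly:

```python
p = 1 / 6   # probability of rolling a six on any single roll
k = 3       # first six occurs on the third roll

# P(X = k) = (1 - p)^(k - 1) * p
prob = (1 - p) ** (k - 1) * p
print(prob)  # about 0.116
```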
P: Define the parameter of interest (mean μ or proportion p).
A: State assumptions:
Random Sample
Normality: For means, use CLT if n≥30; for proportions, check np≥10 and n(1−p)≥10.
N: Name the interval (e.g., 1-proportion z-interval).
I: Calculate the interval:
For proportions: p̂ ± z* · √(p̂(1 − p̂)/n)
For means (unknown σ): x̄ ± t* · s/√n
C: Conclude in context (e.g., "We are 95% confident that the true proportion is between...").
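A minimal sketch of a 1-proportion z-interval in Python, assuming SciPy is available and using a hypothetical sample of 40 successes out of 100 at 95% confidence:

```python
from math import sqrt
from scipy.stats import norm

x, n = 40, 100                 # hypothetical successes and sample size
p_hat = x / n
z_star = norm.ppf(0.975)       # critical value for 95% confidence
se = sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(lower, upper)            # "We are 95% confident the true proportion is between ..."
```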
P: Define the parameter and significance level (α).
H: State hypotheses (H0 and Ha).
A: Check assumptions (randomness, normality, independence).
N: Identify the test (e.g., 1-proportion z-test).
T: Calculate the test statistic:
For proportions: z = (p̂ − p₀) / √(p₀(1 − p₀)/n)
O: Find the p-value.
M: Make a decision (if p < α, reject H0).
S: State the conclusion in context.
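A minimal sketch of a 1-proportion z-test in Python, assuming SciPy is available; the sample counts and the null value H0: p = 0.5 are hypothetical:

```python
from math import sqrt
from scipy.stats import norm

x, n = 58, 100   # hypothetical sample: 58 successes out of 100
p0 = 0.5         # null hypothesis value, H0: p = 0.5
p_hat = x / n

# test statistic: z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value

print(z, p_value)  # reject H0 if p_value < alpha
```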
Slope Interpretation: For each one-unit increase in x, the predicted value of y changes by the slope value.
Example: If the slope is 2.5, then for each additional hour studied, the predicted score increases by 2.5 points.
Coefficient of Determination (R²):
Indicates the percentage of variability in y explained by x.
Example: R² = 0.75 means 75% of the variation in test scores is explained by study hours.
Residual Plots:
No pattern suggests a good linear fit.
Curves or patterns suggest a nonlinear relationship or the need for transformation.
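A minimal regression sketch, assuming SciPy is available; the study-hours and score values are hypothetical:

```python
from scipy.stats import linregress

hours = [1, 2, 3, 4, 5, 6]          # hypothetical hours studied
scores = [62, 68, 71, 77, 80, 88]   # hypothetical exam scores

result = linregress(hours, scores)
print(result.slope)      # predicted score change per additional hour studied
print(result.rvalue**2)  # R^2: fraction of score variation explained by hours

# residuals = observed - predicted; plot them to check for leftover patterns
residuals = [y - (result.intercept + result.slope * x) for x, y in zip(hours, scores)]
print(residuals)
```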
Simple Random Sample (SRS): Every possible sample of size n has an equal chance of being selected.
Stratified (Random): Divide into strata, then sample from each.
Cluster: Randomly select clusters and sample all within them.
Systematic: Sample every kth individual after a random start.
Voluntary: Individuals choose whether to participate; the sample consists only of those who decide to respond.
Convenience: Sample the individuals who are easiest to reach or collect information from.
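A minimal sketch of drawing a simple random sample and a systematic sample in Python (the roster of 100 student IDs is hypothetical):

```python
import random

roster = list(range(1, 101))     # hypothetical roster of 100 student IDs

srs = random.sample(roster, 10)  # simple random sample of size 10

k = 10
start = random.randrange(k)      # random start, then every kth individual
systematic = roster[start::k]

print(srs)
print(systematic)
```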
Voluntary Response Bias: Participants self-select; often not representative.
Undercoverage: Some groups excluded from the sample.
Non-response Bias: Selected individuals do not respond.
Response Bias: Misleading or biased questions lead to incorrect responses.
Wording of Questions: The question is phrased in a way that leads respondents toward a particular answer.
Standard Deviation (σ or s):
Measures data spread from the mean; for a sample, s = √[Σ(xᵢ − x̄)² / (n − 1)].
Z-Score:
z = (x − μ)/σ; indicates how many standard deviations x is from the mean.
Central Limit Theorem (CLT):
For large n, the sampling distribution of the sample mean is approximately normal with mean μ and standard deviation σ/√n.
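A minimal sketch tying the z-score and the CLT together, assuming SciPy is available; the mean, standard deviation, and sample size are hypothetical:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma = 75, 10    # hypothetical population mean and standard deviation
x = 90

z = (x - mu) / sigma  # how many standard deviations x is from the mean
print(z)              # 1.5

# CLT: for large n, the sample mean is approximately Normal(mu, sigma / sqrt(n))
n = 36
z_bar = (78 - mu) / (sigma / sqrt(n))
print(1 - norm.cdf(z_bar))  # P(sample mean > 78), about 0.036
```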
Type I Error (α): Rejecting H0 when it’s actually true.
Type II Error (β): Failing to reject H0 when Ha is true.
Power of a Test:
Power = 1 − β: the probability of correctly rejecting H0 when Ha is true; increasing sample size improves power.
Description: Displays the frequency distribution of numerical data by grouping data into intervals (bins).
Usage: Useful for identifying the shape of a distribution (e.g., symmetrical, skewed).
Example: Showing the distribution of test scores among students.
Description: Visual representation of the 5-number summary (minimum, Q1, median, Q3, maximum).
Usage: Ideal for identifying outliers and comparing distributions.
Example: Comparing the spread of scores between two classes.
Description: Displays individual data points on a number line.
Usage: Useful for small datasets to show frequency and distribution.
Example: Representing the number of pets owned by students in a class.
Description: Splits data into stems (the leading digits) and leaves (the trailing digits).
Usage: Preserves raw data while showing distribution.
Example: Displaying exam scores (e.g., 85 → stem: 8, leaf: 5).
Description: Plots cumulative frequencies, showing the number of observations below each value.
Usage: Helpful for understanding percentiles and cumulative data.
Example: Showing the cumulative percentage of students scoring below certain thresholds on a test.
Description: A circular chart divided into sectors representing proportions.
Usage: Best for categorical data and displaying relative frequencies as percentages.
Example: Visualizing the proportion of students choosing different subjects.
Description: Uses bars to represent frequencies of categorical data.
Usage: Useful for comparing categories.
Example: Comparing the number of students in different grade levels.
Description: A bar graph divided into segments, each representing a category within a whole.
Usage: Useful for comparing the composition of different groups.
Example: Showing the distribution of favorite sports across different grades.
Description: Plots pairs of quantitative data points on a coordinate plane.
Usage: Shows relationships between two variables (correlation, trends, outliers).
Example: Examining the relationship between study hours and exam scores.
Description: Plots each data value against its expected value under a normal distribution; points falling near a straight line indicate approximate normality.
Usage: Determines if data is approximately normally distributed.
Example: Checking if the heights of students follow a normal distribution.
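To tie the graph types together, here is a minimal matplotlib sketch drawing a histogram and a boxplot from the same hypothetical scores:

```python
import matplotlib.pyplot as plt

scores = [55, 60, 62, 65, 68, 70, 72, 75, 80, 95]  # hypothetical test scores

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(scores, bins=5)   # histogram: shape of the distribution
ax1.set_title("Histogram")
ax2.boxplot(scores)        # boxplot: 5-number summary and outliers
ax2.set_title("Boxplot")
plt.show()
```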