AP Statistics Comprehensive Study Guide: 2013-2016 Free Response Analysis

Analysis of Tip Distributions (2016 AP® Statistics Question 1)

Context and Data Set: Robin, a server in a small restaurant, collected data on her tip amounts for a single day of work. The sample consists of $n = 60$ tip amounts, and the question provides a histogram of these amounts.
Describing the Distribution (Part A):
* Shape: The distribution of tip amounts is strongly skewed to the right (positively skewed).
* Clustering: Most of the tip amounts ($78\%$ of the data) are clustered between $\$0.00$ and $\$4.99$.
* Outliers: There is one possible high outlier in the interval between $\$20.00$ and $\$22.50$.
* Center: The median tip amount lies in the interval $\$2.50$ to $\$4.99$. With $n = 60$, the median position is $\frac{n+1}{2} = \frac{60+1}{2} = 30.5$, so the median is the average of the 30th and 31st ordered values. The mean is expected to be larger than the median because the long right tail pulls it toward higher values.
* Spread: The exact range cannot be read from the histogram. The maximum lies between $\$20.00$ and $\$22.50$ and the minimum between $\$0.00$ and $\$2.50$, so the range is somewhere between about $\$17.50$ and $\$22.50$.
* Percentiles:
  * The 25th percentile (position $0.25 \times 60 = 15$, the 15th value) is in the interval $\$0.00 - \$2.49$.
  * The 75th percentile (position $0.75 \times 60 = 45$, the 45th value) is in the interval $\$2.50 - \$4.99$.
Impact of Data Modification (Part B):
* Scenario: A single tip amount of $\$8$ is changed to $\$18$.
* Effect on Mean: The mean would increase, because the mean is not resistant to extreme values. Changing an $\$8$ tip to $\$18$ increases the sum of all tips by $\$10$, so with $n = 60$ the mean increases by $\frac{\$10}{60} \approx \$0.17$ (about 17 cents).
* Effect on Median: The median would not change, because the median is resistant to extreme values. Both the original tip ($\$8$) and the new tip ($\$18$) are above the median (which lies between $\$2.50$ and $\$4.99$), so the 30th and 31st values in the ordered list stay the same.
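To see Parts (a) and (b) in action, the Python sketch below uses a hypothetical set of 60 tips. The values are invented for illustration (the actual data appear only in the question's histogram); they were chosen so that 47 of 60 tips (about $78\%$) fall below $\$5.00$ and the 15th, 30th/31st, and 45th ordered values land in the same intervals as in Part (a). Changing the single $\$8$ tip to $\$18$ should shift the mean by exactly $10/60$ while leaving the median unchanged.

```python
from statistics import mean, median

# Hypothetical tips (dollars), invented for illustration only; Robin's
# actual 60 tip amounts are not listed in the question.
tips = (
    [1.50] * 20                                  # $0.00-$2.49
    + [3.00] * 27                                # $2.50-$4.99
    + [5.00, 6.00, 6.00, 7.00, 7.00, 8.00,
       9.00, 10.00, 11.00, 12.00, 14.00, 16.00]  # $5.00-$19.99
    + [21.00]                                    # possible high outlier
)
n = len(tips)  # 60

def ordered_value(values, position):
    """Return the value at a 1-based position in the sorted list."""
    return sorted(values)[position - 1]

# Positions used in Part (a): 15th value, 30th/31st values, 45th value.
q1_approx = ordered_value(tips, 15)
med = (ordered_value(tips, 30) + ordered_value(tips, 31)) / 2
q3_approx = ordered_value(tips, 45)

# Part (b): replace the $8 tip with $18.
changed = list(tips)
changed[changed.index(8.00)] = 18.00

mean_shift = mean(changed) - mean(tips)        # (18 - 8) / 60
median_shift = median(changed) - median(tips)  # 30th and 31st values unchanged

print(f"n = {n}, 15th = {q1_approx}, median = {med}, 45th = {q3_approx}")
print(f"mean shift = {mean_shift:.4f}, median shift = {median_shift}")
```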

Experimental Design and Chi-Square Analysis: TV Advertising (2016 AP® Statistics Question 2)

Background: Product advertisers tested the effects of 30-second television ads for two new snacks: "Choco-Zuties" (sugary) and "Apple-Zuties" (healthy).
Experimental Procedure:
* Total subjects: $75$ children.
* Random assignment: The children were randomly assigned to three groups of $25$: Group A, Group B, and Group C.
* Setting: All children watched a 30-minute program that included 5 minutes of ads.
* Group A: The ads included the Choco-Zuties ad (no Apple-Zuties ad).
* Group B: The ads included the Apple-Zuties ad (no Choco-Zuties ad).
* Group C (control): The ads included neither snack ad.
Association Analysis (Part A):
* Statistical test: Chi-square test for association/independence (because the groups were formed by random assignment, this is also described as a chi-square test of homogeneity; the computation is identical).
* Hypotheses:
  * $H_0$: There is no association between children's snack choice and the type of ad.
  * $H_a$: There is an association between children's snack choice and the type of ad.
* Conditions:
  1. Random assignment was used.
  2. All expected counts are $\ge 5$.
  3. Observations are independent.
* Test statistic:
  * $\chi^2 = \sum \frac{(O - E)^2}{E}$.
  * Expected counts ($E$) come from the marginal totals ($56$ Choco-Zuties and $19$ Apple-Zuties choices out of $75$): in each group of $25$, $E = \frac{25 \times 56}{75} \approx 18.67$ for Choco-Zuties and $E = \frac{25 \times 19}{75} \approx 6.33$ for Apple-Zuties.
  * $\chi^2 \approx 10.2914$.
* Degrees of freedom: $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$.
* P-value: From Table C with $df = 2$, $0.005 < P < 0.01$ (exact P-value $\approx 0.0058$).
* Decision: Because the P-value ($\approx 0.0058$) is less than $\alpha = 0.05$, we reject $H_0$. There is convincing evidence of an association between snack choice and the type of ad.
Detailed Effect of Advertising (Part B):
* Choco-Zuties ad: In the control group (Group C), $88\%$ ($22/25$) chose Choco-Zuties. In Group A (shown the Choco-Zuties ad), $84\%$ ($21/25$) chose them. The ad appears to have had little effect, because the baseline preference for the sugary snack was already very high.
* Apple-Zuties ad: In the control group, only $12\%$ ($3/25$) chose Apple-Zuties. In Group B (shown the Apple-Zuties ad), $48\%$ ($12/25$) chose them. This large increase, from $12\%$ to $48\%$, suggests that the Apple-Zuties ad was the primary driver of the association found in Part (a).
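The chi-square statistic from Part (a) can be checked with the short Python sketch below. The observed counts are reconstructed from the percentages in Part (b) (for Group B, $25 - 12 = 13$ children chose Choco-Zuties). The P-value uses the closed form $P(\chi^2_2 > x) = e^{-x/2}$, which holds only for $df = 2$; for other degrees of freedom a chi-square table or a statistics library would be needed.

```python
import math

# Observed counts (chose Choco-Zuties, chose Apple-Zuties) for each group of 25,
# reconstructed from the percentages in Part (b).
observed = {
    "A (Choco-Zuties ad)": (21, 4),
    "B (Apple-Zuties ad)": (13, 12),
    "C (control)": (22, 3),
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]  # (56, 19)
grand_total = sum(row_totals)                  # 75

chi_sq = 0.0
for r, row in enumerate(rows):
    for c, obs in enumerate(row):
        expected = row_totals[r] * col_totals[c] / grand_total  # 18.67 or 6.33
        chi_sq += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)

# Closed-form upper-tail probability, valid ONLY for df = 2: P(chi2 > x) = exp(-x/2).
assert df == 2
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.4f}, df = {df}, P-value = {p_value:.4f}")
```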

Probability and Simulation: Convention Attendees (2014 AP® Statistics Question 2)

Context: A company manager says that 3 of 9 sales representatives (6 men and 3 women) were selected at random to attend a convention. All 3 selected individuals were women, raising concerns that the selection was not actually random.
Exact Probability Calculation (Part A):
* The total number of possible groups of 3 from 9 is $\binom{9}{3} = 84$.
* The number of ways to choose 3 women from 3 is $\binom{3}{3} = 1$.
* $P(\text{Selecting 3 Women}) = \frac{\binom{3}{3}}{\binom{9}{3}} = \frac{1}{84} \approx 0.0119$.
* Alternative calculation: $\frac{3}{9} \times \frac{2}{8} \times \frac{1}{7} = \frac{6}{504} = \frac{1}{84} \approx 0.0119$.
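Both calculations in Part (a) can be confirmed with exact fractions in Python, using `math.comb` for the counting approach and a product of conditional probabilities for the sequential approach:

```python
from fractions import Fraction
from math import comb

# Counting approach: 1 way to pick all 3 women, out of C(9, 3) equally likely groups.
p_counting = Fraction(comb(3, 3), comb(9, 3))

# Sequential approach: multiply conditional probabilities (without replacement).
p_sequential = Fraction(3, 9) * Fraction(2, 8) * Fraction(1, 7)

print(p_counting, p_sequential, round(float(p_counting), 4))
```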
Reason to Doubt Randomness (Part B): Yes, there is reason to doubt the manager's claim. If the selection were truly random, the probability of selecting only women would be only about $1.19\%$. An outcome this unlikely is hard to attribute to chance alone.
Simulation Critique (Part C):
* Proposed simulation: Roll three fair 6-sided dice. Rolling a 1, 2, 3, or 4 (probability $\frac{2}{3}$) represents a man; rolling a 5 or 6 (probability $\frac{1}{3}$) represents a woman.
* Validity: No, the simulation is not correct.
* Justification: The simulation treats each selection as an independent event (sampling with replacement), so the probability of a woman stays $1/3$ on every roll. In reality, selection is done without replacement (dependent events): once one woman is selected, the probability of selecting another woman decreases (from $3/9 \approx 0.33$ to $2/8 = 0.25$). The dice model gives $\left(\frac{1}{3}\right)^3 = \frac{1}{27} \approx 0.0370$, which overestimates the true probability of $0.0119$.
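A quick Monte Carlo sketch makes the critique concrete: it simulates the correct selection (3 distinct people drawn from 9, without replacement) alongside the proposed dice model. The seed and number of repetitions are arbitrary choices; the two estimates should land near $1/84 \approx 0.0119$ and $1/27 \approx 0.0370$ respectively.

```python
import random

random.seed(2014)  # arbitrary seed for reproducible output
REPS = 200_000
people = ["M"] * 6 + ["W"] * 3  # the 9 sales representatives

def all_women_without_replacement():
    """Correct model: choose 3 distinct people from the 9."""
    return random.sample(people, 3).count("W") == 3

def all_women_dice():
    """Proposed model: three independent die rolls, where 5 or 6 means 'woman'."""
    return all(random.randint(1, 6) >= 5 for _ in range(3))

est_correct = sum(all_women_without_replacement() for _ in range(REPS)) / REPS
est_dice = sum(all_women_dice() for _ in range(REPS)) / REPS

print(f"without replacement: {est_correct:.4f} (exact 1/84 = {1/84:.4f})")
print(f"dice model:          {est_dice:.4f} (exact 1/27 = {1/27:.4f})")
```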

Normal Distribution and Sample Means: High School Attendance (2014 AP® Statistics Question 3)

Scenario: State funding is based on attendance. The daily number of absences at High School A is normally distributed with mean $120$ and standard deviation $10.5$: $X \sim N(120, 10.5^2)$.
Individual Day Risk (Part A): The school loses funding on a day with more than $140$ absences.
* $z = \frac{140 - 120}{10.5} \approx 1.9048$.
* $P(X > 140) = P(Z > 1.90) = 1 - 0.9713 = 0.0287$.
* The school has about a $2.87\%$ chance of losing funding on a single randomly selected day.
3-Day Average Proposal (Part B): Funding is determined by the mean number of absences over 3 days ($\bar{X}$).
* Standard deviation of the sample mean (assuming the 3 days' absence counts are independent): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10.5}{\sqrt{3}} \approx 6.0622$.
* $z = \frac{140 - 120}{6.0622} \approx 3.2991$.
* $P(\bar{X} > 140) = P(Z > 3.30) = 1 - 0.9995 = 0.0005$.
* Conclusion: Losing funding is less likely under this proposal (about a $0.05\%$ chance vs. $2.87\%$). Sample means are less variable than individual observations, so a value as extreme as $140$ is much less probable for $\bar{X}$ than for $X$.
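The normal probabilities in Parts (a) and (b) can be reproduced with Python's standard-library `statistics.NormalDist`. Because the code uses the unrounded z-scores, it gives about $0.0284$ for a single day (the table-based $0.0287$ comes from rounding $z$ to $1.90$) and about $0.0005$ for the 3-day mean. Like Part (b), it assumes the 3 days are independent.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, CUTOFF = 120, 10.5, 140

one_day = NormalDist(MU, SIGMA)
# Mean of 3 days; sigma / sqrt(n) assumes the 3 days are independent.
three_day_mean = NormalDist(MU, SIGMA / sqrt(3))

p_one_day = 1 - one_day.cdf(CUTOFF)
p_three_day = 1 - three_day_mean.cdf(CUTOFF)

print(f"P(X > 140)    = {p_one_day:.4f}")
print(f"P(Xbar > 140) = {p_three_day:.4f}")
```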
Random Day Selection (Part C): Find the probability that, across 3 typical weeks, none of the 3 randomly selected days (one per week) is a Tuesday, Wednesday, or Thursday.
* School days in a week: 5 (Mon, Tue, Wed, Thu, Fri).
* Allowed days: Monday and Friday (2 of the 5).
* Selections in different weeks are independent, so $P = \frac{2}{5} \times \frac{2}{5} \times \frac{2}{5} = \frac{8}{125} = 0.064$ ($6.4\%$).
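Because there are only $5^3 = 125$ equally likely ways to pick one day from each of 3 weeks, Part (c) can also be checked by brute-force enumeration in Python:

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
EXCLUDED = {"Tue", "Wed", "Thu"}

# One day per week, chosen uniformly and independently: 5**3 equally likely outcomes.
outcomes = list(product(DAYS, repeat=3))
favorable = [o for o in outcomes if not EXCLUDED.intersection(o)]

p = Fraction(len(favorable), len(outcomes))
print(len(favorable), len(outcomes), p, float(p))
```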

Probability and Expected Value: Women's Tennis (2013 AP® Statistics Question 3)

Format: A match is best of 3 sets; the first player to win 2 sets wins the match.
Possible Sequences (Part A):
* Player V wins: VV, VMV, MVV.
* Player M wins: MM, MVM, VMM.
Probability That V Wins the Match (Part B):
* First set: $P(V_1) = 0.5$; $P(M_1) = 0.5$.
* If V wins set $k$, V wins set $k+1$ with probability $0.6$.
* If M wins set $k$, M wins set $k+1$ with probability $0.7$, so V wins set $k+1$ with probability $1 - 0.7 = 0.3$.
* If the match goes to a third set, V wins it with probability $P(V_3) = 0.45$ (used for both paths into set 3).
* Calculations:
  * $P(VV) = (0.5)(0.6) = 0.3$.
  * $P(VMV) = (0.5)(1 - 0.6)(0.45) = (0.5)(0.4)(0.45) = 0.09$.
  * $P(MVV) = (0.5)(0.3)(0.45) = 0.0675$.
  * Total: $P(\text{V wins}) = 0.3 + 0.09 + 0.0675 = 0.4575$.
Conditional Probability (Part C): Probability that the match lasts 3 sets, given that V wins.
* $P(3\text{ sets} \cap \text{V wins}) = P(VMV) + P(MVV) = 0.09 + 0.0675 = 0.1575$.
* $P(3\text{ sets} \mid \text{V wins}) = \frac{0.1575}{0.4575} \approx 0.3443$.
Expected Number of Sets (Part D):
* Let $X$ be the number of sets played.
* $P(X=2) = P(VV) + P(MM) = 0.3 + (0.5)(0.7) = 0.3 + 0.35 = 0.65$.
* $P(X=3) = P(VMV) + P(MVV) + P(VMM) + P(MVM) = 0.09 + 0.0675 + (0.5)(1-0.6)(1-0.45) + (0.5)(0.3)(1-0.45) = 0.09 + 0.0675 + 0.11 + 0.0825 = 0.35$. As a check, $P(X=2) + P(X=3) = 0.65 + 0.35 = 1$.
* $E[X] = \mu_X = 2(0.65) + 3(0.35) = 1.3 + 1.05 = 2.35$ sets.
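The tennis calculations in Parts (b) through (d) can be checked by enumerating every way the match can end. The sketch below hard-codes the set-by-set probabilities listed in Part (b) and uses exact fractions, so its results should match $0.4575$, $0.3443$, and $2.35$.

```python
from fractions import Fraction as F

def p_v_wins_set(k, history):
    """P(V wins set k) given earlier winners ('V'/'M'), using the rules in Part (b)."""
    if k == 1:
        return F(1, 2)
    if k == 3:
        return F(45, 100)  # third set, regardless of how the first two went
    return F(6, 10) if history[-1] == "V" else F(3, 10)

def match_sequences():
    """Yield (sequence, probability) for every way a best-of-3 match can end."""
    stack = [("", F(1))]
    while stack:
        seq, prob = stack.pop()
        if seq.count("V") == 2 or seq.count("M") == 2:
            yield seq, prob
            continue
        p_v = p_v_wins_set(len(seq) + 1, seq)
        stack.append((seq + "V", prob * p_v))
        stack.append((seq + "M", prob * (1 - p_v)))

seqs = dict(match_sequences())
p_v_wins = sum(p for s, p in seqs.items() if s.count("V") == 2)
p_3_and_v = sum(p for s, p in seqs.items() if len(s) == 3 and s.count("V") == 2)
p_3_given_v = p_3_and_v / p_v_wins
expected_sets = sum(len(s) * p for s, p in seqs.items())

for s in sorted(seqs, key=lambda s: (len(s), s)):
    print(s, float(seqs[s]))
print("P(V wins) =", float(p_v_wins))
print("P(3 sets | V wins) =", round(float(p_3_given_v), 4))
print("E[X] =", float(expected_sets))
```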

Sampling and Quality Control: Tortilla Production (2015 AP® Statistics Question 6)

Production Context:
* Line A: $X_A \sim N(5.9, \sigma^2)$.
* Line B: $X_B \sim N(6.1, \sigma^2)$.
* Advertised: The combined production has a mean diameter of $6$ inches.
* Total daily production: $200{,}000$ tortillas.
Sampling Method Evaluation (Part A):
* Method 2 (randomly select one production line, then take a random sample of 200 tortillas from it) does not produce a sample representative of the whole day's production, because it captures only one of the two subpopulations. If Line A is chosen, the sample mean will systematically underestimate the combined mean ($5.9$ vs. $6.0$); if Line B is chosen, it will overestimate it ($6.1$ vs. $6.0$).
Histogram Analysis (Part B):
* The provided histogram is bimodal. This shape is characteristic of Method 1 (sampling from the whole day's production), because the sample includes tortillas from both Line A (left peak) and Line B (right peak).
Variability Comparison (Part C):
* Method 2 results in less variability within a single sample of 200 tortillas, because all items come from a single normal distribution. Method 1 combines the spread of two distributions with different means, leading to greater overall variability in diameters.
Sampling Distribution (Part D): Using Method 1 ($n = 200$):
* Shape: Approximately normal by the Central Limit Theorem ($n = 200 > 30$).
* Center: $\mu_{\bar{x}} = 6.0$ inches.
* Spread: $\sigma_{\bar{x}} = \frac{0.11}{\sqrt{200}} \approx 0.0078$ inches, where $0.11$ inches plays the role of $\sigma$, the standard deviation of all diameters in the combined day's production.
Long-term Stability (Part E):
* Method 1 will produce less variability among the 365 daily sample means: every Method 1 sample mean will be close to $6.0$. Method 2 sample means will instead fall near $5.9$ or near $6.1$, depending on which line is picked, creating much more variability across the year.
Accuracy for One Day (Part F):
* Method 1 is more likely to produce a sample mean close to $6.0$ on a specific day (e.g., June 22). Method 2 is biased on any given day: because it samples only one production line, its sample mean will be close to $5.9$ or $6.1$ rather than the true daily mean of $6.0$.
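The contrasts in Parts (c), (e), and (f) can be illustrated by simulating 365 days of sampling under each method. This is a sketch under stated assumptions: the two lines are taken to produce equal numbers of tortillas (consistent with a combined mean of $6.0$), and each line's standard deviation is a hypothetical value, $\sqrt{0.11^2 - 0.1^2} \approx 0.046$ inches, chosen so that the combined production has the standard deviation of $0.11$ inches used in Part (d). Method 1 is approximated by letting each sampled tortilla come from Line A or Line B with probability $1/2$.

```python
import random
from statistics import mean, stdev

random.seed(2015)  # arbitrary seed for reproducible output

LINE_MEANS = {"A": 5.9, "B": 6.1}
# Hypothetical per-line SD: chosen so that an equal mix of the two lines has
# overall SD 0.11 inches (mixture variance = LINE_SD**2 + 0.1**2).
LINE_SD = (0.11**2 - 0.1**2) ** 0.5  # about 0.046 inches
N, DAYS = 200, 365

def method_1():
    """Sample N tortillas from the whole day; each comes from A or B with prob 1/2."""
    return [random.gauss(LINE_MEANS[random.choice("AB")], LINE_SD) for _ in range(N)]

def method_2():
    """Pick one line at random, then sample all N tortillas from that line."""
    line = random.choice("AB")
    return [random.gauss(LINE_MEANS[line], LINE_SD) for _ in range(N)]

results = {}
for name, method in [("Method 1", method_1), ("Method 2", method_2)]:
    samples = [method() for _ in range(DAYS)]
    daily_means = [mean(s) for s in samples]
    results[name] = {
        # Part (c): average spread inside one day's sample of 200
        "within_sample_sd": mean([stdev(s) for s in samples]),
        # Part (e): spread of the 365 daily sample means
        "sd_of_daily_means": stdev(daily_means),
        # Part (f): fraction of days whose sample mean is within 0.02 of 6.0
        "share_within_0.02": sum(abs(m - 6.0) <= 0.02 for m in daily_means) / DAYS,
    }

for name, summary in results.items():
    print(name, {key: round(value, 4) for key, value in summary.items()})
```

Under these assumptions, Method 1's daily means should have a standard deviation near $0.11/\sqrt{200} \approx 0.0078$, matching Part (d), while Method 2's daily means cluster around $5.9$ and $6.1$.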