Exploring One Variable Data Notes

Unit 1: Exploring One Variable Data

Unit 1 Test Review

Key Concepts and Questions

Multiple Choice Practice
- Question 1: Identification of Categorical Variables
- Consider the following variables recorded in a phone survey about TV shows:
  - I. The type of show being watched
  - II. The number of persons watching the show
  - III. The ages of persons watching the show
  - IV. The name of the show being watched
  - V. The number of times the show has been watched in the last month
- Technique: Distinguishing between Categorical and Quantitative Variables.
  - Categorical Variables: Describe qualities or characteristics that fall into categories (e.g., "type of show," "name of show"). Arithmetic operations such as averaging are not meaningful for their values.
  - Quantitative Variables: Represent counts or measurements (e.g., "number of persons," "ages" recorded in years, "number of times"). Arithmetic operations like averaging are meaningful for these variables.
- Formulas/Calculator Steps: This is a conceptual distinction and does not require calculator steps.
- The correct answer is:
  - (D) I and IV are categorical variables since they describe qualities or categories without showing numeric measures.
- Question 2: Analysis of Histogram for Age of Cars
- Given this histogram of car ages:
  - True statements regarding the distribution could be:
  - I. The mean age is greater than the median age.
  - II. The median age is greater than the mean age.
  - III. The median is between 6 and 8 years.
- Technique: Interpreting Histograms and understanding the relationship between Mean and Median.
  - A histogram visually represents the distribution of a quantitative variable. Its shape (skewness) indicates the relationship between the mean and median.
  - If a distribution is skewed to the right (long tail on the right), the mean is typically greater than the median (Mean > Median) because the mean is pulled towards the longer tail by higher values.
  - If a distribution is skewed to the left (long tail on the left), the median is typically greater than the mean (Median > Mean) because the mean is pulled towards the longer tail by lower values.
  - If a distribution is symmetric, the mean and median are approximately equal.
- Formulas/Calculator Steps: This is a conceptual interpretation of a histogram and does not involve direct calculator steps.
- The answer options are:
  - (E) II and III are correct based on the distribution insights.
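The mean/median relationship described above can be checked on small made-up datasets. This is only an illustrative sketch: both lists are hypothetical values chosen to have a long tail on one side, not the car-age data from the histogram.

```python
from statistics import mean, median

# Hypothetical data (not from the question): one long tail on each side.
right_skewed = [1, 2, 2, 3, 3, 4, 20]       # long tail toward high values
left_skewed = [1, 17, 18, 18, 19, 19, 20]   # long tail toward low values

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    # The extreme value drags the mean toward the tail; the median barely moves.
    print(name, "mean =", mean(data), "median =", median(data))
```

In the first list the mean lands above the median, and in the second it lands below, which matches the rule of thumb used for this question.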
- Question 3: Mean Age Calculation
- The mean age of five people in a room is 30 years. If one person (aged 50 years) leaves:
  - The mean age of the remaining four becomes:
- Technique: Calculating the Mean and understanding its behavior when data points are added or removed.
  - The mean is the sum of all values divided by the number of values.
- Formulas/Calculator Steps:
  - Formula for Initial Total Sum: \text{Total Sum} = \text{Mean} \times n
  - Formula for New Mean: \text{New Mean} = \frac{\text{New Total Sum}}{n_{\text{new}}}
  - Calculator Steps (TI-83/84):
    1. Calculate the initial total age: Press 5 * 30 ENTER. (Result: 150)
    2. Subtract the age of the person who left: Press ANS - 50 ENTER. (Result: 100)
    3. Divide by the new number of people (4): Press ANS / 4 ENTER. (Result: 25)
- (C) Correctly indicates that the mean age is now 25.
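The same three calculator steps can be sketched in Python. The function name `mean_after_removal` is a hypothetical helper introduced here for illustration.

```python
def mean_after_removal(mean, n, removed):
    """Mean of the remaining values after one value leaves a group of n."""
    total = mean * n                 # recover the original sum from the mean
    return (total - removed) / (n - 1)

# Five people with mean age 30; the 50-year-old leaves.
new_mean = mean_after_removal(mean=30, n=5, removed=50)
print(new_mean)  # 25.0
```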
- Question 4: Distribution Characteristics
- Regarding the distribution stated:
  - Possible characteristics could be:
  - (A) Mean=3, Median = 3, Mode = 3
  - (B) Mean = 3.5, Median = 4, Mode = 3
  - (C) Mean = 4, Median = 3.5, Mode = 3
  - (D) Mean = 3.5, Median = 3.5, Mode = 5
  - (E) Mean = 3, Median = 2, Mode = 5
- Technique: Identifying Measures of Central Tendency (Mean, Median, Mode).
  - The mean is the arithmetic average.
  - The median is the middle value when data is ordered (or average of two middle values if an even number of data points).
  - The mode is the value that appears most frequently in the dataset.
  - The relationship between these measures often suggests the shape of the distribution.
- Formulas/Calculator Steps: This question involves identifying, not calculating, characteristics from given options. If raw data were provided to calculate these, you would:
  - Calculator Steps (TI-83/84 for raw data):
    1. Enter data into a list: Press STAT, then select 1:Edit..., and enter values into L1.
    2. Calculate one-variable statistics: Press STAT, select CALC, then 1:1-Var Stats. Ensure List is L1 (or your chosen list) and FreqList is empty. Press Calculate. The output will display the mean (\bar{x}), median (Med), and other statistics. The mode must be found by inspecting the data for the most frequent value.
- Most likely true is (C): Mean = 4, Median = 3.5, Mode = 3.
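As a quick check that the combination in option (C) is attainable, here is a sketch using a hypothetical dataset. The values are chosen only for illustration and are not the data behind the question.

```python
from statistics import mean, median, mode

# Hypothetical values (an assumption for illustration only).
data = [3, 3, 4, 6]

data_mean = mean(data)      # (3 + 3 + 4 + 6) / 4
data_median = median(data)  # average of the two middle values, 3 and 4
data_mode = mode(data)      # most frequent value

print(data_mean, data_median, data_mode)
```

Here the mean is 4, the median is 3.5, and the mode is 3, so the mean sits above the median, which sits above the mode.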
- Question 5: Variance Calculation
- To determine the variance of the sample:
  - Given numbers: 2, 15, 10, 8, and 4.
- Technique: Calculating Sample Variance.
  - Variance measures the average squared deviation of each data point from the mean. For a sample, the formula for variance (s^2) is:
    s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
  - The steps are:
    1. Calculate the mean \bar{x}.
    2. Subtract the mean from each data point: x_i - \bar{x}.
    3. Square each deviation: (x_i - \bar{x})^2.
    4. Sum the squared deviations: \sum (x_i - \bar{x})^2.
    5. Divide by n - 1 (for sample variance).
- Formulas/Calculator Steps:
  - Calculator Steps (TI-83/84):
    1. Enter data into a list: Press STAT, then select 1:Edit..., and enter 2, 15, 10, 8, 4 into L1.
    2. Calculate one-variable statistics: Press STAT, select CALC, then 1:1-Var Stats. Ensure List is L1 and FreqList is empty. Press Calculate.
    3. Find the sample standard deviation (Sx): Locate Sx in the output (e.g., S_x \approx 5.1186 for this data).
    4. Square the sample standard deviation to get the sample variance: Press (, then VARS, select 5:Statistics..., then 3:Sx, then )^2 ENTER. (Result: 26.2)
- Correct answer:
  - (A) 26.2
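The sample variance can also be worked through step by step in Python, as a rough cross-check of the calculator result. The manual computation mirrors the formula above, and `statistics.variance` (which also divides by n - 1) is used for comparison.

```python
from statistics import mean, variance

sample = [2, 15, 10, 8, 4]

x_bar = mean(sample)                                # mean of the sample
squared_devs = [(x - x_bar) ** 2 for x in sample]   # (x_i - x_bar)^2 for each value
s2_manual = sum(squared_devs) / (len(sample) - 1)   # divide by n - 1

s2_library = variance(sample)                       # library sample variance

print(round(s2_manual, 1), round(s2_library, 1))    # 26.2 26.2
```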
- Question 6: Stem-and-Leaf Plot Analysis
- Given details of a stem-and-leaf plot, verifying the following could be true:
  - I. The medians of the distributions are the same.
  - II. The means of the distributions are the same.
  - III. The ranges of the distributions are the same.
- Technique: Interpreting Stem-and-Leaf Plots and comparing distributions.
  - A stem-and-leaf plot displays quantitative data by separating each data point into a 'stem' (the leading digits) and a 'leaf' (the trailing digit).
  - It shows the shape, center, and spread of a distribution, and can be easily used to find the median (middle value) and range (Max - Min).
  - Comparing median, mean, and range helps assess similarities or differences between two or more distributions.
- Formulas/Calculator Steps: This is an interpretation task based on visual data representation and doesn't require direct calculator steps, other than potentially finding a median by counting values on the plot.
- Correct answer is (A): I only.
- Question 7: Fastball Speed Distribution
- Normally distributed fastball speeds with mean = 86 mph and SD = 4 mph. What percentage throw faster than 95 mph?
- Technique: Using the Normal Distribution and Z-scores to calculate probabilities.
  - The Normal Distribution is a symmetric, bell-shaped distribution defined by its mean (\mu) and standard deviation (\sigma).
  - A Z-score measures how many standard deviations an individual data point (x) is from the mean:
    Z = \frac{x - \mu}{\sigma}
  - Once the Z-score is calculated, its corresponding percentile (area under the curve) can be found using a Z-table or statistical software to determine the percentage of values above or below a certain point.
- Formulas/Calculator Steps:
  - 1. Calculate the Z-score:
    - Formula: Z = \frac{95 - 86}{4}
    - Calculator Steps (TI-83/84): Press (, 95, -, 86, ), /, 4, ENTER. (Result: 2.25)
  - 2. Find the percentage using the Normal CDF function (area to the right of Z = 2.25):
    - Calculator Steps (TI-83/84): Press 2nd, then VARS (for DISTR), select 2:normalcdf(. Enter lower: 2.25, upper: 1E99 (a very large number to represent infinity), mean: 0, sd: 1 (for standard normal Z-distribution). Press Paste, then ENTER. (If using original values instead of Z-score, enter lower: 95, upper: 1E99, mean: 86, sd: 4).
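- From calculations, the conclusion is:
  - (B) 7.7% can throw at speeds higher than 95 mph.
  - (Note: With the values as stated, Z = 2.25 and normalcdf(95, 1E99, 86, 4) \approx 0.0122, or about 1.2%, which does not match the 7.7% in answer (B). This might be a discrepancy to clarify in a review session.)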
- Question 8: Histogram Construction Principles
- True statements when constructing a histogram include:
  - (A) No gaps between bars.
  - (C) The heights relate to frequency counts.
- Technique: Principles of Histogram Construction.
  - When constructing a histogram for continuous data, there should generally be no gaps between the bars to show that data is continuous across the range.
  - The height of each bar represents the frequency (count) or relative frequency (proportion) of data points falling into that interval (bin).
  - The bins (intervals) must be of equal width.
- Formulas/Calculator Steps: This is a conceptual principle about data visualization and does not involve direct calculator steps.
- (A) correct concerning gaps between bars.
- Question 9: Microwave Lifespan Analysis
- Lifespan is Normally distributed, with mean = 9 years and SD = 2.5 years. Find the percentage lasting over 10 years:
- Technique: Application of Normal Distribution and Z-scores for probability.
  - This is a direct application of the techniques explained in Question 7, calculating a Z-score and then finding the area under the normal curve that corresponds to values greater than a specific point.
- Formulas/Calculator Steps:
  - 1. Calculate the Z-score:
    - Formula: Z = \frac{10 - 9}{2.5}
    - Calculator Steps (TI-83/84): Press (, 10, -, 9, ), /, 2.5, ENTER. (Result: 0.4)
  - 2. Find the percentage using the Normal CDF function (area to the right of Z = 0.4):
    - Calculator Steps (TI-83/84): Press 2nd, then VARS (for DISTR), select 2:normalcdf(. Enter lower: 0.4, upper: 1E99, mean: 0, sd: 1 (for standard normal Z-distribution). Press Paste, then ENTER. (If using original values instead of Z-score, enter lower: 10, upper: 1E99, mean: 9, sd: 2.5).
- Calculation indicates:
  - (B) 34.5% last more than 10 years.
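The Z-score and normalcdf steps can be mirrored with Python's standard library. This is a minimal sketch: `percent_above` is a hypothetical helper name, and `statistics.NormalDist` stands in for the calculator's normalcdf function.

```python
from statistics import NormalDist

def percent_above(x, mu, sigma):
    """Percentage of a Normal(mu, sigma) distribution lying above x."""
    z = (x - mu) / sigma                    # standardize the cutoff
    upper_tail = 1 - NormalDist().cdf(z)    # area to the right of z
    return 100 * upper_tail

# Microwave lifespans: mean 9 years, SD 2.5 years, cutoff 10 years.
pct = percent_above(10, 9, 2.5)
print(round(pct, 1))  # 34.5
```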
- Question 10: Distribution of Test Scores
- Scores for a science test indicate:
  - Mode = 90, others are 80 and 20. Which statement about the distribution is true?
- Technique: Understanding Skewness based on Measures of Central Tendency.
  - If the median is greater than the mean, it generally indicates a left-skewed distribution, meaning the bulk of the data is on the higher end, and there's a tail stretching to the left (lower values pulling the mean down).
  - The mode is the peak of the distribution. In a left-skewed distribution, the typical order is Mode > Median > Mean.
- Formulas/Calculator Steps: This is a conceptual interpretation of summary statistics and does not involve direct calculator steps for calculation.
- Based on observations: (B) The median is greater than the mean.
- Question 11: Analysis of Ordered List of Values
- Values ordered: 25, 26, 26, 30, y, y, y, 33, 150.
  - Statements to confirm:
  - I. The mode is 26.
  - II. The mean is greater than the median.
  - III. There are no outliers in the data.
- Technique: Identifying Mode, comparing Mean and Median, and detecting Outliers using the IQR Rule.
  - The mode is the most frequent value. Here 'y' appears three times while 26 appears only twice, so 'y' (a single value between 30 and 33, because the list is ordered) is more frequent than 26.
  - Outlier detection using the IQR Rule involves calculating the Interquartile Range (IQR = Q3 - Q1). Values are considered outliers if they fall below the lower fence (Q1 - 1.5 \times IQR) or above the upper fence (Q3 + 1.5 \times IQR).
- Formulas/Calculator Steps: This involves conceptual understanding and some manual calculation or calculator use if the actual value of 'y' was known and data entered into a list. For outlier detection:
  - Formula for IQR: IQR = Q3 - Q1
  - Formula for Lower Fence: Q1 - 1.5 \times IQR
  - Formula for Upper Fence: Q3 + 1.5 \times IQR
  - Calculator Steps (TI-83/84 for IQR related values, assuming data is entered):
    1. Enter data (if 'y' values are known) into a list: Press STAT, select 1:Edit..., enter values into L1.
    2. Calculate one-variable statistics: Press STAT, select CALC, then 1:1-Var Stats. Press Calculate.
    3. From the output, identify Q1 and Q3. Calculate IQR: Q3 - Q1.
    4. Calculate fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. Compare data min/max to these fences.
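- The correct answer is (A): I only.
  - (Note: The listed answer conflicts with the data as written. 'y' appears three times, so the mode is 'y' rather than 26; the value 150 pulls the mean above the median, which supports statement II; and 150 lies far above the upper fence, so statement III is false. This might be a discrepancy to clarify in a review session.)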
- Question 12: Weight Loss Center Statistics
- Given statistics of weight loss:
  - Mean = 5 lbs, Q1 = 2 lbs, Median = 7 lbs, Q3 = 8.5 lbs, Mode = 4 lbs, SD = 0.5 lbs.
  - All statements valid:
- Technique: Interpreting various Summary Statistics.
  - Mean, Median, Mode describe the center of the data.
  - Q1 (First Quartile) is the 25th percentile, Median (Q2) is the 50th percentile, and Q3 (Third Quartile) is the 75th percentile.
  - Standard Deviation (SD) measures the typical spread of data points around the mean.
  - Together, these statistics provide a comprehensive summary of the distribution's center, shape, and spread.
- Formulas/Calculator Steps: This is an interpretation task based on provided summary statistics, not requiring direct calculation steps.
- (E) I, II, and III are correct.
- Question 13: Poverty Estimation Responses
- From the recent poll, mean = 7%, median = 12%. Suggests distribution shape:
- Technique: Inferring Distribution Shape from Mean and Median.
  - As discussed in Question 2, if the mean is less than the median (Mean < Median), this strongly suggests that the distribution is skewed to the left. This means there is a longer tail of smaller values pulling the mean down.
  - (Note: The provided answer (D) indicates right skew, but with the values as stated, Mean (7%) < Median (12%) implies left skew. Answer (D) would be consistent only if the values were reversed (mean 12%, median 7%). This is a discrepancy to clarify in a review session.)
- Formulas/Calculator Steps: This is a conceptual interpretation of summary statistics and does not involve direct calculator steps for calculation.

Additional Questions and Analysis

Questions 14 - 24: Analysis of Various Statistical Distributions and Concepts
- Techniques: These questions often require a deeper understanding and application of:
  - Z-scores: Standardizing values from different normal distributions for comparison.
  - Percentiles: The value below which a given percentage of observations falls.
  - Standard Deviations: A measure of spread, indicating how much variation there is from the average (mean).
  - Mean Behaviors: How the mean responds to changes in data, and its relationship to the median and mode in different distribution shapes.
  - Empirical Rule (68-95-99.7 Rule): For normal distributions, approximately 68% of data falls within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs.
  - Box plots: Visualizing the 5-number summary and potential outliers.
- Formulas/Calculator Steps: Depending on the specific question, similar calculator steps as described for Z-scores (Question 7, 9), 1-Var Stats (Question 5, FR 1), and normalcdf (Question 7, 9, FR 3) would apply.
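The Empirical Rule percentages listed above can be reproduced from the standard normal curve. This is a small sketch using `statistics.NormalDist`; the dictionary name `within` is introduced here only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {100 * area:.2f}%")
```

The printed areas come out close to 68%, 95%, and 99.7%, which is where the rule's name comes from.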

Free Response Questions

Guinea Pig Survival Times (in days)
- Data: 91, 83, 84, 79, 91, 93, 95, 97, 97, 111, 101, 105, 98.
  - (a) 5-number summary:
  - Minimum: 79
  - Q1: 87.5
  - Median: 95
  - Q3: 99.5
  - Maximum: 111
  - Technique: Constructing a 5-Number Summary.
    - The 5-number summary consists of the Minimum value, First Quartile (Q1), Median, Third Quartile (Q3), and Maximum value. These are crucial for creating box plots and understanding the range and spread of the data. To calculate, first order the data from least to greatest.
    - Minimum: The smallest value.
    - Maximum: The largest value.
    - Median (Q2): The middle value of the entire ordered dataset.
    - Q1: The median of the lower half of the dataset (values below the overall median).
    - Q3: The median of the upper half of the dataset (values above the overall median).
  - Formulas/Calculator Steps:
    - Calculator Steps (TI-83/84):
      1. Enter data into a list: Press STAT, then select 1:Edit..., and enter 91, 83, 84, 79, 91, 93, 95, 97, 97, 111, 101, 105, 98 into L1.
      2. Calculate one-variable statistics: Press STAT, select CALC, then 1:1-Var Stats. Ensure List is L1 and FreqList is empty. Press Calculate.
      3. Scroll down in the output to find: minX (Minimum), Q1, Med (Median), Q3, and maxX (Maximum).
  - (b) Stemplot overview:
  - Shape is unimodal and symmetric.
  - Technique: Building and Interpreting Stemplots for Distribution Shape.
    - A stemplot (or stem-and-leaf plot) organizes numerical data by leading digits (stem) and final digit (leaf); in this data set, 111 has stem 11 and leaf 1. It gives a quick visual representation of the distribution's shape, center, and spread.
    - A unimodal distribution has one clear peak.
    - A symmetric distribution implies that if you draw a line through the middle, both sides would be approximate mirror images.
  - Formulas/Calculator Steps: Stemplots are typically created manually or with specialized software, not directly with standard calculator functions.
  - (c) Outliers check with IQR rule: IQR = 99.5 - 87.5 = 12, so the fences are 69.5 and 117.5. All values lie within the fences, so no outliers are identified.
  - Technique: Identifying Outliers using the 1.5 IQR Rule.
    - The Interquartile Range (IQR) is a measure of statistical dispersion, calculated as Q3 - Q1.
    - Outliers are values that lie an unusual distance from the rest of the data. The 1.5 IQR Rule defines fences to identify potential outliers:
    - Lower Fence: Q1 - 1.5 \times IQR
    - Upper Fence: Q3 + 1.5 \times IQR
    - Any data point falling outside these fences is considered an outlier.
  - Formulas/Calculator Steps:
    - Calculator Steps (TI-83/84):
      1. Obtain Q1 and Q3 from the 1-Var Stats output (as in part (a)).
      2. Calculate IQR: Press (, then VARS, select 5:Statistics..., arrow over to the PTS submenu and select Q3, then -, VARS, 5:Statistics..., PTS, Q1, ) ENTER. (Result: 12)
      3. Calculate Lower Fence: Press VARS, select 5:Statistics..., PTS, Q1, then -, 1.5, *, ANS (ANS holds the IQR from step 2), ENTER. (Result: 87.5 - 1.5 \times 12 = 69.5)
      4. Calculate Upper Fence: ANS now holds the lower fence, so enter the IQR directly. Press VARS, select 5:Statistics..., PTS, Q3, then +, 1.5, *, 12, ENTER. (Result: 99.5 + 1.5 \times 12 = 117.5)
      5. Compare the minimum (79) and maximum (111) values from the data to these fences. Since 79 > 69.5 and 111 < 117.5, there are no outliers.
  - (d) Preferable measures of center and spread: Mean and standard deviation preferred due to symmetric distribution.
  - Technique: Choosing Appropriate Measures of Center and Spread.
    - For symmetric distributions without outliers, the Mean is the preferred measure of center, and the Standard Deviation is the preferred measure of spread, as they directly use all data points and are efficient.
    - For skewed distributions or distributions with outliers, the Median is the preferred measure of center, and the IQR is the preferred measure of spread, because they are resistant to the influence of extreme values.
  - Formulas/Calculator Steps: This is a conceptual choice based on data characteristics, not a calculation.
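Parts (a) and (c) can be cross-checked with a short Python sketch. It assumes the same quartile convention as the notes (Q1 and Q3 are the medians of the lower and upper halves, excluding the overall median when n is odd); `five_number_summary` is a hypothetical helper name.

```python
from statistics import median

times = [91, 83, 84, 79, 91, 93, 95, 97, 97, 111, 101, 105, 98]

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

minimum, q1, med, q3, maximum = five_number_summary(times)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in times if x < lower_fence or x > upper_fence]

print(minimum, q1, med, q3, maximum)   # 79 87.5 95 99.5 111
print(iqr, lower_fence, upper_fence)   # 12.0 69.5 117.5
print(outliers)                        # []
```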
Comparison of Grades from Two Classes
- Morning: Mean = 81.23, SD = 4.04; Afternoon: Mean = 74.97, SD = 5.68.
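  - (a) Determining whose score is relatively higher via Z-scores:
  - Kurti (morning class, score 91): Z = \frac{91-81.23}{4.04} = 2.42
  - Steve (afternoon class, score 89): Z = \frac{89-74.97}{5.68} = 2.47
  - Conclusion: Steve's score is higher relative to his class.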
  - Technique: Comparing Values from Different Distributions using Z-scores.
    - When comparing individual data points from different distributions (each with its own mean and standard deviation), Z-scores are essential.
    - A higher Z-score indicates a relatively better performance or a value further above its group's mean, regardless of the original scale of measurement. In this case, Steve's Z-score is higher, indicating his score is relatively better within his class distribution than Kurti's score within his class distribution.
  - Formulas/Calculator Steps:
    - Formula for Z-score: Z = \frac{\text{score} - \text{mean}}{\text{standard deviation}}
    - Calculator Steps (TI-83/84 for Kurti): Press (, 91, -, 81.23, ), /, 4.04, ENTER. (Result: 2.418… \approx 2.42)
    - Calculator Steps (TI-83/84 for Steve): Press (, 89, -, 74.97, ), /, 5.68, ENTER. (Result: 2.470… \approx 2.47)
  - (b) Grade equivalency for a morning score of 75:
    - Z-score in the morning class: Z = \frac{75-81.23}{4.04} \approx -1.54
    - Afternoon score with the same Z-score: -1.54 = \frac{x - 74.97}{5.68} \implies x \approx 66.22
  - Technique: Using Z-scores to find an equivalent value in another distribution.
    - To find an equivalent score in a different distribution, first calculate the Z-score for the given score in its original distribution. Then, use this Z-score with the mean and standard deviation of the target distribution to solve for the equivalent raw score (x).
  - Formulas/Calculator Steps:
    - 1. Calculate Z-score for 75 in Morning class: Same steps as above.
      - Calculator Steps (TI-83/84): Press (, 75, -, 81.23, ), /, 4.04, ENTER. (Result: -1.542… \approx -1.54)
    - 2. Solve for equivalent score (x) in Afternoon class: Rearrange Z = \frac{x - \mu}{\sigma} to x = Z \times \sigma + \mu
      - Calculator Steps (TI-83/84): Press (, -1.54, *, 5.68, ), +, 74.97, ENTER. (Result: 66.2228 \approx 66.22)
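Both parts of this comparison can be sketched in Python. The helper names `z_score` and `from_z` are hypothetical, and the Z-score for part (b) is rounded to two decimals before converting back, to follow the rounding used in these notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

def from_z(z, mean, sd):
    """Raw score corresponding to a given Z-score."""
    return z * sd + mean

morning = (81.23, 4.04)     # (mean, SD)
afternoon = (74.97, 5.68)   # (mean, SD)

# Part (a): compare relative standing.
z_kurti = z_score(91, *morning)
z_steve = z_score(89, *afternoon)
print(round(z_kurti, 2), round(z_steve, 2))  # 2.42 2.47

# Part (b): afternoon score matching a morning score of 75.
z_75 = round(z_score(75, *morning), 2)       # rounded as in the notes
equivalent = from_z(z_75, *afternoon)
print(round(equivalent, 2))                  # 66.22
```

Carrying the unrounded Z-score through instead would give a slightly different equivalent score (about 66.21), so the choice of rounding is worth stating in a written answer.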
Battery Life Analysis: Normal Distribution with Mean = 76.3 hours and SD = 2.1 hours
- (a) Percentage getting over 80 hours:
  Z = \frac{80-76.3}{2.1} = 1.76 and find corresponding percentile.
- (b) Percentage under 75 hours:
  Z = \frac{75-76.3}{2.1} = -0.619
- (c) Percentages between 73 and 77 hours:
  - Find the Z-scores for 73 and 77, then find the area under the normal curve between them (normalcdf with both bounds).
- Technique: Calculating Probabilities for Ranges in a Normal Distribution.
  - These tasks involve calculating Z-scores for specific values of interest (x) and then using the standard normal (Z) distribution table or a calculator to find the area under the curve (representing probability or percentage) associated with those Z-scores.
  - For 'over' a value: Find the area to the right of the Z-score.
  - For 'under' a value: Find the area to the left of the Z-score.
  - For 'between' two values: Find the areas to the left of both Z-scores and subtract the smaller area from the larger one.
  - This technique relies on the properties of the normal distribution, where specific percentages of data lie within certain standard deviations of the mean (e.g., Empirical Rule).
- Formulas/Calculator Steps:
  - Formula for Z-score: Z = \frac{x - \mu}{\sigma}
  - Calculator Steps (TI-83/84 for Normal CDF): Press 2nd, then VARS (for DISTR), select 2:normalcdf(.
  - (a) Percentage getting over 80 hours:
    1. Calculate Z-score: Press (, 80, -, 76.3, ), /, 2.1, ENTER. (Result: 1.7619…)
    2. Use normalcdf(): Enter lower: 80, upper: 1E99 (or a very large number like 99999), mean: 76.3, sd: 2.1. Press Paste, then ENTER. (Result will be a decimal; multiply by 100 for percentage).
  - (b) Percentage under 75 hours:
    1. Calculate Z-score: Press (, 75, -, 76.3, ), /, 2.1, ENTER. (Result: -0.6190…)
    2. Use normalcdf(): Enter lower: -1E99 (or a very small number like -99999), upper: 75, mean: 76.3, sd: 2.1. Press Paste, then ENTER.
  - (c) Percentages between 73 and 77 hours:
    1. Calculate Z-scores for both 73 and 77:
      - For 73: (73 - 76.3) / 2.1 -> -1.57
      - For 77: (77 - 76.3) / 2.1 -> 0.33
    2. Use normalcdf(): Enter lower: 73, upper: 77, mean: 76.3, sd: 2.1. Press Paste, then ENTER.
- For each part, sketching the normal curve and shading the region of interest helps confirm that the calculated percentage is reasonable.
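The three battery-life areas can be sketched with `statistics.NormalDist`, which plays the role of normalcdf here. The variable names are assumptions introduced for illustration.

```python
from statistics import NormalDist

battery = NormalDist(mu=76.3, sigma=2.1)

over_80 = 1 - battery.cdf(80)                     # (a) area to the right of 80
under_75 = battery.cdf(75)                        # (b) area to the left of 75
between_73_77 = battery.cdf(77) - battery.cdf(73) # (c) area between 73 and 77

for label, area in (("over 80", over_80),
                    ("under 75", under_75),
                    ("between 73 and 77", between_73_77)):
    print(f"{label}: {100 * area:.1f}%")
```

Each result is a proportion between 0 and 1, so it is multiplied by 100 to report a percentage, just as with the calculator output.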