Exercise 271 Notes (Statistics, Chapter 27)

Back-to-back bar chart: Class sizes for California vs. Oregon

  • What the chart represents
    • A back-to-back bar chart shows two groups (California and Oregon) side by side for each class-size category, enabling comparison of class sizes across states.
    • In this exercise, Olivia sampled 50 Grade 9 classes from each state and recorded the number of students in each class. The chart displays the frequency of classes for each possible class size, separating California and Oregon.
  • Key questions and how to approach them
    • a) How many of the selected classes from each state contain at least 25 students?
    • Define: For each state, count all classes with 25 or more students.
    • Let $CA_{\geq 25}$ be the number of California classes with at least 25 students, and $OR_{\geq 25}$ the analogous count for Oregon.
    • Read off the chart by summing the CA bars at 25, 26, 27, 28, … and similarly for OR.
    • Answer format: $CA_{\geq 25}$ and $OR_{\geq 25}$ (two numbers).
    • b) Describe the distribution of each data set.
    • For each state, describe:
      • Center: where is the bulk of the data? (rough median/central tendency impression from where the tallest bars sit.)
      • Spread: how wide is the distribution of class sizes? (range or spread indicated by how far the bars extend.)
      • Shape: is the distribution symmetric, skewed left, skewed right, bimodal, etc., based on the heights of the bars across class-size categories?
      • Outliers: are there extremes (very small or very large class sizes) that stand apart from the main body?
      • Comparison: how does California compare to Oregon in terms of center and spread?
    • Copy and complete: "The selected classes from …… generally have larger class sizes."
    • How to determine the missing word:
      • Compare where each state's bars are concentrated along the class-size axis, not just how tall they are: both samples contain 50 classes, so the total bar height is the same for each state.
      • If California's tallest bars sit at larger class sizes than Oregon's (a higher center), the sentence would read: "The selected classes from California generally have larger class sizes."
  • Important concepts to review from this item
    • Back-to-back bar chart usage for comparing two groups on the same categorical scale
    • Interpreting distributions from bar heights: center, spread, skew, modality
    • Reading off counts for a threshold (≥25 students)
  • Practical tips
    • When counting, group by class-size categories (e.g., 20–24, 25–29, etc.) if the chart is labeled in ranges, or read exact values per category if the chart provides exact counts.
    • Use each state's total (the sum of its frequencies) to check that you have read every bar correctly; since 50 classes were sampled per state, each total should be 50.
    • If data are incomplete in the chart, note the limitation and rely on described features (center/spread) rather than exact numbers.
  • Ethical, philosophical, and practical notes
    • Sampling design matters: randomly selecting the 50 classes helps make the comparison representative; nonrandom selection could bias perceived differences between the states.
    • Privacy and data handling: class sizes are aggregate counts and generally non-identifying, but ensure no individual data are disclosed.
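
The threshold count in part a) and the total check from the practical tips can be sketched in Python as follows. The frequencies are invented purely for illustration (they are not read from the exercise's chart), and `count_at_least` and `total_classes` are hypothetical helper names.

```python
# Hypothetical frequencies (class size -> number of classes).
# These are NOT the exercise's data; substitute the values read from the chart.
california = {22: 3, 23: 6, 24: 9, 25: 12, 26: 10, 27: 6, 28: 4}
oregon = {20: 4, 21: 8, 22: 11, 23: 10, 24: 8, 25: 5, 26: 3, 27: 1}


def count_at_least(freq, threshold):
    """Sum the frequencies of every class size that is >= threshold."""
    return sum(count for size, count in freq.items() if size >= threshold)


def total_classes(freq):
    """Total number of classes; should match the sample size (50 here)."""
    return sum(freq.values())


for name, freq in (("California", california), ("Oregon", oregon)):
    print(name, "total:", total_classes(freq),
          "classes with 25 or more students:", count_at_least(freq, 25))
```

If a total does not come to 50, a bar has probably been misread and the threshold count should be rechecked.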

Back-to-back stem-and-leaf plot: Cricket scores for a batsman in 2020 vs. 2021

  • What a back-to-back stem-and-leaf plot is
    • A back-to-back stem-and-leaf plot displays two related data sets (here, cricket scores for 2020 and 2021) with a common stem and two sides of leaves, enabling quick visual comparison of distributions.
    • The stem represents the tens (or another place-value) and the leaves represent the units for each season (one side for 2020, the other for 2021).
  • Key questions and how to approach them
    • a) In which season was the batsman more consistent?
    • Consistency is indicated by a smaller spread (lower variability) in scores.
    • Compare the spread of the two sets of leaves across the same stems: the season whose leaves are more tightly clustered around the center (smaller range/IQR) is more consistent.
    • b) In which season did the batsman generally score more runs?
    • Compare central tendency: higher median (and often higher mean) suggests generally more runs.
    • Look at the distribution of leaves relative to the stems to infer which season centers higher.
  • Concepts to review
    • Median vs. mean as measures of central tendency
    • Spread measures: range (max–min), interquartile range (IQR), and visual spread in a stem-and-leaf plot
    • Reading back-to-back plots: side-by-side comparison of distributions for two related datasets
  • Practical steps to answer
    • Identify the median of each season from the back-to-back plot (middle value of leaves for each side).
    • Assess spread by checking the distance between the smallest and largest leaves on each side and by looking at how leaves are distributed around the median.
    • Determine which season has higher central tendency by comparing medians (and means if provided or easily inferred from the data).
  • Ethical, philosophical, and practical notes
    • When presenting comparisons, interpret central tendency and spread together; both matter for performance evaluation.
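
The practical steps above can be sketched in Python by expanding each stem and leaf back into a score and then comparing median and range. The plot below is a made-up example, not the batsman's actual data, and the `scores` helper and the stem-to-leaves layout are assumptions chosen for illustration.

```python
from statistics import median

# Hypothetical back-to-back plot: stem -> (leaves for 2020, leaves for 2021).
# Stems are tens and leaves are units; these are NOT the exercise's scores.
plot = {
    0: ([4, 9], []),
    1: ([2, 5, 8], [7]),
    2: ([1, 6], [3, 5, 8]),
    3: ([0, 7], [1, 2, 6, 9]),
    4: ([5], [0, 4]),
    5: ([3], []),
}


def scores(plot, side):
    """Expand stems and leaves into sorted scores (side 0 = 2020, side 1 = 2021)."""
    return sorted(10 * stem + leaf
                  for stem, leaves in plot.items()
                  for leaf in leaves[side])


for label, side in (("2020", 0), ("2021", 1)):
    values = scores(plot, side)
    print(label, "median:", median(values),
          "range:", values[-1] - values[0])
```

In this invented data the 2021 side has both the higher median and the smaller range, so it would be read as the season with more runs and more consistency; the real plot may of course point the other way.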

Parallel box plot: Numbers of students on school buses A and B over a one-month period

  • What a parallel box plot shows
    • Two box plots drawn on the same axis (one for bus A, one for bus B) over the same time period, allowing comparison of distributions of the number of students per trip.
  • Questions and how to approach
    • a) On what percentage of trips were there 10 or more students on:
      • i) bus A; ii) bus B?
      • Important: Box plots summarize data; to get exact percentages you need the underlying data or the detailed counts per trip. Box plots indicate quartiles, median, min, max, and potential outliers, but not exact frequencies unless you have the raw data.
      • General approach when exact data are not available:
      • If the minimum is at or above 10, then 100% of trips meet the condition.
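      • If Q1 is at or above 10, then at least 75% of trips meet the condition (about 75% of values lie at or above Q1). Likewise, if the median is at or above 10, at least 50% do, and if Q3 is below 10, at most 25% do. When 10 falls exactly on a quartile, the usual textbook reading takes that quartile's percentage (75%, 50%, or 25%) as the answer.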
      • When precise values are needed, refer to the data set or the underlying counts used to construct the box plots.
    • b) Which bus generally has the higher number of students?
    • Compare the medians: the bus with the higher median generally has more students per trip.
    • Also compare the overall position of each box and its whiskers (min, quartiles, max) to confirm the impression given by the medians.
    • c) On which bus is there a greater spread in the number of students? Explain your answer.
    • Compare the IQRs (the widths of the boxes) and the whiskers: the larger IQR and/or longer whiskers indicate greater spread.
  • Key concepts to review
    • Box plot components: min, Q1, median, Q3, max, possible outliers
    • Interpreting multiple box plots on the same axis for comparison
    • How to translate box-plot information into percentages or proportions when data are summarized
  • Practical tips
    • If you’re asked for a precise percentage of trips with 10 or more students, you’ll need the individual trip data or, at minimum, the frequency distribution of counts around 10. Box plots alone provide qualitative guidance about proportion but not exact values.
    • When comparing buses, a higher median indicates a generally higher number of students per trip; a larger IQR indicates more variability in bus occupancy.
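
The reasoning in part a) can be sketched as a small Python function that turns a five-number summary into bounds on the percentage of trips at or above a threshold. The summaries below are hypothetical, not read from the exercise's box plots, and `percent_at_least` is an illustrative helper. It assumes the textbook convention that each quartile marks off 25% of the data, which is only approximate for small samples.

```python
def percent_at_least(five, threshold):
    """Return (low, high) bounds on the percentage of values >= threshold."""
    points = [five["min"], five["q1"], five["median"], five["q3"], five["max"]]
    # Approximate percentage of values at or above each summary point
    # (for the maximum, 0 is used as a safe lower bound).
    at_or_above = [100, 75, 50, 25, 0]
    if threshold <= five["min"]:
        return (100, 100)  # every value meets the condition
    if threshold > five["max"]:
        return (0, 0)      # no value meets the condition
    # Everything at or above a summary point that is >= threshold qualifies.
    low = max(p for v, p in zip(points, at_or_above) if v >= threshold)
    # Only values above a summary point that is < threshold can qualify.
    high = min(p for v, p in zip(points, at_or_above) if v < threshold)
    return (low, high)


# Hypothetical five-number summaries; NOT the exercise's data.
bus_a = {"min": 4, "q1": 8, "median": 12, "q3": 15, "max": 20}
bus_b = {"min": 10, "q1": 13, "median": 16, "q3": 19, "max": 24}

print("Bus A:", percent_at_least(bus_a, 10))
print("Bus B:", percent_at_least(bus_b, 10))
```

When the threshold lands exactly on a quartile, the lower bound returned here matches the percentage a textbook answer would normally give.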

Box-plot and summary statistics for chess club ratings: The Rookies vs Best Mates

  • What the data are
    • Two sets of chess ratings for members of two clubs: The Rookies and Best Mates.
    • Each club has a list of ratings (e.g., 36 observations per club in the provided data).
  • What you are asked to compute
    • a) Mean and median rating for each club
    • b) Range and interquartile range (IQR) for each club
    • d) Which club has a greater spread of player ratings?
      • Compare spreads using range and IQR; a larger value indicates greater variability.
    • e) Which club generally has higher rated players?
      • Compare central tendency measures (mean and/or median).
  • Formulas to use
    • Mean: $\bar{x} = \frac{1}{n} \times \text{sum of all ratings} = \frac{1}{n} \biggl( \sum_{i=1}^{n} x_i \biggr)$
    • Where n is the number of players in the club and x_i are the ratings.
    • Median: Sort the list and choose the middle value; if n is even, use the average of the two middle values.
    • Range: $R = x_{(n)} - x_{(1)}$
    • Where $x_{(1)}$ is the minimum rating and $x_{(n)}$ is the maximum rating.
    • Interquartile range (IQR): $\text{IQR} = Q_3 - Q_1$
    • Q1 is the first quartile (median of the lower half), Q3 is the third quartile (median of the upper half).
  • Procedural steps for each club (typical approach)
    • Step 1: Sort the ratings in ascending order.
    • Step 2: Count n (the number of ratings).
    • Step 3: Compute the mean using the formula above.
    • Step 4: Find the median: if n is odd, the middle value; if even, the average of the two middle values.
    • Step 5: Determine the range: subtract the smallest rating from the largest.
    • Step 6: Determine Q1 and Q3 to compute IQR. Use a consistent method for quartile calculation (e.g., Tukey method or a chosen convention) and apply it to both clubs for comparability.
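
Steps 1 to 6 can be sketched in Python as below. The ratings are short invented lists, not the clubs' actual data, and `quartiles` and `summarize` are hypothetical helper names. The quartiles use the median-of-halves convention described in these notes, leaving out the middle value when n is odd; other conventions can give slightly different values.

```python
from statistics import mean, median


def quartiles(sorted_values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded if n is odd)."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    upper = sorted_values[half + (n % 2):]
    return median(lower), median(upper)


def summarize(ratings):
    """Mean, median, range, and IQR for one club's ratings."""
    values = sorted(ratings)           # Step 1: sort
    q1, q3 = quartiles(values)         # Step 6: quartiles
    return {
        "n": len(values),              # Step 2: count
        "mean": mean(values),          # Step 3: mean
        "median": median(values),      # Step 4: median
        "range": values[-1] - values[0],  # Step 5: range
        "iqr": q3 - q1,
    }


# Hypothetical ratings; NOT the exercise's data.
rookies = [1200, 1250, 1310, 1340, 1400, 1450, 1520, 1600]
best_mates = [1350, 1380, 1420, 1450, 1470, 1500, 1530]

print("The Rookies:", summarize(rookies))
print("Best Mates:", summarize(best_mates))
```

Applying the same `quartiles` function to both clubs keeps the IQR comparison fair, which is the point of choosing one convention in Step 6.
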
  • How to compare the clubs
    • Spread: Compare ranges and IQRs. The club with the larger IQR and/or larger range has greater spread in ratings.
    • Center: Compare the means (and medians) to determine which club generally has higher-rated players.
  • Ethical, philosophical, and practical notes
    • Ratings belong to individual players, so report only aggregate statistics where confidentiality matters, and avoid inferences beyond the data.
    • When comparing groups, be mindful of sample sizes; a much smaller club with extreme values can affect spread metrics disproportionately.
  • Examples and interpretive tips
    • If Club A has a higher median and higher mean than Club B, Club A generally has higher-rated players.
    • If Club B has a larger IQR and range than Club A, Club B shows more variability in ratings.
    • If both the mean and the median are higher for one club, the conclusion does not depend on which measure of center you choose.
  • Quick reference: common statistics used in these problems
    • Central tendency: mean (average), median
    • Variability: range, interquartile range (IQR)
    • Distribution shape: symmetry, skewness, spread
    • Box plot interpretation: min, Q1, median, Q3, max; outliers
    • Stem-and-leaf plots: distribution shape and central tendency can be read directly from stems and leaves
    • Back-to-back plots: facilitate direct comparison of two related datasets
  • Connections to foundational principles and real-world relevance
    • These techniques underpin descriptive statistics, helping you summarize large data sets succinctly for decision-making.
    • The concepts of central tendency and dispersion are foundational for comparing groups (e.g., states, seasons, teams, clubs) in real-world data analyses.
    • Understanding how to read different visualization formats (back-to-back bar charts, stem-and-leaf plots, box plots) is essential for interpreting data reports in academics, sports analytics, and public data sets.
  • Summary suggestions for study
    • Memorize definitions and formulas for mean, median, range, IQR, and their interpretations.
    • Practice reading back-to-back plots, stem-and-leaf plots, and parallel box plots to quickly assess center and spread.
    • When solving problems, clearly state the statistic you are using (e.g., median vs mean) for central tendency and why you chose it to reflect the data’s characteristics.
    • Always note the limitations of summary plots: they compress data and can hide nuances present in the underlying data.