Stem and Leaf Displays
Stem and Leaf Display Overview
Definition: A stem and leaf display is a graphical method of displaying data that is particularly useful when the dataset is not overly large.
Purpose: It provides an intuitive way to visualize the distribution, shape, and individual values of a dataset.
Construction and Interpretation of a Stem and Leaf Display
Initial Example: The example used is based on the number of touchdown passes thrown by the 31 NFL teams in the 2000 season.
Display Components:
Stems: The stems, which represent the tens digits, are arranged in a column on the left. In this example, the stems are:
3 (representing 30-39)
2 (representing 20-29)
1 (representing 10-19)
0 (representing 0-9)
Leaves: The leaves to the right of each stem represent the ones digits. Each leaf combines with its stem to give one exact value in the dataset.
Example Interpretation:
Top Row: Stem of 3 with leaves 2, 3, 3, 7.
Values represented are 32, 33, 33, and 37 touchdowns, the four highest team totals.
Second Row: Stem of 2 with 12 leaves representing:
2 occurrences of 20 touchdowns
3 occurrences of 21 touchdowns
3 occurrences of 22 touchdowns
1 occurrence of 23 touchdowns
2 occurrences of 28 touchdowns
1 occurrence of 29 touchdowns.
Third Row: Stem of 1, left as an exercise for the reader to interpret.
Fourth Row: Stem of 0 with leaves 6 and 9, representing the two lowest team totals (6 and 9 touchdowns).
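The construction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the source's own method: the function name stem_and_leaf is hypothetical, the input is assumed to be non-negative whole numbers below 100, and the sample list holds only the values spelled out in the rows above, so the stem-1 row prints with no leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines for non-negative integers below 100, highest stem first."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # tens digit is the stem, ones digit is the leaf
        rows[stem].append(leaf)
    lines = []
    # Walk every stem from highest to lowest so empty rows stay visible.
    for stem in range(max(rows), min(rows) - 1, -1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem}|{leaves}")
    return lines

# Only the values listed in the text; the stem-1 leaves are not given there.
touchdowns = [32, 33, 33, 37,
              20, 20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29,
              6, 9]

for line in stem_and_leaf(touchdowns):
    print(line)
```

Sorting before splitting keeps the leaves in ascending order within each row, which is what makes the shape of the distribution readable at a glance.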
Key Observations:
A stem and leaf display clarifies the shape of data distributions effectively.
It allows viewers to readily identify ranges, trends, and counts.
Example conclusions drawn include:
Most teams threw between 10 and 29 touchdown passes, with fewer teams above or below this range.
Advanced Construction Techniques
Splitting Stems:
This technique gives a clearer graph when a single stem holds many leaves.
Each stem is split into two rows, one for leaves 0-4 and one for leaves 5-9, so the display divides the data into narrower intervals.
Example: The stem of 3 is split so that values in the range 35-39 appear on a separate row from values in the range 30-34.
Effectiveness: Splitting stems can lead to more intelligible displays as it prevents excessive data from being lumped into one category.
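Splitting stems can be sketched as follows. The function name split_stem_and_leaf is hypothetical, and the code assumes non-negative whole numbers below 100 and a split into exactly two rows per stem (leaves 0-4 and leaves 5-9). The sample list again contains only the values listed earlier in the text.

```python
def split_stem_and_leaf(values):
    """Return display lines with each tens stem split into a 5-9 row and a 0-4 row."""
    rows = {}
    for v in sorted(values):
        # v // 5 numbers the half-rows: 30-34 -> 6, 35-39 -> 7, and so on.
        rows.setdefault(v // 5, []).append(v % 10)
    lines = []
    for half in range(max(rows), min(rows) - 1, -1):
        leaves = "".join(str(leaf) for leaf in rows.get(half, []))
        lines.append(f"{half // 2}|{leaves}")  # two consecutive half-rows share a stem
    return lines

# Only the values listed in the text; the stem-1 leaves are not given there.
touchdowns = [32, 33, 33, 37,
              20, 20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29,
              6, 9]

for line in split_stem_and_leaf(touchdowns):
    print(line)
```

With this input the stem of 3 prints as two rows, one holding the leaf 7 and one holding the leaves 2, 3, 3.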
Back-to-Back Stem and Leaf Displays:
This variation allows for comparison between two distributions by placing them along a common column of stems.
Example Used: Comparing touchdown passes from 1998 and 2000 seasons.
The stems form a shared center column, with the leaves for one season on the left and the leaves for the other season on the right.
Specific observations can be drawn, detailing performance changes between the seasons.
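A back-to-back layout can be sketched as below. The function name back_to_back is hypothetical, both inputs are assumed to be non-empty lists of non-negative whole numbers below 100, and the two sample lists are made-up illustrative values, not the 1998 or 2000 NFL data.

```python
def back_to_back(left, right):
    """Return lines with left leaves, a shared stem column, and right leaves."""
    def group(values):
        rows = {}
        for v in sorted(values):
            rows.setdefault(v // 10, []).append(v % 10)
        return rows

    l_rows, r_rows = group(left), group(right)
    top = max(max(l_rows), max(r_rows))
    bottom = min(min(l_rows), min(r_rows))
    width = max(len(leaves) for leaves in l_rows.values())
    lines = []
    for stem in range(top, bottom - 1, -1):
        # Left leaves are reversed so the smallest leaf sits next to the stem.
        left_leaves = "".join(str(d) for d in reversed(l_rows.get(stem, [])))
        right_leaves = "".join(str(d) for d in r_rows.get(stem, []))
        lines.append(f"{left_leaves:>{width}}|{stem}|{right_leaves}")
    return lines

# Hypothetical values chosen only to show the layout.
group_a = [11, 14, 18, 21, 23, 36]
group_b = [9, 12, 12, 25, 27, 31]

for line in back_to_back(group_a, group_b):
    print(line)
```

Right-aligning the left-hand leaves keeps the stem column straight, so the two distributions can be compared row by row.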
Practical Considerations for Data Representation
Characteristics of Data for Suitable Stem and Leaf Displays:
Whole numbers are preferred, ideally allowing representation with one-digit stems and leaves.
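Positive values fit the basic format directly; negative values require negative stems (see Negative and Zero Handling).
If the data contain decimal points or large numbers, round the values first, preferably to two significant digits, so that each value reduces to one stem and one leaf.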
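The rounding step can be sketched as a small helper. The function name to_stem_leaf and its leaf_unit parameter are hypothetical, the code assumes positive values only, and the sample inputs are made-up numbers used to show the arithmetic.

```python
def to_stem_leaf(value, leaf_unit):
    """Round a positive value to the nearest leaf_unit and split it into (stem, leaf)."""
    scaled = int(value / leaf_unit + 0.5)  # round half up; valid for positive values only
    return divmod(scaled, 10)              # last digit is the leaf, the rest is the stem

# A large number rounded to the nearest 10,000 (hypothetical value).
print(to_stem_leaf(347_000, 10_000))

# A decimal rounded to the nearest tenth (hypothetical value).
print(to_stem_leaf(4.37, 0.1))
```

Choosing leaf_unit is the judgment call: it fixes which digit becomes the leaf and therefore how much detail is rounded away.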
Additional Examples
Data on Aggressive Thinking:
Context: Study on the speed of naming aggressive words when primed by either a weapon or non-weapon word.
Result Interpretation:
Positive differences indicate that aggressive words were named faster after a weapon word.
Negative differences indicate that they were named more slowly.
Example Values: Differences range from 43.2 milliseconds (faster) down to -27.4 milliseconds (that is, 27.4 milliseconds slower).
Negative and Zero Handling:
The display for the aggressive thinking study uses negative stems to represent negative values.
Values near zero need two separate stems so that small positive and small negative numbers are not mixed together:
A stem of 0 holds numbers between 0 and 9.
A stem of -0 holds numbers between 0 and -9.
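The two zero stems can be sketched as below. The function name signed_stem_leaf is hypothetical, and the code assumes values are first rounded to the nearest whole number. Placing a value that rounds to exactly 0 on the 0 stem is a simplification chosen here, not something the text specifies. The inputs 43.2 and -27.4 come from the study range quoted above; 3.6 and -3.6 are made-up values.

```python
import math

def signed_stem_leaf(value):
    """Round to the nearest whole number, then return (stem label, leaf)."""
    # Round the magnitude half up, then restore the sign.
    rounded = int(math.copysign(math.floor(abs(value) + 0.5), value))
    stem, leaf = divmod(abs(rounded), 10)
    # The sign is carried by the stem label, so -4 lands on "-0" rather than "0".
    label = f"-{stem}" if rounded < 0 else str(stem)
    return label, leaf

for x in (43.2, -27.4, 3.6, -3.6):
    print(x, signed_stem_leaf(x))
```

Keeping the stem as a text label rather than a number is what allows "-0" and "0" to exist as distinct rows.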
Limitations and Applications of Stem and Leaf Displays
Data Size:
Works best for small to moderate datasets, up to roughly 200 observations.
Population Dataset Example:
Populations of 185 US cities in 1998, rounded to the nearest 10,000 residents before being plotted.
Judgment in Graphing Choice:
Assessing whether the dataset can be aptly represented in stem and leaf format is crucial. Some datasets may lose important details if rounded excessively.
An effective statistical display depends on good judgment and an understanding of the nature of the particular dataset.