Notes on Pie Charts, Bar/Pareto/Pie Visuals, Time Series, and Stem-and-Leaf Plots
Pie charts and circle graphs
- A pie chart (circle graph) is a circular chart divided into sectors. Each sector’s size is proportional to a category’s frequency or percentage of the total.
- How to construct by hand:
- Compute the frequencies or percentages for each category.
- Divide the circle into segments whose central angles are proportional to those percentages.
- The sum of all percentages is 100%. If needed, compute angles with
$$\theta_k = 360^\circ \cdot \frac{f_k}{N}$$
where $f_k$ is the frequency (or percentage) of category $k$ and $N$ is the total.
- Key interpretation: the larger the sector, the larger the share of that category.
- Visual considerations: colors and labels help readability; hand-drawn pies look rough, while software (e.g., SPSS) produces cleaner, color-customizable charts.
- Relationship to other charts: pie charts summarize categorical distributions; bar charts show the same data with bars; Pareto charts order categories by frequency.
Example 1: Social media survey
- Data context: survey with $N = 736$ respondents; question about social media impact on daily life. Categories discussed include a positive effect, negative effect, and remaining/other.
- Given percentages (as described):
- Positive effect: $30.6\%$ of respondents.
- Negative effect: $24.3\%$.
- Remaining/other: $45.1\%$.
- Calculation for the positive category (to illustrate how percentages arise from frequencies):
$\text{Positive percent} = \frac{225}{736} \times 100 \approx 30.6\%$, where $225$ is the frequency counted for the positive category.
- Angles for the pie chart: using percentages, the wedge angles would be approximately:
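- Positive: $\theta_{\text{pos}} = 360^\circ \cdot 0.306 \approx 110.16^\circ$
- Negative: $\theta_{\text{neg}} = 360^\circ \cdot 0.243 \approx 87.48^\circ$
- Remaining: $\theta_{\text{rem}} = 360^\circ \cdot 0.451 \approx 162.36^\circ$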
- Interpretation guidance: 30.6% is a bit more than a quarter of the circle; 24.3% is a bit less than a quarter; 45.1% is a bit less than half.
- How this would be presented in a report or class: a pie chart showing the three sectors with appropriate labels and color distinctions; note that the exact count for the positive category is 225 (out of 736), with the other two categories occupying the remaining portions.
- Practical note: in assignments, SPSS can produce refined pie charts with color and labeling; hand-drawn circles are usually less precise.
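The wedge angles in this example can be checked with a few lines of Python. This is a minimal sketch that uses only the three percentages and the count of 225 quoted in these notes; the function name `wedge_angles` and the dictionary keys are illustrative choices, not part of the course material.

```python
import math

def wedge_angles(percentages):
    """Map each category's percentage to its central angle in degrees."""
    total = sum(percentages.values())
    # The shares must cover the whole circle (allowing for rounding).
    if not math.isclose(total, 100.0, abs_tol=0.05):
        raise ValueError(f"percentages sum to {total}, expected 100")
    return {name: round(360 * pct / 100, 2) for name, pct in percentages.items()}

# Percentages as given in the notes.
survey = {"positive": 30.6, "negative": 24.3, "remaining": 45.1}
angles = wedge_angles(survey)

# Percentage for the positive category from its raw count (225 of 736).
positive_pct = round(225 / 736 * 100, 1)

for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

The sum check mirrors the rule that the percentages must total 100% before the circle is divided.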
Example 2: Retailer dataset (last 50 customers)
- Dataset context: last 50 customers; categories correspond to credit card type used; counts given (interpreted from the transcript): 11, 23, 9, 7 (sums to 50).
- Relative frequencies (percentages):
- Category A: $\frac{11}{50} \times 100 = 22\%$
- Category B: $\frac{23}{50} \times 100 = 46\%$
- Category C: $\frac{9}{50} \times 100 = 18\%$
- Category D: $\frac{7}{50} \times 100 = 14\%$
- What you can deduce from percentages alone: they succinctly summarize the distribution across the four categories and are ready to plot on various charts.
- Graph options:
- Bar chart: plot either frequencies or relative frequencies (percentages) for each category.
- Pie chart: sectors sized by the corresponding percentages above.
- Pareto chart: bars arranged in descending order of frequency (or percentage) to show which categories are most common.
- Practical note: the sectors of a pie chart do not need to be in descending order, whereas a Pareto chart always places the most frequent categories first.
- SPSS note: charts can be generated with color coding and clear labeling to aid interpretation.
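The relative frequencies and the Pareto ordering for this example can be reproduced as follows. This is a sketch using the four counts from the notes; the labels A to D follow the notes, and the variable names are illustrative.

```python
# Counts for the last 50 customers, labeled A-D as in the notes.
counts = {"A": 11, "B": 23, "C": 9, "D": 7}

total = sum(counts.values())

# Relative frequency (percentage) of each category.
relative = {card: 100 * n / total for card, n in counts.items()}

# Pareto order: most frequent category first.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for card, n in pareto:
    print(f"{card}: count {n:2d}, {relative[card]:.0f}%")
```

The same `relative` values feed a bar chart or a pie chart unchanged; only the Pareto chart depends on the sorted order.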
Time series graphs
- What is a time series? A sequence of data points indexed in time order where the x-axis represents time.
- Typical axes:
- x-axis: time (e.g., days, weeks, months)
- y-axis: the measured quantity (e.g., stock price, distance, sales)
- Example discussed: distance covered during a 20-week exercise program (30-minute sessions per week)
- Week 1: 1.5 miles
- Week 2: 1.4 miles
- Week 3: 1.7 miles
- …
- Week 20: 2.7 miles
- Interpretation: distance tends to increase over time as practice improves fitness, though growth may level off as limits are reached.
- Statistical context: time series is a major area in statistics; specialized techniques exist for modeling and forecasting time-dependent data (course references to time series analysis classes).
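As a small illustration of how time-ordered data is laid out for plotting, the sketch below uses only the four observations quoted above. Weeks 4 to 19 are elided in the notes and are deliberately left out rather than guessed; the variable names are illustrative.

```python
# Only the observations quoted in the notes (week -> miles).
observations = {1: 1.5, 2: 1.4, 3: 1.7, 20: 2.7}

weeks = sorted(observations)                # x-axis: time, in order
miles = [observations[w] for w in weeks]    # y-axis: measured quantity

# Overall change from the first to the last recorded week.
net_change = round(miles[-1] - miles[0], 1)

# Average change per week over that span.
avg_per_week = round((miles[-1] - miles[0]) / (weeks[-1] - weeks[0]), 3)

print(f"net change: {net_change} miles, about {avg_per_week} miles per week")
```

Passing `weeks` and `miles` to a plotting tool as the x and y values would give the time series graph described above.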
Graphs discussed so far (in this course sequence)
- Histogram
- Stem-and-leaf plot
- Pie chart
- Bar chart
- Time series plot
- Pareto chart
- Ogive (cumulative frequency plot)
- Back-to-back stem-and-leaf plot (comparison of two distributions)
- Split stem-and-leaf plot (two halves of leaves per stem)
Stem-and-leaf plots (two-digit stems, one-digit leaves)
- Purpose: a quick, textual alternative to a histogram that preserves the actual data values.
- Data handling example (dataset described: airline carry-on luggage weights for 40 passengers):
- Treat two-digit numbers as stems (tens) and leaves (units). For example, 30 becomes stem 3, leaf 0; 27 becomes stem 2, leaf 7; 12 becomes stem 1, leaf 2.
- When numbers are less than 10, you can pad with a leading zero (e.g., 3 is written as 03, giving stem 0 and leaf 3) so that every value has two digits.
- Steps for a stem-and-leaf plot:
1) Decide on stems (tens digits) and leaves (ones digits).
2) Place each data point in the corresponding stem row with its unit digit as a leaf.
3) For each stem, arrange leaves in ascending order.
- Example interpretation (conceptual):
- Data may yield stems: 0, 1, 2, 3, 4, 5, etc. with leaves under each stem representing the trailing digits.
- An entry like 83 is represented as stem 8, leaf 3 (or as stem 08, leaf 3 if your scheme uses two-digit stems).
- Back-to-back stem-and-leaf plots: a method to compare two distributions side-by-side within the same stem framework.
- One dataset’s leaves appear on the left of the stem, the other dataset’s leaves on the right.
- Example setup (English vs History classes): stems range from a minimum to a maximum; leaves for English appear on the left, for History on the right.
- This makes patterns and differences between the two distributions visually immediate.
- Split stem-and-leaf plots: a refinement used when individual stems hold too many leaves, so that the shape of the distribution is easier to see.
- Each stem is split into two rows:
- First row’s leaves cover 0–4
- Second row’s leaves cover 5–9
- Example: for a given stem, leaves 0,1,2,2 would appear on the first line, and leaves 5,7,8,9 on the second line.
- Practical notes:
- After constructing stems and leaves, you can read numbers directly from the plot.
- Back-to-back plots aid comparison, and split plots keep crowded stems readable, especially for larger datasets.
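The construction steps for the basic and split plots can be sketched in Python. This is a minimal illustration for non-negative whole numbers below 100. The values 30, 27, and 12 come from the notes; the remaining weights are made up for the example, and `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values, split=False):
    """Return the rows of a stem-and-leaf plot as strings.

    Stems are tens digits and leaves are ones digits. With split=True
    each stem gets two rows: leaves 0-4 first, then leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):            # sorting keeps leaves in ascending order
        stem, leaf = divmod(v, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    lines = []
    # Include every stem between the smallest and largest, even if empty.
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in ((0, 1) if split else (0,)):
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# 30, 27, 12 appear in the notes; the other weights are hypothetical.
weights = [30, 27, 12, 3, 18, 25, 21, 33, 29, 12]

print("\n".join(stem_and_leaf(weights)))
print()
print("\n".join(stem_and_leaf(weights, split=True)))
```

Empty stems are kept on purpose so that gaps in the data stay visible, as they would in a histogram.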
From stem-and-leaf to other graphs
- Frequency table: count how many observations (leaves) fall under each stem.
- Histogram: use the frequency data to plot a histogram with class boundaries.
- Example class boundaries mentioned: 68.5 to 79.5 (these are class boundaries for a histogram of, e.g., exam scores).
- Class width is the difference between the class boundaries, e.g.,
$$w = b_{\text{upper}} - b_{\text{lower}}$$
- Ogive (cumulative frequency plot): plot cumulative frequencies against upper class boundaries to show the buildup of data.
- If you have cumulative frequencies, you can plot them to assess growth and percentiles.
- Key dimensions to be able to read on these graphs:
- Class boundaries, class width, midpoints, frequencies, relative frequencies, cumulative frequencies.
- Practical emphasis: the instructor notes that this material emphasizes understanding the concepts and basic plotting rather than heavy algebra, and that you should be able to recall how to find class width and read stems/leaves for basic datasets.
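The quantities listed above (class width, midpoints, cumulative frequencies) can be computed from a small frequency table. This sketch assumes that 68.5 to 79.5 from the notes is a single class; the other boundaries and all frequencies are invented for illustration only.

```python
from itertools import accumulate

# The class 68.5 to 79.5 is taken from the notes (assumed to be one class);
# the neighboring boundaries and every frequency here are hypothetical.
boundaries = [57.5, 68.5, 79.5, 90.5]
frequencies = [4, 9, 7]

classes = list(zip(boundaries, boundaries[1:]))   # (lower, upper) pairs

widths = [upper - lower for lower, upper in classes]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
cumulative = list(accumulate(frequencies))

# Ogive points: cumulative frequency against each upper class boundary,
# starting from 0 at the first lower boundary.
ogive = list(zip(boundaries, [0] + cumulative))

for (lower, upper), f, c in zip(classes, frequencies, cumulative):
    print(f"{lower} to {upper}: frequency {f}, cumulative {c}")
```

The last cumulative frequency equals the total number of observations, which is a quick check on the table.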
Practical and ethical notes for data visualization
- Always label axes clearly and choose appropriate scales so that the visualization accurately reflects the data.
- Avoid misleading visuals, such as those produced by manipulating scales or using inappropriate chart types for the data.
- When reporting percentages, ensure they sum to 100% and consistently use either frequencies or percentages.
- For presentations and reports, use software (e.g., SPSS) for polished visuals, but be able to construct and interpret plots by hand as a diagnostic step.
Course flow and expectations (context)
- The instructor emphasizes that most of the current material is conceptual plotting and data summarization rather than difficult calculations.
- The course will revisit and build on these graphs across chapters; there will be SPSS assignments after introductory chapters.
- Expect questions about challenging aspects of these graphs in exams, but the math involved is typically manageable and emphasizes understanding data structure, not heavy computation.