Notes on Chapter 3: Describing Data using Distributions and Graphs
Data Visualization
Data visualization is the use of tables, charts, graphs, plots, and other visual tools to see what our data look like.
Graphing Qualitative & Quantitative Variables
Qualitative variables can be summarized by frequency and researchers can then use frequency tables and bar charts to show frequencies for categorized responses.
Qualitative data do not come with a pre-established ordering (the way numbers are ordered).
Quantitative variables are composed of numerical data.
A frequency distribution takes a disorganized set of scores, places them in order from highest to lowest, and shows how many times each score occurs.
An outlier is an observation of data that does not fit the rest of the data.
Example: Age data pattern shown as 18, 18, 18, 19, 19, 20, 20, 21, 21.
Frequency Tables
Frequency Tables show the frequencies of the various response categories.
They also show the relative frequencies.
Relative frequency is the proportion of observations in each category, computed as $\text{relative frequency} = \frac{f}{n}$, where $f$ is the category frequency and $n$ is the total number of observations.
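As a minimal sketch of the f/n calculation, the following builds a frequency table for the age data listed earlier in these notes. The function name `frequency_table` is made up for illustration and is not part of the chapter.

```python
from collections import Counter

# Age data from the example earlier in these notes.
ages = [18, 18, 18, 19, 19, 20, 20, 21, 21]

def frequency_table(values):
    """Return (category, frequency, relative frequency) rows, sorted by category."""
    counts = Counter(values)
    n = len(values)  # total number of observations
    return [(category, f, f / n) for category, f in sorted(counts.items())]

table = frequency_table(ages)
for category, f, rel in table:
    print(f"{category}  f={f}  f/n={rel:.3f}")
```

The relative frequencies in a complete table should sum to 1, which is a quick check that no category was missed.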
Graphs (Overview of graph types)
A graph is a tool that helps you learn about the shape or distribution of a sample or a population.
Several types of graph are used to summarize and organize data:
the dot plot
the bar graph
the histogram
the stem-and-leaf plot
the frequency polygon
the pie chart
the box plot
Bar Charts
Bar charts can be used to represent frequencies of different categories.
Bar charts are appropriate for qualitative data.
Typically, the Y-axis shows the number of observations in each category and the X-axis shows the categories.
Bar Charts: Common Mistakes to Avoid
Don’t get fancy: three-dimensional figures are less clear than 2-D ones.
Don’t get creative; use plain bars.
Keep the design simple so that it does not distract from the data.
The baseline of the Y-axis should be zero.
Bar Charts: Additional Rules
Don’t use a line graph when the X-axis contains merely qualitative variables.
Graphing Quantitative Variables
There are many types of graphs that can be used to portray distributions of quantitative variables:
Histograms
Frequency polygons
Stem-and-leaf displays
Box plots
More bar charts
Line graphs
Scatter plots
Histograms
A histogram is a graphic version of a frequency distribution and helps to display the shape of a distribution.
The horizontal axis (x-axis) is labeled with what the data represents.
The vertical axis is labeled either frequency or relative frequency.
The histogram shows the distribution of the values, including the highest, middle, and lowest values.
Histograms: Class Intervals and Relative Frequencies
Histograms have class intervals; these are ranges of scores broken into intervals.
Histograms based on relative frequencies show the proportion of scores in each interval rather than the number of scores.
You can convert a histogram from frequencies to relative frequencies by:
(a) dividing each class frequency by the total number of observations, and then
(b) plotting the quotients on the Y-axis (labeled as proportion).
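Steps (a) and (b) can be sketched as follows. The scores, the class limits, and the helper names `class_frequencies` and `to_relative` are all hypothetical, chosen only to illustrate the conversion.

```python
def class_frequencies(scores, lower, width, k):
    """Count scores in k class intervals of equal width starting at `lower`."""
    counts = [0] * k
    for s in scores:
        i = int((s - lower) // width)  # index of the interval containing s
        if 0 <= i < k:
            counts[i] += 1
    return counts

def to_relative(counts):
    """Step (a): divide each class frequency by the total number of observations."""
    n = sum(counts)
    return [c / n for c in counts]

# Hypothetical scores, invented for illustration.
scores = [46, 47, 48, 52, 55, 58, 61, 63, 64, 67]
counts = class_frequencies(scores, lower=39.5, width=10, k=3)
proportions = to_relative(counts)

# Step (b): these proportions are the bar heights on the Y-axis.
for i, p in enumerate(proportions):
    low = 39.5 + 10 * i
    print(f"{low}-{low + 10}: proportion = {p}")
```

The shape of the histogram is unchanged by the conversion; only the Y-axis scale differs.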
Frequency Polygons
Frequency polygons are a graphical device for understanding the shapes of distributions. They are helpful for comparing sets of data and for displaying cumulative frequency distributions.
The first label on the X-axis can represent an interval; for example, the label 35 may represent the interval from 29.5 to 39.5. If the lowest test score is 46, that interval has a frequency of 0.
The point labeled 45 may represent the interval from 39.5 to 49.5, which has 3 scores.
There can be larger counts in the interval surrounding a higher value (e.g., 147 scores in the interval that surrounds 85).
In short: frequency polygons summarize distributions similarly to histograms and are useful for comparing distributions.
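The notes mention cumulative frequency distributions without showing how the points are formed. A minimal sketch, assuming each point is plotted at its interval label with a running total of the class frequencies: the first two label/count pairs (35 with 0 scores, 45 with 3 scores) come from the example above, and the remaining counts are hypothetical.

```python
from itertools import accumulate

# X-axis labels and class frequencies. The pairs 35 -> 0 and 45 -> 3 are from
# the notes; the counts for 55 and 65 are hypothetical.
labels = [35, 45, 55, 65]
frequencies = [0, 3, 10, 53]

# Running total: each value counts the scores in that interval and all lower ones.
cumulative = list(accumulate(frequencies))

# Points to plot for the cumulative frequency polygon.
points = list(zip(labels, cumulative))
print(points)
```

Plotting `frequencies` against `labels` instead gives the ordinary (non-cumulative) frequency polygon.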
Stem and Leaf
The stem-and-leaf graph or stemplot is a good choice when the data sets are small.
To create the plot, divide each observation of data into a stem and a leaf.
The leaf consists of the final significant digit, and the stem consists of the remaining leading digits. For example, 23 has stem 2 and leaf 3.
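A minimal sketch of the stem/leaf split for non-negative whole numbers, using hypothetical scores. The function name `stem_and_leaf` is an illustration, not something defined in the chapter.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Split each non-negative integer into a stem (leading digits) and a leaf (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 58 -> stem 5, leaf 8
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical test scores, invented for illustration.
scores = [46, 47, 52, 58, 58, 61, 63, 70]
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

This sketch omits stems that have no observations; a hand-drawn stemplot would normally list them with an empty row so that gaps in the data stay visible.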
Box Plots
Box plots are useful for identifying outliers and for comparing distributions.
The box plot relies on the 25th, 50th, and 75th percentiles in the distribution of scores.
Therefore, the bottom of each box is the 25th percentile, the top is the 75th percentile, and the line in the middle is the 50th percentile.
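The three percentiles that define the box can be computed as in the sketch below, which reuses the age data from earlier in these notes. It assumes the "inclusive" method of Python's `statistics.quantiles`; textbooks and software define percentiles in slightly different ways, so other tools may give slightly different values.

```python
import statistics

# Age data from the example earlier in these notes.
ages = [18, 18, 18, 19, 19, 20, 20, 21, 21]

# n=4 splits the data into quartiles and returns the three cut points:
# the 25th, 50th, and 75th percentiles.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")

print("bottom of box (25th percentile):", q1)
print("line in box (50th percentile):", median)
print("top of box (75th percentile):", q3)
```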
The Shape of Distribution
The primary characteristic we are concerned about when assessing the shape of a distribution is whether the distribution is symmetrical or skewed.
Skew can be positive or negative (also known as right or left skew, respectively), named for the direction of the longer tail.
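The chapter judges skew by eye, but the sign can also be checked numerically. The sketch below uses the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is an assumption brought in for illustration and is not defined in these notes. The data sets are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: positive for a longer right tail, negative for a longer left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data, invented for illustration.
right_skewed = [1, 2, 2, 3, 3, 3, 10]      # one large value stretches the right tail
left_skewed = [-x for x in right_skewed]   # mirror image: longer left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, skewness(data))
```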
Bar Charts Beyond Frequency
Bar charts can be used to present other kinds of quantitative information, not just frequency counts.
Bar charts are particularly effective for showing change over time.
Bar charts are used to compare the means of different experimental conditions.
Avoid Pie Charts
It can be very difficult for humans to accurately perceive differences in the volume of shapes.
Pie charts are not recommended when you have a large number of categories.