Module 3 – Displaying Data
Module 3 focuses on different ways to display data, as organizing and interpreting data is a key part of statistics (defined in Module 2).
Introduction to Data Display
SPSS is a common statistical program that will be mentioned throughout the semester.
Data files contain information, but the data aren't organized in an easily understandable way.
Example: An environmental education study examining the association between program participation and CATs scores, using reading scores (RD2) as the dependent variable and number of visits (NumVisits) as the independent variable.
Organizing data in a picture is often the simplest way to convey the information it contains.
Charts and Data Organization
The goal is to convey information in the simplest way possible using pictures.
A frequency table displays how many scores fall into each value of a variable. It is especially useful for categorical data, such as comparing Statistics grades by sex.
Grades by Sex
| Grade | Women | Men |
|---|---|---|
| A | 15 | 5 |
| B | 30 | 10 |
| C | 15 | 5 |
| D | 10 | 3 |
| F | 5 | 2 |
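As a rough sketch of how a table like this is tallied, the Python snippet below counts (grade, sex) pairs with `collections.Counter`. The `records` list is rebuilt from the table's own counts purely for illustration; in practice the rows would come from a data file, and the variable names here are assumptions.

```python
from collections import Counter

# Counts copied from the Grades by Sex table.
table = {
    "A": {"Women": 15, "Men": 5},
    "B": {"Women": 30, "Men": 10},
    "C": {"Women": 15, "Men": 5},
    "D": {"Women": 10, "Men": 3},
    "F": {"Women": 5, "Men": 2},
}

# Illustration only: expand the counts into one (grade, sex) record per
# student, standing in for the raw rows of a data file.
records = [
    (grade, sex)
    for grade, by_sex in table.items()
    for sex, count in by_sex.items()
    for _ in range(count)
]

# Tally how many students fall into each (grade, sex) cell.
frequencies = Counter(records)

for grade in table:
    print(grade, frequencies[(grade, "Women")], frequencies[(grade, "Men")])
```

Each printed row should match the corresponding row of the table, since the tally simply reverses the expansion.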
Types of Charts
When at least one variable is continuous, common chart types include:
Bar charts: categorical data and frequency, or categorical and continuous data.
Line charts: two continuous variables.
Scatter plots: two continuous variables.
Histograms: one continuous variable and its frequency.
Chart Design
Good charts organize data in an easy-to-understand way.
Basic design elements:
X-axis: Usually represents the independent variable (IV).
Y-axis: Represents the dependent variable (DV).
Labeled axes.
Figure caption: Provides additional information.
Bar Charts
Bar charts are commonly used to show the proportion of scores that fall into each value of a categorical variable.
A proportion can also be expressed as a percentage. For example, the proportion and percentage of women earning each grade can be calculated and then plotted as a bar chart.
Example Calculation
Total women: 75
A's: 15/75 = 0.2 = 20%
B's: 30/75 = 0.4 = 40%
All percentages should add up to 100% (or close, allowing for rounding).
A proportion gives the same information as a percentage, but it is expressed as a fraction of 1 rather than out of 100 (percentage = proportion x 100); proportions should add up to 1.
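The calculation above can be sketched in a few lines of Python for all five grades. The counts come from the women's column of the Grades by Sex table; the variable names are only illustrative.

```python
# Women's grade counts from the Grades by Sex table.
women_counts = {"A": 15, "B": 30, "C": 15, "D": 10, "F": 5}

total = sum(women_counts.values())  # 75 women in total

# Proportion = count / total; percentage = proportion * 100.
proportions = {grade: count / total for grade, count in women_counts.items()}
percentages = {grade: 100 * p for grade, p in proportions.items()}

for grade in women_counts:
    print(f"{grade}: {proportions[grade]:.2f} = {percentages[grade]:.0f}%")
```

D and F do not divide evenly (10/75 and 5/75), which is where the rounding mentioned above comes in.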
Creating Bar Charts
Frequency tables can be used to make bar charts in Excel.
Insert a column chart from the charts section. Excel calls a vertical bar chart a "column chart".
Edit the chart: add axis titles (Statistics Grade on the X axis, Proportion of Women on the Y axis), remove legends and titles.
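For anyone working outside Excel, the same chart can be sketched in Python. This assumes the third-party matplotlib library is installed; the function name `plot_women_grades` and the output file name are hypothetical, not part of the course materials.

```python
def plot_women_grades(proportions, path="women_grades.png"):
    """Draw a bar chart of grade proportions and save it to `path`."""
    # Imported here so the rest of a script still runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    grades = list(proportions)
    heights = [proportions[g] for g in grades]

    fig, ax = plt.subplots()
    ax.bar(grades, heights)
    ax.set_xlabel("Statistics Grade")      # X-axis title
    ax.set_ylabel("Proportion of Women")   # Y-axis title
    ax.set_ylim(0, max(heights) * 1.1)     # keep the y-axis starting at zero
    fig.savefig(path)
    plt.close(fig)
    return path


# Proportions computed from the women's column (count / 75).
women_proportions = {
    "A": 15 / 75, "B": 30 / 75, "C": 15 / 75, "D": 10 / 75, "F": 5 / 75,
}
```

Calling `plot_women_grades(women_proportions)` would write the image file. No legend or chart title is added, which matches the editing steps above.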
Editing Charts
Edit charts to improve clarity.
Labeled axes are necessary for understandability.
Pie charts are also used, but scientists tend to find them more confusing than bar charts, and they provide no additional information.
Comparisons with Bar Charts
Bar charts can be used to show comparisons between one categorical and one continuous variable.
Example: Comparing the average science scores of male and female students.
Be wary of misleading charts (especially those produced with default settings):
Always check the scale on the y-axis.
Spreadsheet software can magnify differences, often because the default y-axis does not start at zero.
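A small Python sketch can show why the y-axis scale matters. The two mean scores below are hypothetical numbers invented for this illustration, and `visual_ratio` is an assumed helper name. It compares how tall the two bars are drawn when the axis starts at 0 versus at 65.

```python
def visual_ratio(smaller, larger, axis_min=0.0):
    """Return how many times taller the larger bar is drawn than the
    smaller one when the y-axis starts at `axis_min`."""
    if axis_min >= smaller:
        raise ValueError("axis_min must be below the smaller value")
    return (larger - axis_min) / (smaller - axis_min)


# Hypothetical mean science scores, invented for this illustration.
mean_a, mean_b = 70.0, 75.0

honest = visual_ratio(mean_a, mean_b, axis_min=0.0)      # axis starts at 0
truncated = visual_ratio(mean_a, mean_b, axis_min=65.0)  # axis starts at 65

print(f"axis from 0:  larger bar drawn {honest:.2f}x as tall")
print(f"axis from 65: larger bar drawn {truncated:.2f}x as tall")
```

With the axis starting at 65, a 5-point difference is drawn as a bar twice as tall, even though the underlying scores differ by only about 7%.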
Line Charts
Line charts are often used with two continuous variables, especially for time plots showing trends over time.
Spreadsheet programs: Insert -> Line chart.
Some studies show more than two variables. For example, a line chart can show the number of employees (continuous DV) by year (continuous IV) and employer (categorical IV), with one line per employer.
Scatter Plots
Scatter plots are used to show relationships between two continuous variables.
Each dot on a scatter plot represents a single score, whereas each dot on a line chart represents a mean.
Scatter plots help visualize relationships or trends between variables.
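The difference between the dots on the two chart types can be sketched in Python. The (visits, score) pairs below are hypothetical values invented for illustration, loosely modeled on the NumVisits and RD2 variables from the introduction.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (number of visits, reading score) pairs, invented for illustration.
scores = [(1, 60), (1, 70), (2, 65), (2, 75), (3, 80), (3, 90)]

# Scatter plot: one dot per individual score.
scatter_points = scores

# Line chart: one dot per x value, placed at the mean of the scores there.
by_x = defaultdict(list)
for x, y in scores:
    by_x[x].append(y)
line_points = [(x, mean(ys)) for x, ys in sorted(by_x.items())]

print(len(scatter_points), "scatter dots;", len(line_points), "line-chart dots")
print(line_points)
```

Six individual scores collapse into three means, which is why a line chart looks cleaner but hides the spread of the individual scores.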
Histograms
Histograms display the frequency of a single continuous variable.
Creating one is like creating a bar chart that shows frequency, but with a continuous variable instead of a categorical variable.
Creating Histograms
One method starts with creating a frequency table.
The "insane method" involves tallying the number of students who got each individual score, which can be tedious when the range of scores is large.
Normal approach: With many possible values, use a grouped frequency table (dividing scores into groups of 5, 10, 20, etc.). Larger groups are easier to tally, but provide less detail.
Using Excel: Select the data, insert a chart, choose 'Other Charts' and then 'Histogram'.
Add labels to the histogram.
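The grouped frequency table described above can be sketched in Python. The exam scores are hypothetical, the function name `grouped_frequencies` is an assumption, and the sketch assumes whole-number scores and a non-empty list.

```python
from collections import Counter


def grouped_frequencies(scores, width=10):
    """Count scores in groups of `width`, e.g. 60-69 and 70-79 for width 10."""
    counts = Counter((score // width) * width for score in scores)
    lo, hi = min(counts), max(counts)
    # Include empty groups so gaps in the data stay visible in the histogram.
    return {
        (start, start + width - 1): counts.get(start, 0)
        for start in range(lo, hi + width, width)
    }


# Hypothetical exam scores, invented for this illustration.
exam_scores = [52, 67, 71, 73, 78, 80, 84, 85, 88, 91, 95]

table = grouped_frequencies(exam_scores, width=10)
for (low, high), count in table.items():
    print(f"{low}-{high}: {'#' * count} ({count})")
```

Rerunning with `width=20` gives fewer, larger groups: easier to read, but with less detail about where the scores fall.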
Bar Charts vs Histograms
Bar charts should have spaces between the bars.
Histograms should NOT have spaces between the bars.
Bar charts show:
One variable with categorical data and the frequency of that variable, or
One variable with categorical data and one variable with continuous data.
Histograms show:
One continuous variable and frequency of that variable.
Histogram Shapes
Histograms can reveal information about data, such as the shape of the distribution:
Symmetric
Asymmetric (skewed): Positively skewed or Negatively skewed
Skewness
Positively skewed: tail on the "positive" (right) side. Most scores fall in the low range. A small number of students do well, but the majority do poorly.
Negatively skewed: tail on the "negative" (left) side. Most scores fall in the high range. A few students fail, but the majority earn A’s.
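The direction of the skew can also be checked numerically. The sketch below uses the moment coefficient of skewness, a standard measure that these notes do not cover, so treat it as an optional illustration. The score lists are hypothetical, and a positive result indicates a tail on the positive side.

```python
from statistics import fmean


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    center = fmean(values)
    m2 = sum((v - center) ** 2 for v in values) / n  # average squared deviation
    m3 = sum((v - center) ** 3 for v in values) / n  # average cubed deviation
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical score sets, invented for this illustration.
mostly_low = [40, 45, 45, 50, 50, 55, 95]   # long tail toward high scores
mostly_high = [5, 45, 50, 50, 55, 55, 60]   # mirror image: tail toward low scores
symmetric = [40, 45, 50, 50, 50, 55, 60]

for name, data in [("mostly_low", mostly_low),
                   ("mostly_high", mostly_high),
                   ("symmetric", symmetric)]:
    print(f"{name}: skewness = {skewness(data):+.2f}")
```

The sign of the result follows the tail, not the bulk of the scores: the set where most students score low comes out positive.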
Symmetric Distributions
These distributions are called symmetric because they have the same shape on both sides of the distribution.
Flat/Uniform: Each group of scores has the same frequency.
Hill-Shaped: More scores fall near the middle of the distribution than in the extremes. (E.g., if a C is average, you would expect most people to make a C, a few people to make Ds and Bs, and even fewer people to make Fs and As.)
Normal Distribution: A hill-shaped distribution in which few scores fall in the extreme ranges. In this class, it is rare to get an A or an F.
Additional Insights from Histograms
Number of "peaks" in the data: unimodal, bimodal, or multimodal.
Outliers: Extreme points that stand out from the rest of the distribution.