PSYC 2021A Lecture 2 Notes – Frequency Distributions, Normal Distribution, and Visual Displays of Data
Frequency Distributions and Visual Displays of Data
Overview
Frequency distributions describe how often each value of a variable occurs or the proportion of observations at each value.
Used to understand relations between two or more variables by examining the distribution of individual data points.
Distinguish raw scores (untransformed data) from analyzed data.
Two main types discussed: ungrouped (frequency table for exact values) and grouped (intervals, bins).
Ungrouped Frequency Distribution (Frequency Table)
Purpose: visual depiction showing how often each value occurred (how many scores at each value).
Steps to create a frequency table (as outlined in the transcript):
Determine the highest score and the lowest score.
Create two columns; label the first with the variable name and label the second “Frequency.”
List the full range of values that encompasses all the scores in the data set, from highest to lowest. Include all values in the range, even those with zero frequency.
Count the number of scores at each value and write those numbers in the frequency column.
Example data and sources in the slides include volcano counts by country (Table 2-3), illustrating a wide range of values (e.g., some countries with 1 volcano, others with many).
Data sources cited: volcanoes by country (study data compiled by Oregon State University).
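The four steps above can be sketched in Python. This is a minimal illustration, assuming integer scores; the scores are made-up values (not the volcano data from the slides) and the function name frequency_table is hypothetical.

```python
from collections import Counter

def frequency_table(scores):
    """Return (value, frequency) rows from the highest value to the lowest.

    Every value in the range is listed, including values with zero frequency.
    """
    counts = Counter(scores)                      # step 4: count scores at each value
    highest, lowest = max(scores), min(scores)    # step 1: find the extremes
    # step 3: list the full range from highest to lowest
    return [(value, counts[value]) for value in range(highest, lowest - 1, -1)]

# Hypothetical raw scores (illustration only)
scores = [1, 1, 2, 5, 2, 1, 3, 5]

print(f"{'Score':>5}  {'Frequency':>9}")          # step 2: two labeled columns
for value, freq in frequency_table(scores):
    print(f"{value:>5}  {freq:>9}")
```

The value 4 never occurs in these scores, yet it still appears in the output with a frequency of 0, as step 3 requires.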
Volcanoes by Country (Table 2-3) – Key Points
Shows number of volcanoes in each country with at least one volcano.
Notable outliers (high counts): Indonesia, Japan, Russia, United States with counts in the range of roughly 40–81 volcanoes.
Example values mentioned: Antarctica (1), Argentina (1), Italy (6), Japan (40), Russia (55), United States (81).
Data note: 55 countries represented in the table (those with at least one volcano): the 51 countries with 1 to 17 volcanoes plus the four outliers.
Data source: volcano.oregonstate.edu/volcanoesbycountry (2018).
Ungrouped Frequency Table (Table 2-4) – Frequency and Interpretation
Focused on the distribution of the number of volcanoes per country for the 51 countries with between 1 and 17 volcanoes.
Highlights the problem of long tables when outliers exist (Indonesia, Japan, Russia, United States have 40–81 volcanoes and would overwhelm a simple table).
Example counts and frequencies (paraphrased):
Number of Volcanoes: 17, Frequency: 1
16: 0
15: 0
14: 0
13: 1
12: 1
11: 0
10: 2
9: 2
8: 1
7: 3
6: 1
5: 4
4: 5
3: 4
2: 12
1: 14
Data source: volcanoes by country (2018).
Frequencies and Percentages (Table 2-5)
Expands Table 2-4 by adding percentages to accompany frequencies.
Example percentages (based on the slide's data):
Number of Volcanoes 17: Frequency 1 → Percentage \frac{1}{51} \times 100\% = 1.96\%
16: 0 → 0\%
13: 1 → 1.96\%
12: 1 → 1.96\%
10: 2 → 3.92\%
9: 2 → 3.92\%
7: 3 → 5.88\%
5: 4 → 7.84\%
4: 5 → 9.80\%
3: 4 → 7.84\%
2: 12 → 23.53\%
1: 14 → 27.45\%
Note: Percentages are based on the 51-country sample in Table 2-4.
Source: volcano data (2018).
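The percentage column can be checked with a few lines of Python. This sketch assumes only the total of 51 countries stated for Table 2-4; the function name percentage is hypothetical.

```python
def percentage(frequency, total):
    """Percentage of observations at a value: frequency / total * 100."""
    return frequency / total * 100

TOTAL_COUNTRIES = 51  # sample size stated for Table 2-4

# Frequencies that appear in the percentage rows above
for freq in (0, 1, 2, 3, 4, 5, 12, 14):
    print(f"frequency {freq:>2} -> {percentage(freq, TOTAL_COUNTRIES):.2f}%")
```

Rounded to two decimals, these reproduce the listed values (for example, 12 of 51 countries is 23.53%).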
Grouped Frequency Distribution
When many possible values exist, use a grouped (interval) frequency table to summarize data within ranges (bins).
Steps to create a grouped frequency table (as per the transcript):
Find the highest and lowest scores in the distribution.
Determine the full data range.
Decide the number of intervals (bins) and the interval size that best represents the data.
Determine the bottom value of the lowest interval.
List intervals from highest to lowest and count the number of scores in each interval.
Example: Table 2-6 presents a grouped frequency table for number of volcanoes by country around the world, summarizing all 55 countries in nine intervals.
Intervals shown: 80-89, 70-79, 60-69, 50-59, 40-49, 30-39, 20-29, 10-19, 0-9 with corresponding frequencies.
Outliers (Indonesia, Japan, Russia, United States) fall in the upper intervals (40-49 through 80-89), so the grouped table can include them and still stay concise, unlike the ungrouped table.
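A grouped table with intervals of width 10, like those listed for Table 2-6, could be built roughly as follows. This is a sketch assuming non-negative integer scores; the data are illustration values only and the function name grouped_frequency_table is hypothetical.

```python
from collections import Counter

def grouped_frequency_table(scores, width=10):
    """Return (low, high, frequency) rows from the highest interval to the lowest."""
    # Bottom of the lowest interval: largest multiple of `width` at or below the minimum
    bottom = (min(scores) // width) * width
    # Bottom of the highest interval, found the same way from the maximum
    top = (max(scores) // width) * width
    # Count how many scores fall in each interval, keyed by the interval's bottom value
    counts = Counter((s // width) * width for s in scores)
    return [(low, low + width - 1, counts[low])
            for low in range(top, bottom - 1, -width)]

# Illustration values only (not the full data set from the slides)
scores = [1, 3, 4, 12, 17, 40, 55, 81]

for low, high, freq in grouped_frequency_table(scores):
    print(f"{low:>2}-{high:<2}  {freq}")
```

With these values the table has nine rows, from 80-89 down to 0-9, and empty intervals such as 70-79 are still listed with a frequency of 0.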
Histogram
A histogram is a graphical display of a grouped frequency distribution, with frequencies represented by vertical bars.
Appropriate for interval or ratio data.
Creates an overall impression by dividing possible values into bins and counting observations in each bin.
Steps to create a histogram (as described):
Determine the midpoint for every interval.
Draw the x-axis, labeling it with the variable of interest and the midpoints of intervals (include 0 if practical).
Draw the y-axis, label it “Frequency,” and include the full range of frequencies (include 0 if practical).
Draw a bar for each midpoint, centered on that midpoint, with height equal to the interval’s frequency.
Examples shown in the slides include a histogram of the raw (ungrouped) number of volcanoes by country and a histogram based on the grouped frequency table.
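The histogram steps can be approximated in plain text. The sketch below makes two assumptions: the midpoint is taken as the average of the listed interval bounds (some texts compute it from the interval's real limits instead), and the bars are drawn horizontally to keep the code short, whereas a true histogram uses vertical bars. The intervals, frequencies, and function names are hypothetical.

```python
def midpoint(low, high):
    """Midpoint of an interval, taken here as the average of its listed bounds."""
    return (low + high) / 2

def text_histogram(intervals, frequencies, symbol="#"):
    """Return one line per interval: midpoint, a bar, and the frequency."""
    lines = []
    for (low, high), freq in zip(intervals, frequencies):
        # Bar length equals the interval's frequency
        lines.append(f"{midpoint(low, high):>5.1f} | {symbol * freq} ({freq})")
    return lines

# Hypothetical grouped frequencies (illustration only)
intervals = [(0, 9), (10, 19), (20, 29), (30, 39)]
frequencies = [7, 4, 0, 1]

print("\n".join(text_histogram(intervals, frequencies)))
```

An interval with zero frequency still gets a row, which keeps gaps in the distribution visible.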
Shape of Distributions
Normal distribution (bell curve, Gaussian): a specific, symmetric, unimodal distribution.
Skewed distribution: not symmetric; one tail longer than the other.
Why shape matters:
Guides choice of descriptive statistics (e.g., median for skewed distributions).
Helps screen for outliers and influential observations.
Informs choice of inferential statistics.
Normal Distribution Characteristics (as per the slides):
Shape is symmetric about the mean, with mean = median = mode.
The area under the curve sums to 1: \text{Total area} = 1.
Described by two parameters: the mean \mu and the standard deviation \sigma.
Denser in the center, thinner in the tails.
About 68% of the distribution lies within one standard deviation: P(|X-\mu| \le \sigma) \approx 0.68.
About 95% lies within two standard deviations: P(|X-\mu| \le 2\sigma) \approx 0.95.
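The 68% and 95% figures are rounded. They can be checked numerically with the error function from Python's standard library, because for any normal distribution P(|X-\mu| \le k\sigma) equals erf(k / \sqrt{2}). The function name prob_within is hypothetical.

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

This prints approximately 0.6827 and 0.9545. The result does not depend on \mu or \sigma, since standardizing removes both.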
Skewness and Kurtosis
Skewness: degree of asymmetry of a distribution (positive skew to the right, negative skew to the left).
Positive skew (tail to the right) can involve Floor Effects (lower bound constraint).
Negative skew (tail to the left) can involve Ceiling Effects (upper bound constraint).
Kurtosis: measure of the 'pointiness' or peakedness of the distribution.
Visual and numerical consideration: skewness and kurtosis impact the choice of descriptive and inferential statistics.
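Skewness and kurtosis can also be computed numerically. The sketch below uses the simple moment-based (population) formulas, which is an assumption on my part; statistical software may apply small-sample corrections, so reported values can differ slightly. The data and function names are hypothetical.

```python
def central_moment(data, order):
    """Average of (x - mean) ** order over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: positive when the right tail is longer."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical scores piled up near a lower bound (a floor effect)
floor_effect = [0, 0, 0, 1, 1, 2, 3, 8]
symmetric = [1, 2, 3, 4, 5]

print(skewness(floor_effect))    # positive: tail to the right
print(skewness(symmetric))       # zero: no asymmetry
print(excess_kurtosis(symmetric))
```

Mirroring the floor-effect scores, so that they pile up near an upper bound instead, flips the sign of the skewness, which matches the ceiling-effect case.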
Normal Distribution and the Mean/SD Relationship
Normal distributions are defined by two parameters: mean \mu and standard deviation \sigma.
If you keep the standard deviation the same and change the mean, the curves shift along the x-axis but retain the same shape.
If you keep the mean the same and change the standard deviation, the curves become wider (larger \sigma) or narrower (smaller \sigma) but share the same center.
Practice question examples from the slides illustrate this distinction:
Same mean, different standard deviation → changes the spread.
Different mean, same standard deviation → shifts the center without changing spread.
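Both cases can be checked with NormalDist from Python's standard library (Python 3.8 or later). The means and standard deviations below are arbitrary illustration values, not figures from the slides.

```python
from statistics import NormalDist

baseline = NormalDist(mu=100, sigma=15)
shifted = NormalDist(mu=110, sigma=15)   # different mean, same SD
wider = NormalDist(mu=100, sigma=30)     # same mean, larger SD

# Same SD: identical peak height, just at a different location on the x-axis
print(baseline.pdf(100), shifted.pdf(110))

# Larger SD: lower peak at the same center, so the curve is wider and flatter
print(baseline.pdf(100), wider.pdf(100))
```

Doubling the standard deviation halves the peak height, because the total area under the curve must stay at 1.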
Chapter 3: Visual Displays of Data
Statistical Reasoning emphasizes careful interpretation: statistics can be misleading if data are misrepresented or misinterpreted.
Examples of questionable statistics in advertising and media (e.g., Colgate ad misrepresenting dentist preferences).
Common sources of misrepresentation include selective bias, flawed correlations, small samples, data fishing, faulty polling, and misleading visualizations.
The Power of Graphs and Misleading Visuals
Graphs can be misleading due to multiple factors:
Different time periods used on the x-axis or y-axis across lines or bars.
Mixing ordinal and scale variables on the same axis.
Starting the scale at a nonzero value to exaggerate trends.
Reversing the implied meaning of increases/decreases.
Example analyses discussed:
Crime rates from 1962 to 2006 displayed with total vs. subcategories.
Breast cancer statistics infographic with dramatic visuals and selective data presentation.
Summary from the slides: the combination of these factors can distort interpretation and should be scrutinized.
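The effect of starting the scale at a nonzero value can be quantified. This sketch compares how tall two bars look relative to each other when the axis starts at 0 versus at a higher value; the numbers and the function name drawn_height_ratio are hypothetical.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of bar heights as drawn when the y-axis starts at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)

# Hypothetical values: B is 10% larger than A
a, b = 50, 55

print(drawn_height_ratio(a, b))                 # axis starts at 0
print(drawn_height_ratio(a, b, axis_start=45))  # axis starts at 45
```

With the axis starting at 0, bar B is drawn 1.1 times as tall as bar A. With the axis starting at 45, it is drawn twice as tall, although the underlying difference is still 10%.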
Statistical Reasoning Questions for Interpreting Statistics
Key questions to ask when evaluating a study:
Are the outcomes clearly defined and measured without substantial error?
Have adjustments been made for unequal sample sizes or other biases?
Is there evidence of measurement precision (sampling errors, confidence intervals)?
Was the target population clearly defined for inferences?
Was the sample representative? Could survey wording or response bias influence results?
Are there order effects, funding influences, or incomplete information that biases interpretation?
Are graphical presentations fair, and do they imply causality from correlational data?
Does statistical significance imply practical or clinical significance?
Important reminder: a statistical test indicates how likely the observed result would be if the null hypothesis were true; it does not indicate practical importance.
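The gap between statistical and practical significance shows up clearly with a large sample. The sketch below assumes a one-sample z test with a known standard deviation and uses Cohen's d as the effect-size measure; both are my choices for illustration and are not named in the slides, and all numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, null_mean, sd, n):
    """Two-sided one-sample z test; returns (z, p_value)."""
    z = (sample_mean - null_mean) / (sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def cohens_d(sample_mean, null_mean, sd):
    """Effect size: mean difference in standard deviation units."""
    return (sample_mean - null_mean) / sd

# Hypothetical study: a half-point difference measured on 10,000 people
z, p = z_test(sample_mean=100.5, null_mean=100, sd=15, n=10_000)
d = cohens_d(sample_mean=100.5, null_mean=100, sd=15)

print(f"z = {z:.2f}, p = {p:.4f}, d = {d:.3f}")
```

Here p falls below 0.001, so the result is statistically significant, yet d is about 0.03, far below the 0.2 that is conventionally called a small effect.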
Major Assignment 1: Lying with Statistics
Due date and submission method noted: Major Assignment #1 Lying with Statistics.
Task: Find and critically examine three different examples where a claim has been made with a statistical basis in advertising and news articles.
Purpose: Develop critical analysis of how statistics are used to persuade or mislead.
Common Types of Graphs
Scatterplots
Depict relationships between two scale variables; show all data points; can reveal linear or nonlinear relationships.
Line graphs
Used to illustrate the relationship between two scale variables, often one variable tracked over time; a line of best fit can be used for predictions.
Box plots (box-and-whiskers plots)
Use for interval or ratio data; summarize median, quartiles, and potential outliers.
Bar graphs
Suitable when the independent variable is nominal or ordinal and the dependent variable is scale; bar height indicates the average value for each category.
Pictorial graphs
Use pictures to represent data, which can be visually appealing but potentially misleading if not scaled properly.
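For box plots, the summarized quantities can be computed directly. This sketch assumes the common 1.5 × IQR rule for flagging potential outliers and Python's default quartile method; other software may use a different quartile convention, so values can differ slightly. The data and the function name box_plot_summary are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Median, quartiles, extremes, and potential outliers for a box plot."""
    q1, median, q3 = quantiles(data, n=4)   # default "exclusive" method
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are potential outliers
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": min(data), "q1": q1, "median": median,
        "q3": q3, "max": max(data), "outliers": outliers,
    }

# Hypothetical counts with one extreme value
counts = [1, 1, 2, 2, 3, 4, 6, 7, 81]
print(box_plot_summary(counts))
```

For these values the median is 3 and the single extreme score, 81, is the only point flagged as a potential outlier.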
Choosing a Graph
Guidelines by data type:
One scale variable with frequencies: histogram.
One scale independent variable and one scale dependent variable: scatterplot or line graph.
One nominal/ordinal independent variable and one scale dependent variable: bar graph or Pareto chart (bars ordered from largest to smallest).
Two or more nominal/ordinal independent variables and one scale dependent variable: bar graph.
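These guidelines amount to a small lookup, sketched below. The function name choose_graph and the type labels are hypothetical, and the dependent variable is treated as a scale variable in every case.

```python
def choose_graph(iv_types, dv_type="scale"):
    """Suggest a graph type from the guidelines above.

    iv_types: list of independent-variable types ("nominal", "ordinal", "scale");
              pass an empty list when a single scale variable is described
              by its frequencies.
    """
    categorical = {"nominal", "ordinal"}
    if dv_type != "scale":
        return "not covered by these guidelines"
    if not iv_types:
        return "histogram"
    if iv_types == ["scale"]:
        return "scatterplot or line graph"
    if all(t in categorical for t in iv_types):
        # A Pareto chart is listed only for the single-variable case
        return "bar graph or Pareto chart" if len(iv_types) == 1 else "bar graph"
    return "not covered by these guidelines"

print(choose_graph([]))                      # one scale variable with frequencies
print(choose_graph(["scale"]))
print(choose_graph(["nominal"]))
print(choose_graph(["nominal", "ordinal"]))
```

Combinations the guidelines do not mention, such as a mix of scale and nominal independent variables, are reported as not covered rather than guessed.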
How to Read a Graph: General Guidelines
Include a clear, specific title.
Label both axes with the variable names.
Ensure terms on the graph match those in the text.
Include units of measurement.
Use 0 or cut marks where appropriate.
Use colors simply and avoid chartjunk (unnecessary decorations).
Visual Displays and Tools
Jamovi and other software tools referenced for statistics education and learning (e.g., jamovi.org).
Quick Reference: Notable Tables and Figures Mentioned
Table 2-3: Volcanoes around the world (counts by country; outliers highlighted).
Table 2-4: Frequency table for volcanoes by country (1–17 volcanoes; 51 countries).
Table 2-5: Frequencies and percentages for volcanoes by country (with calculated percentages).
Table 2-6: Grouped frequency table for volcanoes by country (9 intervals).
Histogram figures for volcano data (ungrouped and grouped).
Figure examples include normal vs. skewed distributions, and graphical data presentations (e.g., breast cancer infographic).
Terminology Recap
Raw Scores: Data that have not yet been transformed or analyzed.
Frequency Distribution: Pattern of a data set showing counts or proportions for each possible value.
Interval/Bin: A range of values used in grouped distributions and histograms.
Normal Distribution: Symmetric, bell-shaped distribution defined by mean and standard deviation.
Skewness: Asymmetry of the distribution, positive or negative.
Kurtosis: Pointiness or peakedness of a distribution.
Outliers: Values far from the center that can influence results.
Chartjunk: Unnecessary decorations in graphs that obscure data.
Mathematical Highlights (key formulas and constants)
Normal distribution parameters: mean \mu and standard deviation \sigma.
Area under normal curve: \text{Total area} = 1.
Proportions for normal distribution:
P(|X-\mu|\le \sigma) \approx 0.68
P(|X-\mu|\le 2\sigma) \approx 0.95
Relationship between the parameters and the shape of the distribution: the mean \mu sets the center of the curve and the standard deviation \sigma sets its spread (width).
Practical Takeaways
Use the appropriate type of graph for the data at hand to accurately convey information.
Be cautious of misleading graphs and statistics; always examine scale, units, and sample context.
For skewed data, prefer median as a robust measure of center and use nonparametric methods if needed.
When describing distributions, clearly specify whether you are discussing raw (ungrouped) data or grouped data, and whether outliers are included or excluded.
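The advice to prefer the median for skewed data can be seen in a short example. The counts below are hypothetical; the single extreme value plays the role of an outlier.

```python
from statistics import mean, median

# Hypothetical counts, first without and then with one extreme value
typical = [1, 1, 2, 2, 3, 4, 6]
with_outlier = typical + [81]

print(mean(typical), median(typical))
print(mean(with_outlier), median(with_outlier))
```

Adding the one extreme score pulls the mean from about 2.7 up to 12.5, while the median moves only from 2 to 2.5, which is why the median is the more robust measure of center here.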
Links and Resources Mentioned
iClicker join link and test prompts (for in-class participation) – not central to statistical content but part of the lecture context.
Jamovi educational resources and historical references cited in the slides.
Quick Study Prompts (conceptual questions you might encounter)
How does increasing the standard deviation while keeping the mean fixed affect the normal curve?
What kind of distribution would you expect if the data exhibit a floor effect?
When is it more appropriate to use the median rather than the mean as a measure of central tendency?
What potential issues should you evaluate when interpreting a line graph showing trends over time?
Final Note
The material emphasizes critical thinking about data presentation, the importance of proper graph construction, and the cautious interpretation of statistical claims in real-world contexts.