book 4
Describing Single Variables
Learning Objectives
Use frequency tables and histograms to display and interpret the distribution of a variable.
Compute and interpret the mean, median, and mode of a distribution and identify situations in which the mean, median, or mode is the most appropriate measure of central tendency.
Compute and interpret the range and standard deviation of a distribution.
Compute and interpret percentile ranks and z scores.
Descriptive Statistics Overview
Descriptive statistics refers to a set of techniques for summarizing and displaying data.
Data is typically quantitative and consists of scores on one or more variables for each of several study participants.
It is important to describe each variable individually, even when the primary research questions involve statistical relationships between variables.
The Distribution of a Variable
Every variable has a distribution, which indicates how scores are distributed across the levels of that variable.
Example:
Distribution of "number of siblings" from a sample of 100 university students:
10 have no siblings
30 have one sibling
40 have two siblings, and so on.
Distribution of the variable "sex":
44 have a score of "male"
56 have a score of "female"
Frequency Tables
A frequency table displays the distribution of a variable.
Table 12.1: Frequency table for a hypothetical distribution of scores on the Rosenberg Self-Esteem Scale for a sample of 40 college students.
First column: possible scores on the Rosenberg scale.
Second column: frequency of each score.
Example:
3 students scored 24
5 students scored 23
10 students scored 22, and so on.
The range of scores is from 15 to 24.
The most common score is 22 while the least common is 17.
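A minimal Python sketch of building a frequency table from raw scores. The scores are made up for illustration (they are not the data behind Table 12.1), and `frequency_table` is a hypothetical helper name. It follows the usual layout: levels run from the highest observed score down to the lowest, and scores in between that nobody obtained are listed with a frequency of 0.

```python
from collections import Counter

# Hypothetical self-esteem scores for 15 students (made up for illustration).
scores = [22, 24, 21, 22, 23, 20, 22, 18, 24, 21, 22, 15, 19, 23, 22]

def frequency_table(values):
    """Return (score, frequency) pairs from the highest observed score down
    to the lowest, including scores in between that nobody obtained."""
    counts = Counter(values)  # a Counter returns 0 for scores never observed
    return [(score, counts[score])
            for score in range(max(values), min(values) - 1, -1)]

print("Score  Frequency")
for score, freq in frequency_table(scores):
    print(f"{score:>5}  {freq:>9}")
```

In this made-up sample, 22 is the most common score (frequency 5), while 16 and 17 have a frequency of 0.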
Points to Note About Frequency Tables
Levels in the first column are typically listed from highest (top) to lowest (bottom) and do not extend beyond the highest and lowest scores in the data.
When scores span a wide range of values, a grouped frequency table is recommended: the first column lists ranges of values rather than single values, and all ranges have equal width.
Table 12.2: Grouped frequency table showing a distribution of simple reaction times for 20 participants.
Frequency tables can also be used for categorical variables; because the category labels have no inherent order, they are often listed from the most frequent to the least frequent.
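For the grouped case, a sketch that counts integer scores in equal-width intervals. The reaction times are invented (they are not the data in Table 12.2), `grouped_frequency_table` is a hypothetical name, and the starting value of 240 ms and interval width of 20 ms are arbitrary choices for this example.

```python
# Hypothetical simple reaction times (ms) for 20 participants (made up).
rts = [247, 262, 289, 301, 256, 271, 318, 244, 295, 263,
       279, 252, 336, 284, 267, 241, 309, 275, 258, 292]

def grouped_frequency_table(values, start, width):
    """Count integer values in equal-width intervals [lower, lower + width - 1],
    starting at `start`, and list the intervals from highest to lowest."""
    if start > min(values):
        raise ValueError("start must not exceed the lowest score")
    rows = []
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, freq))
        lower += width
    return rows[::-1]  # highest interval first, as in an ungrouped table

for lower, upper, freq in grouped_frequency_table(rts, start=240, width=20):
    print(f"{lower}-{upper}: {freq}")
```

Every interval spans the same 20 ms, so the frequencies can be compared directly across rows.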
Histograms
A histogram is a graphical display of a distribution. It presents the same information as a frequency table in a form that is quicker to grasp.
X-axis represents the variable; y-axis represents frequency.
For quantitative variables there is usually no gap between the bars; for categorical variables there is usually a small gap.
Figure 12.1: Histogram of self-esteem scores representing the distribution from Table 12.1.
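Plotting libraries can draw histograms directly; to keep the example dependency-free, this sketch prints a sideways text histogram instead. Each row is one score (the x-axis of an ordinary histogram, here running top to bottom) and each `#` is one participant (the frequency). The scores are hypothetical and `text_histogram` is an illustrative name.

```python
from collections import Counter

# Hypothetical scores (made up for illustration).
scores = [22, 24, 21, 22, 23, 20, 22, 18, 24, 21, 22, 15, 19, 23, 22]

def text_histogram(values):
    """Return one line per score, lowest to highest, with a bar of '#'
    characters whose length equals that score's frequency."""
    counts = Counter(values)
    return [f"{score:>4} | {'#' * counts[score]}"
            for score in range(min(values), max(values) + 1)]

print("\n".join(text_histogram(scores)))
```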
Distribution Shapes
The shape of a distribution, as seen in a histogram, can be described by its number of peaks:
Unimodal (one distinct peak) or bimodal (two distinct peaks).
Example: Figure 12.2 shows a hypothetical bimodal distribution of scores on the Beck Depression Inventory.
Distributions can be characterized as symmetrical or skewed:
Symmetrical: Left and right halves are mirror images.
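Negatively Skewed: The peak is shifted toward the upper end of the range, with a relatively long tail extending toward the lower (negative) end.
Positively Skewed: The peak is shifted toward the lower end of the range, with a relatively long tail extending toward the upper (positive) end.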
Outliers
An outlier is an extreme score much higher or lower than the rest of the scores in the distribution.
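An outlier may represent a true extreme on the variable or may result from an error, such as a recording mistake or an equipment malfunction.
Example: A clinically depressed person could be an outlier in a sample of otherwise happy people.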
Measures of Central Tendency
Central tendency refers to the point around which scores tend to cluster.
Informally, it is often called the average.
Three common measures of central tendency:
Mean (M):
Formula: M = \frac{\Sigma X}{N}
\Sigma indicates summation across scores, and N is the number of scores.
Generally provides a good indication of central tendency and has statistical properties that make it especially useful in inferential statistics.
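The mean formula translates directly into code. A minimal sketch (the function name `mean` is an illustrative choice), applied to the four reaction times used in the example at the end of this list:

```python
def mean(scores):
    """M = (sum of X) / N: add up all the scores and divide by how many there are."""
    return sum(scores) / len(scores)

reaction_times_ms = [200, 250, 280, 250]
print(mean(reaction_times_ms))  # 980 / 4 = 245.0
```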
Median:
Defined as the middle score: half the scores are less than the median and half are greater.
To find the median, organize the scores from lowest to highest and locate the middle value.
If there is an odd number of scores, the median is the middle score; if there is an even number, it is the mean of the two middle scores.
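A sketch of the median procedure just described, covering both the odd and even cases. The scores are arbitrary example values and `median` is an illustrative name.

```python
def median(scores):
    """Middle score after sorting; mean of the two middle scores if N is even."""
    ordered = sorted(scores)            # step 1: lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd N: the single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the two middle scores

print(median([8, 4, 12, 14, 3]))      # sorted: 3 4 8 12 14 -> 8
print(median([8, 4, 12, 14, 3, 2]))   # sorted: 2 3 4 8 12 14 -> (4 + 8) / 2 = 6.0
```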
Mode:
The most frequently occurring score.
Example: In the self-esteem distribution (Table 12.1), the mode is 22.
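A sketch for finding the mode. Because a distribution can have more than one most frequent score (as in a bimodal distribution), this hypothetical `modes` helper returns every score tied for the highest frequency, in ascending order. Python's standard library also provides `statistics.mode` and, from Python 3.8, `statistics.multimode`.

```python
from collections import Counter

def modes(scores):
    """Return all scores that occur with the highest frequency, sorted."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == top)

print(modes([22, 24, 21, 22, 23, 22, 20]))  # [22]
print(modes([1, 1, 2, 3, 3]))                # [1, 3]: two scores tie
```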
In unimodal and symmetrical distributions, mean, median, and mode are close, while in bimodal or asymmetrical distributions, they can diverge.
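Example: The reaction times 200, 250, 280, and 250 ms have a mean of 245 ms. Adding one outlier of 5,000 ms raises the mean to 1,196 ms, higher than every other score, so the mean no longer represents a typical score; the median (250 ms with or without the outlier) is a better measure here.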
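The reaction-time example can be checked with Python's built-in `statistics` module. This sketch reports the mean and median with and without the 5,000 ms outlier:

```python
from statistics import mean, median

rts = [200, 250, 280, 250]       # reaction times in ms
with_outlier = rts + [5000]      # the same set plus one extreme score

for label, data in [("without outlier", rts), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data):.0f} ms, median = {median(data):.0f} ms")
# without outlier: mean = 245 ms, median = 250 ms
# with outlier: mean = 1196 ms, median = 250 ms
```

The single extreme score moves the mean by almost 1,000 ms but leaves the median unchanged.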
Measures of Variability
Variability refers to the extent to which scores vary around their central tendency.
The range is one measure of variability:
Computed as the difference between the highest and lowest scores.
Example: Range in self-esteem scores (Table 12.1): 24 - 15 = 9.
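The range is a single subtraction. This sketch (`score_range` is a hypothetical name, chosen to avoid shadowing Python's built-in `range`) also reuses the reaction times from the central-tendency example to show how much one extreme score can inflate the range.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

rts = [200, 250, 280, 250]
print(score_range(rts))           # 280 - 200 = 80
print(score_range(rts + [5000]))  # 5000 - 200 = 4800
```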
The Standard Deviation (SD) is the most common measure:
Roughly, the average distance between the scores and the mean.
Formula: SD = \sqrt{\frac{\Sigma (X - M)^{2}}{N}}.
Computed in four steps: find the difference between each score and the mean (X - M), square each difference, find the mean of the squared differences (the variance), and take the square root of the variance.
Example calculations are shown in Table 12.3:
For a set of 8 scores with a mean of 5, the table lists each score's difference from the mean and the square of that difference.
Many calculators and software packages divide by N - 1 rather than N when computing the SD, to correct for the tendency of the sample SD to underestimate the population SD.
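A step-by-step sketch of the SD formula, with an optional switch for the N - 1 version. The eight scores are made up for illustration (chosen to have a mean of 5, like the example in Table 12.3), and `standard_deviation` is an illustrative name. The results are cross-checked against `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by N - 1) from Python's standard library.

```python
import math
from statistics import pstdev, stdev

def standard_deviation(scores, sample=False):
    """SD computed step by step; divides by N - 1 instead of N if sample=True."""
    n = len(scores)
    m = sum(scores) / n                       # the mean, M
    differences = [x - m for x in scores]     # step 1: X - M for each score
    squared = [d ** 2 for d in differences]   # step 2: (X - M)^2
    divisor = n - 1 if sample else n
    variance = sum(squared) / divisor         # step 3: "average" squared difference
    return math.sqrt(variance)                # step 4: square root

scores = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical; mean = 5
print(standard_deviation(scores))               # sqrt(32 / 8) = 2.0
print(standard_deviation(scores, sample=True))  # sqrt(32 / 7), about 2.14
print(pstdev(scores), stdev(scores))            # standard-library cross-check
```

Dividing by N - 1 always gives a slightly larger value than dividing by N, and the gap shrinks as N grows.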
Correlations Between Quantitative Variables
Many statistical relationships take the form of correlations between quantitative variables.
Example: Study by Carlson and Conard found a relationship between the alphabetical order of last names and response times to consumer appeals.
Relationships are often illustrated through line graphs or scatterplots.
In Figure 12.6, response times became shorter as the quartile of last names moved later in the alphabet, a negative relationship.
Figure 12.7 showed scores on the Rosenberg Self-Esteem Scale indicating a positive relationship.
Relationships can be linear (best fit by a straight line) or nonlinear (fit better by a curve).
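Pearson's r: Measures the strength and direction of a linear relationship; values range from -1.00 (the strongest possible negative relationship) through 0 (no linear relationship) to +1.00 (the strongest possible positive relationship). Cohen's guidelines for interpreting its size: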
Values near ±0.10 are small
Values near ±0.30 are medium
Values near ±0.50 are large.
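Computing Pearson's r by hand: convert each X score and each Y score to a z score, z = \frac{X - M}{SD}; multiply each participant's two z scores to get the cross-products; then take the mean of the cross-products, r = \frac{\Sigma (z_X z_Y)}{N}.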
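A sketch of the z-score method for Pearson's r. The z scores use the SD with N in the denominator, matching the SD formula given earlier in this section; with that choice, the mean of the cross-products equals Pearson's r. The paired scores are hypothetical and the function names are illustrative. (Python 3.10 and later also offer `statistics.correlation`, which could serve as a cross-check.)

```python
import math

def z_scores(values):
    """Convert raw scores to z scores: z = (X - M) / SD, with SD dividing by N.
    Undefined (ZeroDivisionError) if all scores are equal, since SD = 0."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]

def pearson_r(xs, ys):
    """Pearson's r as the mean of the cross-products of paired z scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of scores")
    cross_products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(cross_products) / len(cross_products)

# Hypothetical paired scores for five participants.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 2))  # 0.8
```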
Key Takeaways
Differences between groups are described using the group means and standard deviations, along with Cohen's d, a measure of effect size.
Correlations between quantitative variables are described using Pearson's r and visualized with line graphs or scatterplots.
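Cohen's d expresses the difference between two group means in standard-deviation units: d = (M1 - M2) / SD. The section does not say which SD to use; this sketch assumes the common pooled within-groups SD, which weights each group's N - 1 variance by its degrees of freedom. The group scores are hypothetical and `cohens_d` is an illustrative name.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """d = (M1 - M2) / SD_pooled, assuming a pooled within-groups SD."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance divides by N - 1 (sample variance).
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

# Hypothetical scores for two small groups.
group_a = [30, 32, 28, 34, 31]   # M = 31, sample variance = 5
group_b = [26, 27, 29, 25, 28]   # M = 27, sample variance = 2.5
print(round(cohens_d(group_a, group_b), 2))  # 4 / sqrt(3.75), about 2.07
```

The sign of d only reflects which group is listed first; swapping the groups flips it.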
Exercises
Compute the means and standard deviations of Rosenberg Self-Esteem scores for two samples (Japanese and American university students), present the group difference in a bar graph, and compute Cohen's d.
Create a scatterplot of extraversion scores and numbers of Facebook friends for a sample of university students, compute Pearson's r, and describe the results in detail.