book 4

Describing Single Variables

Learning Objectives

  1. Use frequency tables and histograms to display and interpret the distribution of a variable.

  2. Compute and interpret the mean, median, and mode of a distribution and identify situations in which the mean, median, or mode is the most appropriate measure of central tendency.

  3. Compute and interpret the range and standard deviation of a distribution.

  4. Compute and interpret percentile ranks and z scores.

Descriptive Statistics Overview

  • Descriptive statistics refers to a set of techniques for summarizing and displaying data.

  • Data is typically quantitative and consists of scores on one or more variables for each of several study participants.

  • It is important to describe each variable individually, even when the primary research questions involve statistical relationships between variables.

The Distribution of a Variable

  • Every variable has a distribution, which indicates how scores are distributed across the levels of that variable.

  • Example:

    • Distribution of "number of siblings" in a sample of 100 university students:

      • 10 have no siblings

      • 30 have one sibling

      • 40 have two siblings

      • and so on for the remaining 20 students

    • Distribution of the variable "sex" in the same sample:

      • 44 have a score of "male"

      • 56 have a score of "female"

Frequency Tables

  • A frequency table displays the distribution of a variable.

    • Table 12.1: Frequency table for a hypothetical distribution of scores on the Rosenberg Self-Esteem Scale for a sample of 40 college students.

    • First column: possible scores on the Rosenberg scale.

    • Second column: frequency of each score.

    • Example:

      • 3 students scored 24

      • 5 students scored 23

      • 10 students scored 22

      • The range of scores is from 15 to 24.

      • The most common score is 22 while the least common is 17.
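
A frequency table of this kind can be built with a short script. The sketch below is only an illustration: the scores are hypothetical (they are not the data behind Table 12.1), and the function name frequency_table is made up for this example. It lists every level from the highest score to the lowest, including levels that no one obtained.

```python
from collections import Counter


def frequency_table(scores):
    """Return (level, frequency) rows ordered from highest to lowest score.

    Every integer level between the lowest and highest observed score is
    listed, so levels that nobody obtained appear with a frequency of 0.
    """
    counts = Counter(scores)
    highest, lowest = max(scores), min(scores)
    return [(level, counts[level]) for level in range(highest, lowest - 1, -1)]


# Hypothetical self-esteem scores, NOT the data from Table 12.1.
scores = [22, 24, 22, 21, 23, 22, 20, 19, 22, 23, 21, 15]

print("Score  Frequency")
for level, freq in frequency_table(scores):
    print(f"{level:>5}  {freq:>9}")
```

The rows stop at the highest and lowest observed scores, and the frequencies add up to the number of participants.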

Points to Note About Frequency Tables

  • Levels in the first column are typically listed from highest to lowest and do not extend beyond the highest and lowest scores in the data.

  • When the scores span a wide range of values, a grouped frequency table is recommended, in which the scores are grouped into ranges of equal width.

    • Table 12.2: Grouped frequency table showing a distribution of simple reaction times for 20 participants.

  • Frequency tables can also be used for categorical variables, in which case the levels are category labels and their order is somewhat arbitrary.
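
Grouping scores into equal-width ranges can be sketched as follows. The reaction times are hypothetical (not the data in Table 12.2), the 20 ms width is an arbitrary choice, and grouped_frequency_table is a name invented for this example. The sketch assumes integer scores.

```python
def grouped_frequency_table(values, width):
    """Return (start, end, frequency) rows for equal-width ranges.

    Ranges are inclusive (e.g. 140-159 for width 20), start at a multiple
    of `width`, and are listed from highest to lowest. Assumes integers.
    """
    start = (min(values) // width) * width
    rows = []
    while start <= max(values):
        end = start + width - 1
        freq = sum(start <= v <= end for v in values)
        rows.append((start, end, freq))
        start += width
    return rows[::-1]


# Hypothetical simple reaction times in ms for 20 participants.
reaction_times = [141, 155, 162, 168, 173, 177, 181, 184, 186, 189,
                  192, 195, 198, 203, 207, 212, 218, 225, 234, 247]

print("Range (ms)  Frequency")
for start, end, freq in grouped_frequency_table(reaction_times, width=20):
    print(f"{start}-{end}     {freq:>9}")
```

A narrower width gives more rows with fewer scores in each. A wider one gives a more compact table but hides detail.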

Histograms

  • A histogram is a graphical display of a distribution. It presents the same information as a frequency table.

  • The x-axis represents the variable; the y-axis represents frequency.

  • For quantitative variables, there is usually no gap between the bars.

  • Figure 12.1: Histogram of self-esteem scores representing the distribution from Table 12.1.

Distribution Shapes

  • The shape of a distribution in a histogram can be described by its number of peaks:

    • Unimodal (one distinct peak) or bimodal (two distinct peaks).

    • Example: Figure 12.2 shows a hypothetical bimodal distribution of scores on the Beck Depression Inventory.

  • Distributions can be characterized as symmetrical or skewed:

    • Symmetrical: The left and right halves are mirror images.

    • Negatively skewed: The peak is shifted toward the upper end of the range, with a long tail extending toward lower scores.

    • Positively skewed: The peak is shifted toward the lower end of the range, with a long tail extending toward higher scores.

Outliers

  • An outlier is an extreme score much higher or lower than the rest of the scores in the distribution.

    • Outliers may represent truly extreme scores, or they may result from errors or equipment malfunctions.

    • Example: A clinically depressed individual could be an outlier in an otherwise happy sample.

Measures of Central Tendency

  • Central tendency refers to the point around which scores tend to cluster.

    • It is also called the average.

  • Three common measures of central tendency:

    • Mean (M):

      • Formula: M = \frac{\Sigma X}{N}

      • \Sigma indicates summation across scores, and N is the number of scores.

      • Generally provides a good indication of central tendency and has statistical properties that make it especially useful for inferential statistics.

    • Median:

      • Defined as the middle score: half the scores are lower and half are higher.

      • To find the median, order the scores from lowest to highest and find the middle value.

      • With an odd number of scores, the median is the middle score; with an even number, it is the average of the two middle scores.

    • Mode:

      • The most frequently occurring score.

      • Example: In the self-esteem distribution (Table 12.1), the mode is 22.

  • In unimodal and symmetrical distributions, mean, median, and mode are close, while in bimodal or asymmetrical distributions, they can diverge.

  • Example: Reaction times of 200, 250, 280, and 250 ms have a mean of 245 ms. Adding a single outlier (e.g., 5,000 ms) raises the mean to 1,196 ms, which no longer represents a typical score, while the median stays at 250 ms.
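
The reaction-time example can be checked with Python's standard statistics module. This is a minimal sketch; the variable names are chosen for this illustration only.

```python
from statistics import mean, median, mode

reaction_times = [200, 250, 280, 250]    # ms, the four scores from the example
with_outlier = reaction_times + [5000]   # the same scores plus one extreme score

for label, data in [("without outlier", reaction_times),
                    ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data)}, "
          f"median = {median(data)}, mode = {mode(data)}")
```

The outlier pulls the mean far above every typical score, while the median and mode stay at 250 ms. This is why the median is often preferred for highly skewed distributions.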

Measures of Variability

  • Variability is the extent to which scores vary around their central tendency.

  • The range is one measure of variability:

    • Computed as the difference between the highest and lowest scores.

    • Example: Range in self-esteem scores (Table 12.1): 24 - 15 = 9.

  • The standard deviation (SD) is the most common measure of variability:

    • It is, roughly, the average distance between the scores and the mean.

    • Formula: SD = \sqrt{\frac{\Sigma (X - M)^{2}}{N}}.

    • Computing it involves four steps: find each score's difference from the mean, square each difference, average the squared differences, and take the square root of that average.

    • Example calculation (Table 12.3): for a set of 8 scores with a mean of 5, the table lists each score's difference from the mean and the square of that difference.

  • Calculators and software often divide by N - 1 rather than N when computing the SD. This corrects for bias when a sample is used to estimate the population SD.
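
The steps in the SD formula can be followed in code. The eight scores below are hypothetical: they were chosen to have a mean of 5 and are not claimed to be the values in Table 12.3. The function name standard_deviation is likewise made up for this sketch. The last lines contrast division by N with division by N - 1, using the standard library's pstdev and stdev.

```python
import math
from statistics import pstdev, stdev


def standard_deviation(scores):
    """SD computed with N in the denominator, following the formula above."""
    n = len(scores)
    m = sum(scores) / n                          # mean, M
    squared_diffs = [(x - m) ** 2 for x in scores]  # (X - M)^2 for each score
    variance = sum(squared_diffs) / n            # average squared difference
    return math.sqrt(variance)                   # square root gives the SD


# Hypothetical set of 8 scores with a mean of 5.
scores = [3, 5, 4, 2, 7, 6, 5, 8]

sd_n = standard_deviation(scores)   # divides by N
sd_n_minus_1 = stdev(scores)        # divides by N - 1

print(f"SD (divide by N):     {sd_n:.2f}")
print(f"SD (divide by N - 1): {sd_n_minus_1:.2f}")
print(f"statistics.pstdev:    {pstdev(scores):.2f}")
```

For these scores the squared differences sum to 28, so dividing by N gives a variance of 3.5 and dividing by N - 1 gives 4. The N - 1 version is always slightly larger, and the gap shrinks as N grows.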

Correlations Between Quantitative Variables

  • Many statistical relationships arise between quantitative variables.

  • Example: Study by Carlson and Conard found a relationship between the alphabetical order of last names and response times to consumer appeals.

  • Relationships are often illustrated through line graphs or scatterplots.

  • Figure 12.6 shows a negative relationship between response time and the alphabetical quartile of participants' last names.

  • Figure 12.7 shows a positive relationship involving scores on the Rosenberg Self-Esteem Scale.

  • Relationships can be linear (best fit by a straight line) or nonlinear (fit better by a curve).

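  • Pearson's r measures the strength and direction of the linear relationship between two quantitative variables. Values range from -1.00 (strongest possible negative relationship) to +1.00 (strongest possible positive relationship), with the following interpretive guidelines: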

    • Values near ±0.10 are small

    • Values near ±0.30 are medium

    • Values near ±0.50 are large.

  • Pearson's r can be computed by transforming the scores on each variable into z scores, multiplying each participant's two z scores together, and taking the mean of these cross-products.
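
One common way to carry out this computation is to convert both variables to z scores, multiply each pair of z scores, and average the products. The sketch below does this with hypothetical data; the function names z_scores and pearson_r are invented for this example. It assumes the SD is computed with N in the denominator, which is what makes the mean of the cross-products equal to r.

```python
import math


def z_scores(values):
    """Convert raw scores to z scores: (X - M) / SD, with SD dividing by N."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def pearson_r(x, y):
    """Pearson's r as the mean of the cross-products of paired z scores."""
    zx, zy = z_scores(x), z_scores(y)
    cross_products = [a * b for a, b in zip(zx, zy)]
    return sum(cross_products) / len(cross_products)


# Hypothetical paired scores for five participants.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(f"r = {pearson_r(x, y):.2f}")
```

A pair of scores that are both above or both below their means gives a positive cross-product. A pair on opposite sides gives a negative one, so the sign of r reflects which kind of pairing dominates.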

Key Takeaways

  • Group differences are described via means, standard deviations, and Cohen's d (effect size measure).

  • Correlations are described using Pearson's r and visualized via graphs.

Exercises

  1. Compute means and standard deviations for Rosenberg Self-Esteem scores from two sample groups (Japanese and American university students) and demonstrate using bar graphs and Cohen’s d.

  2. Create a scatterplot for extraversion scores and Facebook friends for university students, calculating Pearson's r and describing the results in detail.