Psychological Statistics - Data Distributions

Module 2: Data Distributions

This module covers graphical and tabular methods for representing numeric and categorical variables. It also explains how psychological constructs are measured, how to interpret common distribution shapes, and how to run these analyses in Jamovi and RStudio.

Attendance

The attendance code for the Quiz on January 14th is 5.

Key Variables

Depressive Symptoms: Quantified using the Center for Epidemiologic Studies Depression Scale (CES-D), a widely recognized instrument for assessing the presence and severity of depressive symptoms.
Dispositional Optimism: Evaluated through the Life Orientation Test-Revised (LOT-R), which measures an individual's general tendency to expect positive outcomes in life.

Measuring Depressive Symptoms (CES-D)

The CES-D is composed of 20 items, each designed to capture the frequency of depressive symptoms experienced by individuals.
The items address a range of symptoms, including sleep disturbances, changes in appetite, feelings of sadness, and difficulties in concentration.
Each item is rated on a 4-point scale:
- 0 = Rarely or none of the time (less than 1 day)
- 1 = Some or a little of the time (1-2 days)
- 2 = Occasionally or a moderate amount of the time (3-4 days)
- 3 = Most or all of the time (5-7 days)
Higher total scores on the CES-D indicate a greater presence and severity of depressive symptoms.
A cutoff score of ≥ 16 is commonly used to indicate clinically elevated depressive symptoms, suggesting the need for further evaluation.
The CES-D has strong psychometric properties, including high internal consistency reliability (α = .89) in cancer samples, meaning that its items consistently measure the same construct in this population.

CES-D Inventory Example

Participants are instructed to rate their experiences over the past week for each item on the CES-D.
Example items:
- I was bothered by things that usually don’t bother me.
- I did not feel like eating; my appetite was poor.
- I felt that I could not shake off the blues.
- I felt that I was just as good as other people.
Each item is rated on a scale from 0 to 3, reflecting the frequency with which the symptom was experienced.
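Turning item ratings into a CES-D total can be sketched in Python as follows. This is a minimal illustration, not an official scoring procedure: it assumes the standard 20-item order, in which the positively worded items (such as "I felt that I was just as good as other people") are items 4, 8, 12, and 16 and are reverse scored as 3 minus the rating before summing, so that higher totals always mean more symptoms. The function name `score_cesd` is hypothetical.

```python
# Sketch: scoring CES-D item ratings into a total (sum) score.
# ASSUMPTION: standard 20-item order, where items 4, 8, 12, and 16 are
# positively worded and reverse scored. Verify against your copy of the scale.

REVERSED_ITEMS = {4, 8, 12, 16}  # 1-based item numbers (assumed)
N_ITEMS = 20
MAX_ITEM_SCORE = 3


def score_cesd(responses: list[int]) -> int:
    """Return the CES-D sum score for 20 item ratings, each 0-3."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 0 <= rating <= MAX_ITEM_SCORE:
            raise ValueError(f"item {item_number}: rating {rating} is outside 0-3")
        # Reverse score positively worded items so higher always means more symptoms.
        if item_number in REVERSED_ITEMS:
            rating = MAX_ITEM_SCORE - rating
        total += rating
    return total


# Usage: a hypothetical respondent who answers 1 on every item.
print(score_cesd([1] * 20))  # 24
```

For that respondent, the 16 symptom items contribute 16 points and the 4 reversed items contribute 2 points each, for a total of 24.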

Sum Scores

Researchers often calculate sum scores for psychological constructs by aggregating responses to individual items on questionnaires.
These sum scores, also referred to as scale or composite scores, offer several advantages:
- Improved reliability and validity compared to individual items.
- Measurement of a broader range of attributes related to the construct.
- Approximation of a numeric (interval or ratio) scale, allowing for more sophisticated statistical analyses.

Examples of Sum Scores

Subclinical Sum Score: A sum score of 9 falls below the clinical cutoff of 16, indicating subclinical depressive symptoms.
Clinically Elevated Sum Score: A sum score of 29 exceeds the cutoff of 16, suggesting clinically elevated depressive symptoms.
The highest possible score on the CES-D is 60, calculated as $20 \text{ items} \times 3 \text{ (maximum score per item)} = 60$.
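The range and cutoff rules above can be checked with a short sketch. The function `interpret_cesd` is a hypothetical helper; its labels describe the score only and are not a diagnosis.

```python
# Sketch: interpreting a CES-D sum score against the cutoff of 16.
CESD_MIN, CESD_MAX = 0, 20 * 3  # 20 items x maximum of 3 points each = 60
CESD_CUTOFF = 16                # scores >= 16 are clinically elevated


def interpret_cesd(total: int) -> str:
    """Label a CES-D sum score as subclinical or clinically elevated."""
    if not CESD_MIN <= total <= CESD_MAX:
        raise ValueError(f"CES-D sum scores must fall between {CESD_MIN} and {CESD_MAX}")
    return "clinically elevated" if total >= CESD_CUTOFF else "subclinical"


for score in (9, 29):
    print(score, interpret_cesd(score))
# 9 subclinical
# 29 clinically elevated
```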

Measuring Dispositional Optimism (LOT-R)

Dispositional optimism and pessimism are measured using the Life Orientation Test-Revised (LOT-R), a widely used instrument in psychological research.
The LOT-R includes three optimism items, three pessimism items, and four filler items designed to mask the purpose of the test.
Items are rated on a 5-point Likert scale, ranging from 'I agree a lot' to 'I disagree a lot.'
Summed scores are computed separately for optimism (α = .74) and pessimism (α = .80), providing distinct measures of these constructs.
Each scale has a potential range of 0–12, with higher scores indicating higher levels of the respective attribute.
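A minimal sketch of LOT-R subscale scoring, assuming the commonly published item key (optimism items 1, 4, and 10; pessimism items 3, 7, and 9; filler items 2, 5, 6, and 8) and ratings already coded 0 to 4. Check the key against the version of the scale you are using; the function `score_lotr` and the example responses are hypothetical.

```python
# Sketch: separate LOT-R optimism and pessimism sum scores.
# ASSUMPTION: commonly published key (1-based item numbers):
# optimism = 1, 4, 10; pessimism = 3, 7, 9; fillers = 2, 5, 6, 8.
OPTIMISM_ITEMS = (1, 4, 10)
PESSIMISM_ITEMS = (3, 7, 9)


def score_lotr(responses: list[int]) -> dict[str, int]:
    """Return optimism and pessimism sums (each 0-12) from 10 ratings (0-4)."""
    if len(responses) != 10:
        raise ValueError("the LOT-R has 10 items")
    if any(not 0 <= r <= 4 for r in responses):
        raise ValueError("LOT-R ratings must be between 0 and 4")
    # Filler items are ignored; they belong to neither scale.
    return {
        "optimism": sum(responses[i - 1] for i in OPTIMISM_ITEMS),
        "pessimism": sum(responses[i - 1] for i in PESSIMISM_ITEMS),
    }


# Usage: one hypothetical respondent.
print(score_lotr([4, 2, 1, 3, 2, 2, 0, 2, 1, 4]))  # {'optimism': 11, 'pessimism': 2}
```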

LOT-R Optimism Questionnaire Example

Participants indicate their agreement with each statement on a scale from 0 to 4.
Example items:
- In uncertain times, I usually expect the best.
- I’m always optimistic about my future.
- Overall, I expect more good things to happen to me than bad.

Comparing CES-D and LOT-R

The CES-D measures depressive symptoms with a sum score ranging from 0 to 60, a relatively wide range of possible scores. The LOT-R optimism sum score ranges only from 0 to 12.
Because the CES-D can take 61 distinct values compared with 13 for LOT-R optimism, it is considered closer to a true numeric (interval or ratio) scale and allows finer distinctions between individuals' levels of depressive symptoms.

Distributions

A distribution describes how score values are concentrated or spread out within a dataset.
Distributions can be summarized using various methods, including graphs, tables, and mathematical functions, each offering unique insights into the data.
The choice of summary method depends on the nature of the variable, whether it is numeric (interval or ratio) or categorical (nominal or ordinal).

Numeric (Continuous) Variables

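Interval and ratio scales are numeric (continuous) variables that approximate a number line, so differences between scores are meaningful (e.g., response time, age, GRE scores). Ratio scales additionally have a true zero point, so ratios between scores are meaningful only on them (e.g., a 600 ms response time is twice as long as a 300 ms one).
On these scales, the distance between any two adjacent score values represents the same amount of the variable (e.g., the difference between 10 and 11 equals the difference between 40 and 41).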
Many behavioral, emotional, and physiological variables in psychology are treated as approximating interval-level data, although they may not perfectly meet the criteria of true interval scales.

Distribution Elements

Horizontal Axis (Score Range): Represents the range of possible scores, following a number line from low to high.
Vertical Axis (Frequency): Indicates the frequency, count, or density of each score, reflecting how often each score occurs in the dataset.
Distribution Center: Identifies the value around which scores are most concentrated, often represented by measures such as the mean or median.
Variability or Spread: Describes the extent to which scores differ from each other; high variability indicates larger differences among scores, and low variability indicates more similar scores.
Distribution Tails: Represent the extreme ends of the distribution, where unusual or infrequent score values fall; especially extreme values in the tails are often referred to as outliers.
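The center and spread elements can be computed with Python's standard `statistics` module. The scores below are hypothetical CES-D totals chosen only for illustration.

```python
import statistics

# HYPOTHETICAL CES-D sum scores, for illustration only.
scores = [3, 5, 6, 8, 9, 10, 12, 14, 21, 29]

center_mean = statistics.mean(scores)      # arithmetic average
center_median = statistics.median(scores)  # middle of the sorted scores
spread_sd = statistics.stdev(scores)       # sample standard deviation (n - 1)
spread_range = max(scores) - min(scores)   # distance between lowest and highest score

print(f"mean = {center_mean:.1f}, median = {center_median}")
print(f"SD = {spread_sd:.2f}, range = {spread_range}")
```

Here the mean (11.7) sits above the median (9.5) because the few high scores in the upper tail pull the mean toward them.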

Distribution Shape

Asymmetric distributions, which lack symmetry around their center, can be classified as positively or negatively skewed.
Positively Skewed: Characterized by a long tail extending toward the positive side of the number line, indicating a concentration of scores at the lower end of the scale.
Negatively Skewed: Characterized by a long tail extending toward the negative side of the number line, indicating a concentration of scores at the higher end of the scale.

Skewness Examples

If most individuals score below the clinical cutoff for depressive symptoms (CES-D ≥ 16), the distribution of CES-D scores would be positively skewed, with a tail extending toward higher scores.
Conversely, if most individuals report feeling optimistic, the distribution of optimism scores would be negatively skewed, with a tail extending toward lower scores.
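Skewness can be quantified with the moment coefficient $g_1 = m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the average squared and cubed deviations from the mean; its sign matches the direction of the long tail. The sketch below uses two hypothetical datasets shaped like the CES-D and optimism examples above. Statistical software often reports a bias-adjusted version of this coefficient, so reported values may differ slightly from $g_1$.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no bias adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n  # average cubed deviation
    if m2 == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return m3 / m2 ** 1.5


# HYPOTHETICAL scores, for illustration only.
depression = [2, 3, 4, 4, 5, 6, 7, 9, 15, 30]    # most low, long tail toward high scores
optimism = [2, 7, 9, 10, 10, 11, 11, 12, 12, 12]  # most high, long tail toward low scores

print(f"CES-D-like scores:    g1 = {skewness(depression):+.2f}")  # positive
print(f"Optimism-like scores: g1 = {skewness(optimism):+.2f}")    # negative
```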

Graphical Displays for Numeric Variables

Histograms: Bar plots used to represent the distribution of numeric variables, with no gaps between bars to indicate the continuous nature of the data.
Kernel Density Plots: Use a smoothing algorithm to draw a continuous curve over the data, approximating the shape of the histogram and estimating the underlying probability density function.
In these displays, vertical elevation represents how often scores occur, whether the axis is labeled frequency, count, or density.
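Both displays can be sketched from scratch to show what they compute. A histogram counts scores falling into equal-width bins; a Gaussian kernel density estimate places a small bell curve on every score and adds them up, so the curve is smooth rather than literally connecting the bars. The bin width, the Gaussian kernel, and the bandwidth of 3 are assumptions for illustration; R's `density()` function, for example, uses a Gaussian kernel by default and selects the bandwidth automatically. The scores are hypothetical.

```python
import math


def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    The last bin also includes its right edge, so the maximum score is counted.
    """
    end = start + width * n_bins
    counts = [0] * n_bins
    for x in values:
        if start <= x <= end:
            k = min(int((x - start) // width), n_bins - 1)
            counts[k] += 1
    return counts


def gaussian_kde(values, bandwidth):
    """Return a function that gives the estimated density at any point x."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Add up one Gaussian "bump" centered on each observed score.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


scores = [2, 3, 4, 4, 5, 6, 7, 9, 15, 30]  # HYPOTHETICAL CES-D totals

# Bins [0, 10), [10, 20), ..., [50, 60] cover the full CES-D range.
print(histogram_counts(scores, start=0, width=10, n_bins=6))  # [8, 1, 0, 1, 0, 0]

density = gaussian_kde(scores, bandwidth=3.0)
for x in (5, 15, 30):
    print(x, round(density(x), 4))
```

Larger bandwidths give smoother curves; smaller ones follow individual scores more closely, much as narrower bins make a histogram more jagged.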

Example: Uveal Melanoma and Depression Study

A study examined whether visual impairment predicted depressive symptoms in adults diagnosed with uveal melanoma, and whether optimism/pessimism moderated this relationship.
Histograms and kernel density plots were used to examine the distribution of depressive symptoms and optimism scores within the study population.

Perceived Discrimination and Internalizing Behavior Study

A study explored a spillover model of internalizing symptom development in relation to perceived discrimination among Latinx adolescents.
Perceived discrimination encompasses negative attitudes or unfair treatment stemming from specific characteristics such as race or ethnicity.
Internalizing behaviors are inward-directed behaviors, including anxious and depressive symptoms, that can result from experiences of discrimination.

Measuring Perceived Discrimination

The 10-item discrimination subscale of the Social, Attitudinal, Familial, and Environmental Acculturative Stress-Child Scale (SAFE-C) was used to measure perceived discrimination.
Participants rated items on a scale ranging from 1 (Not at all true) to 4 (Very true), indicating the extent to which they had experienced discrimination.
Higher scores on the SAFE-C discrimination subscale indicate greater levels of perceived discrimination.

Distribution of Perceived Discrimination

The distribution shape of perceived discrimination scores was analyzed to understand the patterns of discrimination experiences among Latinx adolescents.

Categorical Variables

Nominal Variables: Sort observations into mutually exclusive categories that have no inherent rank order (e.g., gender identity, racial/ethnic identity).
Ordinal Variables: Sort observations into categories that have a meaningful rank order or hierarchy (e.g., Likert scales, educational attainment groupings).

Bar Plots and Frequency Distributions

A bar plot is a graphical display for categorical variables, illustrating the number or percent of responses within each category.
A frequency distribution provides the same information in a tabular format, presenting the counts and percentages for each category.
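A frequency distribution and a simple bar plot can be built from raw categorical responses with `collections.Counter`. In this minimal sketch the responses are hypothetical, and the text bar uses one `#` per response in place of a graphical bar.

```python
from collections import Counter


def frequency_table(responses):
    """Return (category, count, percent) rows, most frequent category first."""
    n = len(responses)
    return [(category, count, 100 * count / n)
            for category, count in Counter(responses).most_common()]


# HYPOTHETICAL nominal responses (e.g., preferred study location).
responses = ["home", "library", "home", "cafe", "home",
             "library", "home", "cafe", "library", "home"]

for category, count, percent in frequency_table(responses):
    # Text "bar plot": one # per response in the category.
    print(f"{category:<8}{count:>3}{percent:>7.1f}%  {'#' * count}")
```

Because the categories are nominal, ordering them by frequency is only a presentation choice; any other order would be equally valid.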

Skin Color Satisfaction and Binge Eating Study

A study examined the relationship between skin color satisfaction and binge eating behaviors in Black girls.
Skin color satisfaction was assessed using a 4-point Likert scale, capturing the participants' feelings about their skin color.

Measuring Skin Color Satisfaction

Participants rated their happiness with their skin color on a 4-point scale: 1 = Very happy, 2 = Happy, 3 = Unhappy, 4 = Very unhappy.
Scores were reverse coded with the formula $\text{New score} = 5 - \text{Old score}$ so that higher scores reflect greater satisfaction; 5 is the sum of the lowest and highest response options (1 + 4).
- Example: an old score of 1 (Very happy) becomes a new score of $5 - 1 = 4$.
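The reverse-coding rule generalizes to any scale: subtract the old score from the sum of the lowest and highest possible values. The helper `reverse_code` below is a hypothetical sketch of that rule; with its default 1-4 range it reproduces $5 - \text{Old score}$.

```python
def reverse_code(old_score: int, scale_min: int = 1, scale_max: int = 4) -> int:
    """New score = (scale_min + scale_max) - old score; 5 - old on a 1-4 scale."""
    if not scale_min <= old_score <= scale_max:
        raise ValueError(f"score {old_score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - old_score


labels = {1: "Very happy", 2: "Happy", 3: "Unhappy", 4: "Very unhappy"}
for old in range(1, 5):
    print(f"{labels[old]:<13} old={old} -> new={reverse_code(old)}")
```

On a 0–3 scale the same rule gives 3 minus the old score.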

Bar Graph vs. Frequency Distribution

Bar graphs visually depict the distribution of categorical data, offering a quick and intuitive understanding of category frequencies.
Frequency distributions present the same information in a tabular format, including counts, percentages, and cumulative percentages, providing detailed insights into the data.
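Cumulative percentages add up the percentages from the lowest category to the current one, which is meaningful only when categories have an order (ordinal variables). The counts in this sketch are hypothetical reverse-coded satisfaction ratings (after reverse coding, 1 = Very unhappy and 4 = Very happy), and `frequency_distribution` is an illustrative helper.

```python
def frequency_distribution(counts_by_level: dict[int, int]):
    """Return (level, count, percent, cumulative percent) rows, lowest level first."""
    total = sum(counts_by_level.values())
    rows = []
    cumulative = 0
    for level in sorted(counts_by_level):  # ordinal levels in rank order
        count = counts_by_level[level]
        cumulative += count
        rows.append((level, count, 100 * count / total, 100 * cumulative / total))
    return rows


counts = {1: 5, 2: 10, 3: 15, 4: 20}  # HYPOTHETICAL counts
for level, count, pct, cum_pct in frequency_distribution(counts):
    print(f"{level}  {count:>3}  {pct:5.1f}%  {cum_pct:5.1f}%")
```

The final cumulative percentage is always 100%, a quick check that no category was missed.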

Racial Disparities in PTSD Study

A study investigated racial and ethnic differences in posttraumatic stress disorder (PTSD) among postpartum women.
The racial/ethnic identity variable included three categories (Black, Latina, and non-Hispanic White), allowing comparisons across these groups.

Nominal Responses and Distribution Shape

Distribution shape does not apply to nominal variables: because the categories have no inherent rank order, the bars in a bar plot can be arranged in any order, so concepts like skewness are meaningless.

Study Questions

Examples of study questions include:
- Sketching histograms for health ratings and patient satisfaction data.
- Computing missing values from frequency distribution tables using available data.
- Calculating cumulative percentages to understand the distribution of ordinal variables.
- Identifying distribution shapes, such as skewness and modality, in various datasets.
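Filling in a missing cell of a frequency distribution table relies on the fact that the counts must add up to the total sample size (and the percentages to 100%). The numbers in this sketch are hypothetical, and `fill_missing_count` is an illustrative helper.

```python
def fill_missing_count(counts: list, total: int) -> list:
    """Replace the single None in counts so that the counts sum to total."""
    missing = [i for i, c in enumerate(counts) if c is None]
    if len(missing) != 1:
        raise ValueError("exactly one count may be missing")
    known = sum(c for c in counts if c is not None)
    if known > total:
        raise ValueError("known counts already exceed the total")
    filled = list(counts)
    filled[missing[0]] = total - known
    return filled


N = 80  # HYPOTHETICAL total sample size
counts = fill_missing_count([12, 30, None, 8], total=N)
percents = [100 * c / N for c in counts]
print(counts)    # [12, 30, 30, 8]
print(percents)  # [15.0, 37.5, 37.5, 10.0]
```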

Jamovi and RStudio Analyses

The module explains how to conduct descriptive analyses and generate visualizations in Jamovi and in RStudio, an environment for the R programming language.
Demonstrations cover selecting variables, generating frequency tables, histograms, and kernel density plots within each software environment.
RStudio code examples illustrate how to create bar plots and frequency distributions programmatically.

Key RStudio Code Snippets

Loading required R packages, such as ggplot2 for creating visualizations and summarytools for generating descriptive statistics.
Reading data from a remote file using functions like read.csv() or read_excel().
Summarizing variables using functions like describe