This module surveys data distributions, covering graphical and tabular methods for summarizing numeric and categorical variables. It also covers how psychological constructs are measured, how to interpret distribution shapes, and how to run the analyses in Jamovi and RStudio.
Depressive Symptoms: Quantified using the Center for Epidemiologic Studies Depression Scale (CES-D), a widely recognized instrument for assessing the presence and severity of depressive symptoms.
Dispositional Optimism: Evaluated through the Life Orientation Test Revised (LOT-R), which measures an individual's general tendency to expect positive outcomes in life.
The CES-D is composed of 20 items, each designed to capture the frequency of depressive symptoms experienced by individuals.
The items address a range of symptoms, including sleep disturbances, changes in appetite, feelings of sadness, and difficulties in concentration.
Each item is rated on a 4-point scale:
0 = Rarely or none of the time (less than 1 day)
1 = Some or a little of the time (1-2 days)
2 = Occasionally or a moderate amount of the time (3-4 days)
3 = Most or all of the time (5-7 days)
Higher total scores on the CES-D indicate a greater presence and severity of depressive symptoms.
A cutoff score of ≥ 16 is commonly used to indicate clinically elevated depressive symptoms, suggesting the need for further evaluation.
The CES-D demonstrates strong psychometric properties, including high internal-consistency reliability (α = .89) in cancer samples, indicating that its items measure depressive symptoms consistently within this population.
Participants are instructed to rate their experiences over the past week for each item on the CES-D.
Example items:
I was bothered by things that usually don’t bother me.
I did not feel like eating; my appetite was poor.
I felt that I could not shake off the blues.
I felt that I was just as good as other people.
Each item is rated on a scale from 0 to 3, reflecting the frequency with which the symptom was experienced.
Researchers often calculate sum scores for psychological constructs by aggregating responses to individual items on questionnaires.
These sum scores, also referred to as scale or composite scores, offer several advantages:
Improved reliability and validity compared to individual items.
Measurement of a broader range of attributes related to the construct.
Approximation of a numeric (interval or ratio) scale, allowing for more sophisticated statistical analyses.
Subclinical Sum Score: An example is provided with a sum score of 9, indicating depressive symptoms below the clinical cutoff.
Clinically Elevated Sum Score: An example is given with a sum score of 29, which exceeds the cutoff of 16 and suggests clinically elevated depressive symptoms.
The highest possible score on the CES-D is 60: $20 \text{ items} \times 3 \text{ (maximum score per item)} = 60$.
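The scoring rules above can be sketched in a few lines of Python (used here for illustration, not the Jamovi or RStudio workflow the module demonstrates). The response vectors are made up to reproduce the sum scores of 9 and 29 mentioned above, and the function names are hypothetical. The sketch assumes every item is already coded so that a higher value means more frequent symptoms; any recoding of positively worded items (such as "I felt that I was just as good as other people") is assumed to have been done beforehand.

```python
# Minimal sketch of CES-D sum scoring. Responses below are hypothetical,
# not real participant data.
N_ITEMS = 20        # number of CES-D items
MAX_PER_ITEM = 3    # each item is rated 0-3
CUTOFF = 16         # clinical cutoff stated in the text


def cesd_sum_score(responses):
    """Sum the 20 item responses after checking their count and range."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(responses)


def is_clinically_elevated(score):
    """True when the sum score meets or exceeds the cutoff of 16."""
    return score >= CUTOFF


# Hypothetical response patterns chosen to sum to 9 and 29.
subclinical = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
elevated = [2, 1] * 9 + [1, 1]

for label, responses in [("subclinical", subclinical), ("elevated", elevated)]:
    score = cesd_sum_score(responses)
    print(label, score, is_clinically_elevated(score))

# Highest possible score: 20 items x 3 points per item.
max_score = N_ITEMS * MAX_PER_ITEM
print("maximum possible score:", max_score)
```

Validating the number and range of responses before summing guards against silently scoring an incomplete questionnaire.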
Dispositional optimism and pessimism are measured using the Life Orientation Test—Revised (LOT-R), a widely used instrument in psychological research.
The LOT-R includes three optimism items, three pessimism items, and four filler items designed to mask the purpose of the test.
Items are rated on a 5-point Likert scale, ranging from 'I agree a lot' to 'I disagree a lot.'
Summed scores are computed separately for optimism (α = .74) and pessimism (α = .80), providing distinct measures of these constructs.
Each scale has a potential range of 0–12, with higher scores indicating higher levels of the respective attribute.
Participants indicate their agreement with each statement on a scale from 0 to 4.
Example items:
In uncertain times, I usually expect the best.
I’m always optimistic about my future.
Overall, I expect more good things to happen to me than bad.
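As a rough illustration of how the two LOT-R subscale scores are formed, the Python sketch below sums three optimism items and three pessimism items and ignores the four fillers. The item positions and the responses are assumptions made for this example and should be checked against the published scoring key; the function name is hypothetical.

```python
# Minimal sketch of LOT-R subscale scoring.
# ASSUMPTION: 1-based item positions below are illustrative only.
OPTIMISM_ITEMS = (1, 4, 10)
PESSIMISM_ITEMS = (3, 7, 9)
N_ITEMS = 10  # 3 optimism + 3 pessimism + 4 filler items


def lotr_scores(responses):
    """Return (optimism, pessimism) sum scores, each ranging 0-12."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    optimism = sum(responses[i - 1] for i in OPTIMISM_ITEMS)
    pessimism = sum(responses[i - 1] for i in PESSIMISM_ITEMS)
    return optimism, pessimism  # filler items are never summed


# Hypothetical responses to items 1-10 (0-4 agreement ratings).
example = [3, 2, 1, 4, 0, 2, 0, 3, 1, 3]
optimism, pessimism = lotr_scores(example)
print("optimism:", optimism, "pessimism:", pessimism)
```

Keeping the two subscales separate, rather than combining them into one total, matches the separate optimism and pessimism scores described above.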
The CES-D measures depression using a sum score ranging from 0 to 60, providing a relatively wide range of possible scores. The LOT-R optimism sum score ranges from 0 to 12.
The CES-D is considered to be closer to a true numeric (interval or ratio) scale due to its larger range, which allows for finer distinctions between individuals' levels of depressive symptoms.
A distribution describes how score values are concentrated or spread out within a dataset.
Distributions can be summarized using various methods, including graphs, tables, and mathematical functions, each offering unique insights into the data.
The choice of summary method depends on the nature of the variable, whether it is numeric (interval or ratio) or categorical (nominal or ordinal).
Interval and ratio scales are types of numeric (continuous) variables that approximate a number line, allowing for meaningful calculations of differences and, for ratio scales, ratios (e.g., response time, age, GRE scores).
On these scales, the difference between any two adjacent score values reflects the same amount of the variable, ensuring equal intervals between values.
Many behavioral, emotional, and physiological variables in psychology are treated as approximating interval-level data, although they may not perfectly meet the criteria of true interval scales.
Horizontal Axis (Score Range): Represents the range of possible scores, following a number line from low to high.
Vertical Axis (Frequency): Indicates the frequency, count, or density of each score, reflecting how often each score occurs in the dataset.
Distribution Center: Identifies the value around which scores are most concentrated, often represented by measures such as the mean or median.
Variability or Spread: Describes the extent to which scores differ from each other, with high variability indicating greater score differences and low variability indicating more similar scores.
Distribution Tails: Represent the extreme ends of the distribution, containing unusual or infrequent score values; the most extreme of these values are often referred to as outliers.
Asymmetric distributions, which lack symmetry around their center, can be classified as positively or negatively skewed.
Positively Skewed: Characterized by a long tail extending toward the positive side of the number line, indicating a concentration of scores at the lower end of the scale.
Negatively Skewed: Characterized by a long tail extending toward the negative side of the number line, indicating a concentration of scores at the higher end of the scale.
If most individuals score below the clinical cutoff for depressive symptoms (CES-D ≥ 16), the distribution of CES-D scores would be positively skewed, with a tail extending toward higher scores.
Conversely, if most individuals report feeling optimistic, the distribution of optimism scores would be negatively skewed, with a tail extending toward lower scores.
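The direction of skew can also be checked numerically. The Python sketch below computes a simple moment-based skewness coefficient (third central moment divided by the 1.5 power of the second) for two made-up sets of scores: one bunched at low CES-D values with a tail toward high scores, and one bunched at high optimism values with a tail toward low scores. The data and variable names are hypothetical, and other skewness formulas (for example, with small-sample corrections) give slightly different values.

```python
from statistics import mean, median


def skewness(scores):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(scores)
    m = mean(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical scores, not data from any study.
depressive_scores = [2, 3, 4, 4, 5, 6, 7, 8, 10, 12, 15, 21, 29, 38]
optimism_scores = [12, 12, 11, 11, 11, 10, 10, 9, 9, 8, 7, 5, 3, 1]

for name, scores in [("depressive symptoms", depressive_scores),
                     ("optimism", optimism_scores)]:
    g = skewness(scores)
    direction = "positive" if g > 0 else "negative" if g < 0 else "none"
    print(f"{name}: skewness={g:.2f} ({direction}), "
          f"mean={mean(scores):.2f}, median={median(scores)}")
```

In both made-up samples the mean is pulled toward the long tail relative to the median, which is a common (though not guaranteed) companion of skew.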
Histograms: Bar plots used to represent the distribution of numeric variables, with no gaps between bars to indicate the continuous nature of the data.
Kernel Density Plots: Smoothed curves that estimate the underlying probability density function of a numeric variable; they can be thought of as a continuous, smoothed version of a histogram.
In these displays, the vertical height of a bar or curve shows how common a score is, whether the axis is labeled frequency, count, or density.
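To make the two displays concrete, the Python sketch below bins a small set of made-up scores into histogram counts and evaluates a Gaussian kernel density estimate by hand. The data, the bin width of 10, and the bandwidth of 4 are arbitrary choices for illustration; statistical software typically picks these automatically. Counts and densities sit on different vertical scales (counts sum to the sample size, while the density curve encloses an area of about 1), but both are read the same way: higher means more scores in that region.

```python
import math
from collections import Counter

# Hypothetical scores, not data from any study.
scores = [2, 3, 4, 4, 5, 6, 7, 8, 10, 12, 15, 21, 29, 38]


def histogram_counts(data, bin_width, start=0):
    """Count scores in bins [start + k*w, start + (k+1)*w), keyed by lower edge."""
    counts = Counter((x - start) // bin_width for x in data)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}


def gaussian_kde(x, data, bandwidth):
    """Gaussian kernel density estimate at x: average of normal bumps."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))


bins = histogram_counts(scores, bin_width=10)
print(bins)  # {0: 8, 10: 3, 20: 2, 30: 1}

# Density is highest where scores are concentrated and low in the tail.
for x in (5, 15, 25, 35):
    print(x, round(gaussian_kde(x, scores, bandwidth=4.0), 4))
```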
A study investigated the predictive role of visual impairment, moderated by optimism/pessimism, on depressive symptoms in adults diagnosed with uveal melanoma.
Histograms and kernel density plots were used to examine the distribution of depressive symptoms and optimism scores within the study population.
A study explored a spillover model of internalizing symptom development in relation to perceived discrimination among Latinx adolescents.
Perceived discrimination encompasses negative attitudes or unfair treatment stemming from specific characteristics such as race or ethnicity.
Internalizing behaviors are inward-directed behaviors, including anxious and depressive symptoms, that can result from experiences of discrimination.
The 10-item discrimination subscale of the Social, Attitudinal, Familial, and Environmental Acculturative Stress-Child Scale (SAFE-C) was used to measure perceived discrimination.
Participants rated items on a scale ranging from 1 (Not at all true) to 4 (Very true), indicating the extent to which they had experienced discrimination.
Higher scores on the SAFE-C discrimination subscale indicate greater levels of perceived discrimination.
The distribution shape of perceived discrimination scores was analyzed to understand the patterns of discrimination experiences among Latinx adolescents.
Nominal Variables: Categorize group features into mutually exclusive, unordered categories (e.g., gender identity, racial/ethnic identity), where the categories have no inherent rank ordering.
Ordinal Variables: Group features into ordered categories that can be ranked (e.g., Likert scales, educational attainment groupings), where the categories have a meaningful order or hierarchy.
A bar plot is a graphical display for categorical variables, illustrating the number or percent of responses within each category.
A frequency distribution provides the same information in a tabular format, presenting the counts and percentages for each category.
A study examined the relationship between skin color satisfaction and binge eating behaviors in Black girls.
Skin color satisfaction was assessed using a 4-point Likert scale, capturing the participants' feelings about their skin color.
Participants rated their happiness with their skin color on a 4-point scale: 1 = Very happy, 2 = Happy, 3 = Unhappy, 4 = Very unhappy.
Scores were reverse coded so that higher scores reflect more satisfaction using the formula: $\text{New score} = 5 - \text{Old score}$.
Example: Old score = 1 (Very happy) becomes New score = 4.
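The reverse-coding rule can be written as a small Python function. The sketch below uses the general form (lowest value + highest value) minus the old score, which reduces to 5 minus the old score on this 1-4 scale; the function name and its parameters are hypothetical.

```python
def reverse_code(score, low=1, high=4):
    """Reverse a score on a low..high scale: new = (low + high) - old."""
    if not low <= score <= high:
        raise ValueError(f"score must be between {low} and {high}")
    return (low + high) - score


# Original coding: 1 = Very happy ... 4 = Very unhappy.
labels = {1: "Very happy", 2: "Happy", 3: "Unhappy", 4: "Very unhappy"}

for old, label in labels.items():
    print(f"{label}: old={old} -> new={reverse_code(old)}")

recoded = [reverse_code(s) for s in (1, 2, 3, 4)]
print(recoded)  # [4, 3, 2, 1]
```

After recoding, "Very happy" carries the highest value (4), so higher scores reflect more satisfaction.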
Bar graphs visually depict the distribution of categorical data, offering a quick and intuitive understanding of category frequencies.
Frequency distributions present the same information in a tabular format, including counts, percentages, and cumulative percentages, providing detailed insights into the data.
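A frequency distribution of this kind can be built in a few lines of Python. In the sketch below the 20 responses are made up for illustration and are not results from any study; the category order is supplied explicitly because cumulative percentages only make sense once the ordinal categories are ranked. The function name is hypothetical.

```python
from collections import Counter

# Ordered from lowest to highest satisfaction.
LEVELS = ["Very unhappy", "Unhappy", "Happy", "Very happy"]

# Hypothetical responses (n = 20), not data from any study.
responses = (["Very unhappy"] * 1 + ["Unhappy"] * 3 +
             ["Happy"] * 6 + ["Very happy"] * 10)


def frequency_table(data, levels):
    """Return rows of (category, count, percent, cumulative percent)."""
    counts = Counter(data)
    n = len(data)
    rows, running = [], 0
    for level in levels:
        count = counts[level]  # Counter returns 0 for unseen categories
        running += count
        rows.append((level, count, 100 * count / n, 100 * running / n))
    return rows


table = frequency_table(responses, LEVELS)
print(f"{'Category':<14}{'Count':>6}{'Percent':>9}{'Cum. %':>8}")
for level, count, pct, cum_pct in table:
    print(f"{level:<14}{count:>6}{pct:>9.1f}{cum_pct:>8.1f}")
```

The counts are the bar heights a bar graph of the same variable would show; the table adds the percentages and cumulative percentages.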
A study investigated racial and ethnic differences in posttraumatic stress disorder (PTSD) among postpartum women.
The racial/ethnic identity variable comprised Black, Latina, and non-Hispanic White women, allowing for comparisons across these groups.
Distribution shape is not applicable to nominal variables because the categories lack inherent rank ordering, making concepts like skewness irrelevant.
Examples of study questions include:
Sketching histograms for health ratings and patient satisfaction data.
Computing missing values from frequency distribution tables using available data.
Calculating cumulative percentages to understand the distribution of ordinal variables.
Identifying distribution shapes, such as skewness and modality, in various datasets.
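As one way to approach the missing-value and cumulative-percentage questions, the Python sketch below fills in a single missing count from the total sample size and then derives cumulative percentages. The health-rating categories, counts, and total of 200 are invented for this example.

```python
# Hypothetical frequency table with one missing count (None).
total_n = 200
counts = {
    "Poor": 10,
    "Fair": 30,
    "Good": None,      # missing cell to be recovered
    "Very good": 60,
    "Excellent": 20,
}

# The counts must add up to the total, so the missing count is the remainder.
known = sum(c for c in counts.values() if c is not None)
missing_label = next(label for label, c in counts.items() if c is None)
counts[missing_label] = total_n - known
print(missing_label, counts[missing_label])

# Cumulative percent: running count up to and including each category.
cumulative_percents = []
running = 0
for label, count in counts.items():  # dicts preserve insertion (rank) order
    running += count
    cumulative_percents.append(100 * running / total_n)
    print(f"{label:<10} count={count:>3}  cum%={100 * running / total_n:.1f}")
```

The same remainder logic works with percentages: the category percentages must sum to 100, and the final cumulative percentage must equal 100.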
The module provides instructions on how to conduct descriptive analyses and generate visualizations using both Jamovi and RStudio, two popular statistical software packages.
Demonstrations cover selecting variables, generating frequency tables, histograms, and kernel density plots within each software environment.
RStudio code examples illustrate how to create bar plots and frequency distributions programmatically.
Loading required R packages, such as ggplot2 for creating visualizations and summarytools for generating descriptive statistics.
Reading data from a remote file using functions like read.csv() or read_excel().
Summarizing variables using functions like describe