Measurement Scales, Latent Variables, and Statistical Inference – Study Notes

2.2.3 Algebraic Properties of the Scales

  • Different scales permit different valid algebraic operations.

  • Stevens’s four classic scales are discussed first, followed by embedding summative response scales within this framework.

2.2.4 Qualitative Versus Quantitative Measurement

  • Two broad categories of measurement scales: qualitative (categorical) and quantitative.

  • Qualitative measures characterize what is observed (nominal scale). Synonyms include:

    • Categorical variables

    • Nonmetric variables

    • Dichotomous variables (when there are two values/categories)

    • Grouped variables

    • Classification variables

  • Quantitative measurement is more restrictive: the values must be such that computing a mean and standard deviation is meaningful.

  • Ordinal scales presuppose an underlying quantitative dimension, but not every ordinal variable yields meaningful means.

  • Quantitative labels include:

    • Continuous variables (though many quantitative variables are actually discrete, taking only certain stepped values)

    • Metric variables

    • Ungrouped variables

2.2.5 Criticisms of Stevens's Schema

  • Stevens’s scale types are widely used but not without critique.

  • Other classification systems exist (e.g., Mosteller & Tukey, 1977).

  • Velleman & Wilkinson (1993) discuss critiques of Stevens’s notions.

  • Some argue that the permissible mathematical operations depend more on the research questions being asked than on the exact level of the scale (e.g., Guttman, 1977; Lord, 1953a).

  • The presented framework is a useful starting point for understanding scale types and implications for data analysis, but it is not dogma.

2.3 Independent Variables, Dependent Variables, and Covariates

  • Variables are central to research design, measurement, and statistics; roles can vary by analysis.

  • A variable can have multiple roles in different analyses (e.g., a mediator can be both dependent and independent in different parts).

2.3.1 Independent Variables

  • In a prototypical experimental study, the independent variable represents the manipulation by the researchers (the treatment effect) contrasted with a control.

  • Features:

    • May have two levels (e.g., control vs. experimental) or more (e.g., control, placebo, experimental).

    • Regardless of the number of levels, an independent variable represents a single entity or continuum.

    • In experiments, often based on qualitative measurement; in prediction (regression), usually quantitative.

  • In regression analyses, predictors are considered independent variables.

2.3.2 Dependent Variables

  • In experiments, the dependent variable is the outcome measured by the researchers.

  • In correlation/prediction designs, all measures can be viewed as dependent because there is no active manipulation.

  • Dependent variables can be assessed on any scale, but in the book’s designs they are almost always on quantitative scales.

  • In ANOVA, the dependent variable’s variance is explained by the independent variables; e.g., the statement "The mean for females was 3.52" refers to the dependent variable in that group.

  • In multiple regression, the criterion variable is the dependent variable.

2.3.3 Covariates

  • A covariate is a variable that correlates with a dependent variable and can influence observed relationships if not accounted for.

  • Covariates can mediate or confound relationships between independent and dependent variables.

  • Classic example: ice cream sales and crime rate both correlate with temperature; temperature accounts for their apparent relationship.

  • If temperature is not accounted for, it acts as a confound.

  • If temperature is included as a covariate or mediator, the observed association between ice cream sales and crime rate can weaken after accounting for temperature.

  • In ANOVA contexts, including a covariate leads to ANCOVA (analysis of covariance), e.g., controlling for verbal ability when predicting math problem solving.

  • ANCOVA allows statistical control for a variable not experimentally controlled.
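The temperature example can be simulated; the sketch below (variable names and coefficients invented for illustration) shows a spurious raw correlation shrinking toward zero once temperature is partialled out, which is the logic behind statistical control of a covariate.

```python
import numpy as np

# Simulated third-variable scenario: temperature drives both ice cream
# sales and crime, inducing a spurious correlation between them.
rng = np.random.default_rng(4)
n = 1000
temp = rng.standard_normal(n)
ice_cream = temp + 0.5 * rng.standard_normal(n)
crime = temp + 0.5 * rng.standard_normal(n)

r_raw = np.corrcoef(ice_cream, crime)[0, 1]

# Partial correlation controlling for temperature: correlate the
# residuals left after regressing each variable on temperature.
def residuals(y, x):
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - b * x

r_partial = np.corrcoef(residuals(ice_cream, temp),
                        residuals(crime, temp))[0, 1]
print(r_raw, r_partial)  # raw association is strong; partial is near zero
```

With temperature removed, only the independent noise in each variable remains, so the partial correlation hovers near zero.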

2.4 Between Subjects and Within Subjects Independent Variables

2.4.1 Between Subjects Variables

  • Between-subjects variable levels comprise separate groups (e.g., girls vs. boys; different diagnostic groups).

  • Scores in different groups are independent of each other.

2.4.2 Within Subjects Variables

  • Within-subjects variables include measurements on the same cases across conditions or times.

  • Example: pretest and posttest measurements on the same individuals.

  • Within-subjects variable is also called a repeated measures variable because scores across conditions are related.

2.5 Latent Variables and Measured Variables

2.5.1 Latent Variables

  • Latent variables are constructs identified in theory and not directly measured; they are assessed indirectly.

  • Historical illustration: manifest content vs. latent meaning (Freud, Tolman).

  • Examples: learning, motivation, job satisfaction, attitude toward life, ethnic identification.

  • Latent variables are central to many theories of human behavior.

2.5.2 Measured Variables

  • Measured variables are those for which we have actual data.

  • Examples: inventory item responses, choices, time spent in a behavior, gender indicator.

  • Also known as:

    • Manifest variables

    • Indicator variables

    • Observed variables

  • Measured variables serve as proxies or indicators for latent variables.

2.5.3 Linking Latent Variables to Measured Variables

  • In many multivariate designs, latent variables are posited and measured via indicators.

  • Example: the broad construct of achievement can be indicated by GPA; GPA is a quantitative indicator of a latent construct (achievement).

  • GPA is not a perfect indicator; multiple indicators may be warranted to better estimate the latent construct.

  • Latent constructs can be composed of measured variables and/or other latent variables.

  • Multivariate procedures help determine how measured variables should be weighted to form a latent construct (e.g., factor analysis, which identifies shared themes).

2.5.4 Variates as Latent Variables

  • Latent variables can be imagined as weighted combinations of multiple measured variables (variates).

  • Example: Coopersmith Self-Esteem Inventory (25 items, scored 1 for endorsement and 0 otherwise; a self-esteem score is formed by summing item scores and multiplying by 4).

  • The resulting self-esteem score is a variate: a latent construct formed from measured indicators.

  • Variates can combine measures from different sources (e.g., family history, symptoms, prior GPA for a willingness-to-seek-counseling construct).

  • Multivariate procedures (e.g., confirmatory factor analysis, SEM) relate latent variables to indicators and relate latent variables to each other.
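The Coopersmith scoring rule described above amounts to a unit-weighted variate; a minimal sketch (the item responses are invented for illustration):

```python
# 25 binary item scores (1 = endorsed, 0 = not); values are made up.
responses = [1, 0, 1, 1, 0] + [1] * 20
assert len(responses) == 25

# Unit-weighted variate: sum the item scores, then multiply by 4,
# which places the total on a 0-100 scale.
self_esteem = sum(responses) * 4
print(self_esteem)  # 92
```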

2.6 Endogenous and Exogenous Variables

  • In path analysis and structural equation modeling, a model specifies how variables relate to each other.

  • Endogenous variables are explained or predicted by other variables in the model.

  • Exogenous variables have no predictors in the model and act as first causes to explain endogenous variables.

  • Example: sex and ethnic identification as exogenous variables explaining adherence to medication regimens; compliance is endogenous.

2.7 Statistical Significance

  • Statistical significance tests judge how likely it is that an observed outcome (e.g., a correlation or F value) would occur by chance if the population has no true effect.

  • General concept: test statistic is compared to what would be expected under the null hypothesis.

2.7.1 Degrees of Freedom

  • Degrees of freedom count how many values in a set can vary given constraints.

  • Example: for a set of five numbers with a fixed mean, four numbers can vary; the last is determined by the fixed mean (4 degrees of freedom).

  • Commonly, df = N - 1 for many statistics; for Pearson r, df = N - 2 because r corresponds to a regression line, and a straight line is determined by two points, so two degrees of freedom are lost.
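The five-number example can be checked directly; a minimal sketch:

```python
# With the mean fixed, only N - 1 of the N values are free to vary.
fixed_mean = 4.0
n = 5
free_values = [2.0, 5.0, 7.0, 1.0]          # four values chosen freely
last = fixed_mean * n - sum(free_values)    # the fifth is determined
print(last)  # 5.0
```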

2.7.2 Sampling Distributions

  • A sampling distribution shows the distribution of a statistic across repeated samples from the population.

  • For a true population correlation of 0, the sampling distribution is centered at 0 and symmetric; most sample correlations cluster around 0 with fewer extreme values.

  • For a true population correlation of 0.90, the distribution peaks near 0.90 and is highly constrained on the upper end (since r ≤ 1). The distribution becomes skewed as the true parameter approaches its bounds.

  • The shape depends on the true parameter value and sample size.
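The sampling distribution under a true correlation of 0 can be approximated by simulation; a minimal sketch (the sample size and replication count are arbitrary choices):

```python
import numpy as np

# Draw many samples of two independent variables (population r = 0)
# and collect the sample correlations.
rng = np.random.default_rng(0)
N, reps = 30, 5000
rs = np.array([np.corrcoef(rng.standard_normal(N),
                           rng.standard_normal(N))[0, 1]
               for _ in range(reps)])

print(rs.mean())                      # centered near 0
print(np.mean(np.abs(rs) < 0.2))      # most sample rs cluster near 0
```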

2.7.3 The Role of Sample Size

  • Larger samples yield sampling distributions that are tighter around the true parameter (smaller standard error), making it harder to obtain large deviations by chance.

  • Smaller samples yield more variability; large correlations can occur by chance with small N.
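The effect of sample size on the spread of the sampling distribution can also be demonstrated by simulation; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def r_spread(N, reps=3000):
    """Standard deviation of sample correlations when the true r is 0."""
    rs = [np.corrcoef(rng.standard_normal(N),
                      rng.standard_normal(N))[0, 1]
          for _ in range(reps)]
    return float(np.std(rs))

spread_small, spread_large = r_spread(10), r_spread(200)
print(spread_small, spread_large)  # smaller N -> much wider spread
```

The wider spread with N = 10 is why large correlations can arise by chance in small samples.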

2.7.4 Determination of Significance

  • If the sampling distribution is normal, 95% of area lies within ±1.96 standard deviation units from the mean. Values beyond this region are considered statistically significant at alpha = 0.05.

  • Some statistics have non-normal sampling distributions (e.g., t, F, noncentral distributions). Modern software provides exact p-values based on the appropriate distribution.

  • For Pearson r, the test statistic t is defined as
    t = \frac{r \sqrt{N-2}}{\sqrt{1-r^2}}
    and is compared to the t distribution with df = N - 2.

  • When population effect size is not zero, transformations (e.g., Fisher’s z') and subsequent steps are used to assess significance.
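Both steps can be sketched in Python; the example data and the nonzero null value rho0 = 0.60 are invented for illustration, and scipy is assumed to be available:

```python
import numpy as np
from scipy import stats

# --- t test of H0: rho = 0, using the formula above ---
rng = np.random.default_rng(2)
x = rng.standard_normal(25)
y = 0.5 * x + rng.standard_normal(25)
N = len(x)
r = np.corrcoef(x, y)[0, 1]
t = r * np.sqrt(N - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=N - 2)        # two-sided p-value
r_check, p_check = stats.pearsonr(x, y)     # should agree with the formula

# --- Fisher's z' for a nonzero null, e.g. H0: rho = 0.60 ---
r_obs, N_obs, rho0 = 0.75, 50, 0.60
z = (np.arctanh(r_obs) - np.arctanh(rho0)) * np.sqrt(N_obs - 3)
p_z = 2 * stats.norm.sf(abs(z))
print(p, p_check, p_z)
```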

2.7.5 Levels of Significance

  • A pre-set alpha level (e.g., \alpha = 0.05) is chosen before analysis.

  • Significance is a yes/no decision: a result is either statistically significant or not; terms like "highly significant" are discouraged.

  • Strength of an observed effect is better assessed via effect size indices or variance explained rather than significance alone.

2.7.6 Statistical Significance Versus Confidence Interval Estimation

  • Confidence intervals offer a range of plausible values for a parameter, conveying precision and practical significance.

  • Example: reporting that a one-year survival rate increased by 20 percentage points with a 95% confidence interval of 15 to 24 percentage points is often more informative than just a p-value.

  • Confidence intervals complement NHST and help avoid over-interpretation of dichotomous results.

2.7.7 Null Hypothesis

  • NHST assumes the null hypothesis is true; the p-value gives the probability of observing a statistic at least as extreme as the one obtained if the null were true.

  • Rejecting the null suggests the observed effect is unlikely under the null, but not necessarily practically important.

  • The risk of incorrectly rejecting a true null (Type I error) equals the alpha level; conversely, real effects can be missed because of sampling variability (Type II error).

  • Thompson (1994) notes that the null is rarely literally true in the population: sample means will virtually always differ to some degree, so with a large enough sample almost any difference can reach significance, which complicates interpretation.

2.7.8 Type I and Type II Errors

  • Type I error: incorrectly rejecting the null hypothesis (false positive). Probability equals the alpha level.

  • Type II error: failing to detect a true effect (false negative). Probability is beta; power = 1 - \beta.

  • Alpha and power are linked; stricter alpha reduces Type I error but increases Type II error, and vice versa.

  • Practical considerations (e.g., severity of consequences) influence the choice of alpha.

2.7.9 The Current Status of Statistical Significance Testing

  • There is growing concern about overreliance on NHST; emphasis is shifting toward replication, effect sizes, and confidence intervals.

  • The APA (2010) encourages reporting exact p-values and including effect sizes and confidence intervals in reports.

  • NHST is not discarded but should be integrated with estimation approaches for comprehensive interpretation.

2.8 Statistical Power

  • Power is the probability of correctly rejecting a false null hypothesis (i.e., detecting a true effect).

  • Power is influenced by alpha, effect size, and sample size.

2.8.1 Definition of Power

  • Power = 1 - \beta, where beta is the probability of a Type II error.

  • Three main factors affect power: alpha level, effect size, and sample size.

2.8.2 Alpha Level

  • The alpha level controls the risk of a Type I error: e.g., \alpha = 0.05 means a 5% risk.

  • Increasing stringency (e.g., \alpha = 0.01) reduces Type I error but can increase Type II error (lower power).

  • In some contexts (e.g., medical research), extremely small alphas (e.g., 0.001) may be warranted to avoid dangerous false positives.

  • Conversely, higher alpha (e.g., 0.10 or 0.15) can increase power when the consequences of Type I error are not severe.

  • Example scenario from Pituch & Stevens (2016) discusses trade-offs between Type I and Type II errors depending on consequences.

2.8.3 Effect Size

  • Effect size reflects the strength or magnitude of an effect and is linked to power: larger effects are easier to detect.

  • In the correlation context, effect size refers to the strength of association; common indices include r and eta squared (\eta^2).

  • Three widely used effect-size indices:

    • Cohen's d (for mean differences)

    • Hedges' g (bias-corrected version of d for small samples)

    • Glass' delta (uses control SD in the denominator)

  • They are calculated from the difference between group means and a variance term:

    • Cohen's d:
      d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{SD_1^2 + SD_2^2}{2}}}

    • Hedges' g:
      g = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1) SD_1^2 + (n_2 - 1) SD_2^2}{n_1 + n_2 - 2}}}

    • Glass' delta:
      \Delta = \frac{\bar{X}_1 - \bar{X}_2}{SD_{\text{control}}}, where the control group's standard deviation is used in the denominator.

  • Cohen provides general guidelines for interpreting effect sizes (without context):

    • For single-sample t, d: small ≈ 0.20, medium ≈ 0.50, large ≈ 0.80.

    • For Pearson r: small ≈ 0.10, medium ≈ 0.30, large ≈ 0.50.

    • For eta-squared (\eta^2) in ANOVA: small ≈ 0.01, medium ≈ 0.06, large ≈ 0.14.

  • Figure reference: shows the calculation procedures for d, g, and delta.

  • In broader modeling contexts (e.g., SEM, CFA), fit indices such as RMSEA, GFI, and TLI play an analogous role, indexing how consistent a model is with the data.
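The three effect-size formulas above can be sketched as small functions (the sample data are invented so the expected value is easy to verify by hand):

```python
import numpy as np

def cohens_d(x1, x2):
    """d: mean difference over the root mean of the two variances."""
    sd = np.sqrt((np.var(x1, ddof=1) + np.var(x2, ddof=1)) / 2)
    return (np.mean(x1) - np.mean(x2)) / sd

def hedges_g(x1, x2):
    """g: mean difference over the pooled (df-weighted) standard deviation."""
    n1, n2 = len(x1), len(x2)
    s2 = ((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1)) \
         / (n1 + n2 - 2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(s2)

def glass_delta(treatment, control):
    """delta: mean difference over the control group's standard deviation."""
    return (np.mean(treatment) - np.mean(control)) / np.std(control, ddof=1)

treat, ctrl = [4.0, 5.0, 6.0], [1.0, 2.0, 3.0]
# Both groups have SD = 1 here, so all three indices equal 3.0.
print(cohens_d(treat, ctrl), hedges_g(treat, ctrl), glass_delta(treat, ctrl))
```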

2.8.4 Sample Size

  • Larger sample sizes generally increase power by reducing standard errors and narrowing confidence intervals.

  • Degrees of freedom are tied to sample size; larger df lowers the critical value a test statistic must exceed to reach significance.

  • Researchers can perform power analyses in advance to determine needed sample size to achieve desired power.

  • Caution: very large samples can yield statistically significant results for trivial effects; effect-size reporting helps maintain perspective on practical significance.
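The link between sample size and power can be illustrated by simulation; a minimal sketch assuming a true correlation of 0.30 (the sample sizes and replication count are arbitrary) and scipy available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def power_for_r(N, rho=0.3, reps=2000, alpha=0.05):
    """Approximate power of the r test by counting significant replications."""
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(N)
        # Construct y so that its population correlation with x is rho.
        y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(N)
        if stats.pearsonr(x, y)[1] < alpha:
            hits += 1
    return hits / reps

p_small, p_large = power_for_r(30), power_for_r(100)
print(p_small, p_large)  # power rises substantially with N
```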