Session 4: Statistics

Introduction to Statistics in Psychodiagnostics
  • Statistics help organize, summarize, interpret, and communicate assessment results.

  • Competence in assessment requires statistical knowledge to understand, interpret, and evaluate test psychometrics (reliability, validity, standardization).

Descriptive vs Inferential Statistics
  • Descriptive Statistics: Summarize large datasets clearly.

  • Inferential Statistics: Make inferences about a population from a sample.

Variable
  • A variable is any characteristic that can take more than one value, e.g., achievement, intelligence.

Measurement
  • Assigning numbers or symbols to objects, traits, or behavior per logical rules.

  • Example: Customer satisfaction on a 10-point scale.

Scale of Measurement
  • Categorizes variables. There are 4 scales: Nominal, Ordinal, Interval, Ratio.

Scale of Measurement: Key Properties
  • Defined by three properties:

    1. Magnitude: Inherent order (smaller to larger).

    2. Equal interval: Equal distance between adjacent points.

    3. Absolute/true zero: Zero means absence of the property.

Scale of Measurement: Qualities, Examples
  • Nominal: No magnitude; categories only (e.g., names, diagnostic labels).

  • Ordinal: Magnitude only; distances between adjacent ranks need not be equal (e.g., rank order, Likert scales).

  • Interval: Magnitude and equal intervals, but no true zero (e.g., temperature in °C or °F).

  • Ratio: Magnitude, equal intervals, and a true zero (e.g., age, height, weight).

Describing Scores
  • Use descriptive statistics to organize scores (a code sketch follows this list):

    • Frequency distribution: Arrange scores in order and count how often each occurs.

    • Measures of central tendency: Typical performance (mean, median, mode).

    • Measures of variability: Dispersion of scores (spread).

    • Measures of relationship: Degree of relationship between variables.
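
  A minimal sketch of these four tools in Python, using only the standard library (the scores and study-hours data are invented for illustration; statistics.correlation requires Python 3.10+):

    from collections import Counter
    from statistics import mean, median, mode, stdev, correlation

    # Invented raw test scores and a second variable for the relationship example
    scores = [72, 85, 85, 90, 78, 85, 92, 66, 78, 88]
    hours_studied = [5, 8, 8, 10, 6, 9, 11, 4, 7, 9]

    # Frequency distribution: order the scores and count occurrences
    print(Counter(sorted(scores)))

    # Central tendency: typical performance
    print(mean(scores), median(scores), mode(scores))

    # Variability: spread of scores around the mean
    print(stdev(scores))

    # Relationship: Pearson correlation between the two variables
    print(correlation(scores, hours_studied))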

Understanding Assessment Scores
  • Raw scores: Number of correct answers; only meaningful when compared to a standard.

  • Criterion-referenced scores: Compare individual to a specified performance level.

  • Norm-referenced scores: Compare individual to a norm group.

Criterion-Referenced Scores
  • Interpreted in absolute terms (percentages, cutoff scores) to show mastery.

  • Example: Passing score for a course.

Norm-Referenced Scores: Overview
  • Compare examinee to a relevant, representative, current, and adequately sized norm group.

  • Norms should be updated approximately every 10 years.

Type of Norm-Referenced Scores

1) Percentile ranks (PR)

  • Percentage of a distribution below a particular score.

  • Describes the exact position of a score within the distribution.

  • Quartiles: The three cut points dividing a distribution into four equal parts (lower quartile, median, upper quartile).
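
  A minimal sketch of percentile ranks and quartiles in Python (the norm-group scores are invented; percentile rank here follows the "percentage of the distribution below a score" definition above):

    from statistics import quantiles

    norm_group = [55, 60, 62, 65, 70, 72, 75, 78, 80, 85, 88, 90]

    def percentile_rank(score, distribution):
        """Percentage of the distribution falling below the given score."""
        below = sum(1 for s in distribution if s < score)
        return 100 * below / len(distribution)

    print(percentile_rank(75, norm_group))   # 50.0 -> half the norm group scored lower

    # Quartiles: the three cut points dividing the distribution into 4 equal parts
    q1, q2, q3 = quantiles(norm_group, n=4)
    print(q1, q2, q3)                        # lower quartile, median, upper quartile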

2) Standard scores

  • Represent relative position, assuming a normal distribution.

  • Linear transformations of raw scores, retaining a direct, one-to-one relationship with them.

  • Examples: Z scores, T scores, Deviation IQs, CEEB scores, stanines, sten scores.

Standardized Score Examples (Visual Concept)
  • Z score: Typically −3 to +3 (approx. 68% of scores within ±1 SD, 95% within ±2 SD, 99.7% within ±3 SD).

  • Other scales: Deviation IQ, T scores, Stanine (9-point), Sten scores, SAT/GRE scales.
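
  A minimal sketch of the usual linear conversions among these scales, assuming the conventional scale constants (T: mean 50, SD 10; Deviation IQ: mean 100, SD 15; CEEB/SAT: mean 500, SD 100; stanine: mean 5, SD 2, rounded and clipped to 1-9). The raw scores are invented:

    from statistics import mean, stdev

    raw = [12, 15, 18, 20, 22, 25, 28]              # invented raw test scores
    m, sd = mean(raw), stdev(raw)

    for x in raw:
        z = (x - m) / sd                            # Z score: mean 0, SD 1
        t = 50 + 10 * z                             # T score
        iq = 100 + 15 * z                           # Deviation IQ
        ceeb = 500 + 100 * z                        # CEEB/SAT scale
        stanine = min(9, max(1, round(5 + 2 * z)))  # 9-point stanine
        print(f"raw={x:>2}  z={z:+.2f}  T={t:.0f}  IQ={iq:.0f}  CEEB={ceeb:.0f}  stanine={stanine}")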

3) Grade and age equivalents

  • Grade-equivalent scores: Express performance as the grade level at which the average student earns the same raw score (norm-referenced).

  • Age-equivalent scores: Express performance as the age at which the average individual earns the same raw score.

Reliability
  • Consistency, dependability, and reproducibility of test scores across items, forms, or repeated administrations.

  • If a test is reliable, repeated measurements under the same conditions yield identical or nearly identical results.

  • Core equation: Observed score = True score + Measurement error
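
  A minimal simulation of this equation (all numbers invented): each observed score is a true score plus random error, and the correlation between two parallel administrations estimates reliability. With a true-score SD of 15 and an error SD of 5, classical test theory predicts reliability near 15² / (15² + 5²) = 0.90:

    import random
    from statistics import correlation

    random.seed(1)
    true_scores = [random.gauss(100, 15) for _ in range(1000)]

    # Two administrations of the same test: same true score, fresh random error
    error_sd = 5
    form_a = [t + random.gauss(0, error_sd) for t in true_scores]
    form_b = [t + random.gauss(0, error_sd) for t in true_scores]

    # Parallel-forms reliability estimate; rises as error_sd shrinks
    print(correlation(form_a, form_b))   # approx. 0.90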

Measurement Error
  • Scores are rarely error-free; error arises from test-taker factors, flawed procedures, or chance.

  • True score is the ideal, error-free value; the observed score is what is actually obtained. The greater the error, the lower the reliability.

Sources of Measurement Error

1) Time-sampling error: Fluctuations due to when repeated testing occurs (e.g., practice effects, maturation).
2) Content-sampling error: Error from inadequate item selection to cover content.
3) Interrater differences: Error from subjective scoring judgments; assessed by interrater reliability (a sketch follows this list).
4) Other sources: Item quality, test length, test-taker variables (motivation, fatigue), poor administration, room conditions.
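
A minimal sketch of interrater reliability for item 3 (ratings invented): two raters independently score the same ten essays, and the correlation between their scores indexes agreement. Pearson correlation is one common index; Cohen's kappa is another, used for categorical judgments:

    from statistics import correlation

    # Invented scores assigned independently by two raters to ten essays
    rater_1 = [4, 3, 5, 2, 4, 5, 3, 1, 4, 2]
    rater_2 = [4, 3, 4, 2, 5, 5, 3, 2, 4, 1]

    # Near 1.0 = consistent scoring; subjective drift pushes it down
    print(correlation(rater_1, rater_2))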

Validity
  • Refers to whether assessment claims/decisions are sound, meaningful, and useful for the intended purpose.

  • Degree to which all evidence supports intended interpretation of test scores.

Construct Validity
  • Latent variables (constructs) cannot be measured directly (e.g., aggression, resilience, depression).

  • Inferred from interrelated variables/dimensions.

Threats to Validity
  • Construct underrepresentation: Test too narrow; misses important aspects of the construct.

  • Construct-irrelevant variance: Test too broad; includes variance from variables irrelevant to the construct.

  • Other threats: Ambiguous items, too few items, improper item arrangement, scoring errors, test-taker characteristics (anxiety), inappropriate test groups.

Validity and Reliability Relationship
  • Reliability is necessary but not sufficient for validity.

  • Unreliable measures cannot be valid; reliable measures can still be invalid (measuring the wrong construct).
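
  In classical test theory this has a precise form (a standard result, not from the transcript): the validity coefficient between test X and criterion Y cannot exceed the square root of the product of their reliabilities,

    r_XY ≤ √(r_XX × r_YY)

  Worked example: a test with reliability .81 paired with a perfectly reliable criterion can reach a validity coefficient of at most √.81 = .90.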

Types of Validity Evidence
  • Face Validity: Surface-level appearance that the test measures what it claims (not technical evidence).

  • Content Validity: Evidence based on test content's representativeness of the content domain.

  • Criterion-related Validity: Evidence based on test scores' relationship with external criteria (Concurrent and Predictive validity).

  • Construct Validity: Evidence based on appropriateness of inferences about a construct (homogeneity, convergent/discriminant validity, group differentiation, factor analysis).

    • Convergent validity

      This means your test should be strongly related to other tests that measure the same thing.
      Example: If you create a new questionnaire for social anxiety, it should correlate highly with an established social anxiety scale. That shows both are measuring the same construct.

    • Discriminant validity

      This means your test should not be too strongly related to tests that measure different things.
      Example: Your social anxiety test should not correlate strongly with a math ability test. If it did, that would suggest your test isn’t specific and is picking up something unrelated.
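
      A minimal sketch checking both patterns for the social-anxiety example (all scores invented for eight examinees):

        from statistics import correlation

        new_social_anxiety  = [22, 35, 18, 40, 27, 31, 15, 38]
        established_anxiety = [24, 33, 20, 42, 25, 30, 17, 36]
        math_ability        = [68, 67, 58, 59, 69, 68, 57, 58]

        # Convergent: same construct, expect a strong correlation (here ~.98)
        print(correlation(new_social_anxiety, established_anxiety))

        # Discriminant: unrelated construct, expect a weak correlation (here ~.10)
        print(correlation(new_social_anxiety, math_ability))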

Important Practical Takeaways
  • Reliability sets the upper limit on validity.

  • Validity depends on the intended score interpretation.

  • Consider norm group representativeness when interpreting scores.

  • Be mindful of measurement error sources biasing conclusions.

Common Examples and Terms Mentioned in the Transcript
  • Reliability examples: Repeated measurements yielding identical or nearly identical results.

  • Validity types: Face, Content, Criterion-related (predictive & concurrent), Construct (convergent & discriminant).

  • Examples of tests: TAT & CAT, WAIS & WISC.

  • Normed score types: Percentile ranks, Z/T/Deviation IQ, Stanine, Sten; Grade and Age Equivalents.

Key Formulas and Notations
  • Observed score relation to true score and error:

    • Observed score = True score + Measurement error

  • Standard score via linear transformation:

    • Z = (X − M) / SD, where X is the raw score and M and SD are the mean and standard deviation of the distribution.
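
  • Worked example (invented numbers): a raw score of 130 on a scale with mean 100 and SD 15 gives Z = (130 − 100) / 15 = +2.0, a T score of 50 + 10(2.0) = 70, and roughly the 98th percentile in a normal distribution.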