Measurement, Distributions, and Percentiles – Study Notes

  • Acknowledgement and context

    • Opening and scope: this week covers measurement, frequency distributions, and percentiles; gradual introduction to numbers.
    • Mid-semester exam scope: weeks 1–4 content; practice materials and quizzes recommended; exam date announced on Blackboard (Saturday, September 6).
    • Relevance across degree: data cleaning, exploration, and analysis are common tasks in assignments; honors year in psychology involves a full year of study design, data collection, analysis, and thesis writing – these topics are foundational for that workflow.
  • Big-picture progression of a study in psychology

    • Three stages: design a study, run the study, then analyze the numbers you collect.
    • First data-processing steps: create plots, explore data, clean data.
    • Throughout a degree, you’ll repeatedly clean, explore, and analyze data; in honors, you’ll perform this across a year end-to-end.
  • Core topics of the lecture

    • First half: measurement of psychological constructs, reliability, sensitivity, and related concepts.
    • Second half: data presentation and storytelling with figures; plotting decisions that tell a clear story.
  • Measurement and empirical foundations

    • Constructs vs. observable phenomena: psychological constructs like anxiety or memory are not directly observable; operational definitions are needed to bound what counts as a measure of the construct.
    • Operational definition example: imitation in infants (becoming the stimulus) with a tongue-protrusion paradigm.
    • Coding scheme example (to operationalize imitation):
      • 0 = no response
      • 1 = partial response (e.g., some tongue movement but not clearly imitative)
      • 2 = full response (clear, unambiguous tongue protrusion)
    • Researchers often train coders, use multiple trials, and rely on agreed-upon criteria to improve reliability and validity of these judgments.
    • Empiricism and objectivity: measurement should capture observable phenomena that can be checked and verified by others; openness and replication are healthy for scientific progress.
  • Variables and measurement scales (types and implications)

    • Variable: a characteristic of interest for each individual in a population or sample (e.g., memory capacity, distraction condition).
    • Qualitative (categorical) vs. quantitative (numerical) attributes:
    • Qualitative: categories without intrinsic numeric magnitude (e.g., gender, eye color, political affiliation).
    • Quantitative: numeric values with meaningful magnitude (e.g., height, weight, income).
    • Measurement is about assigning numbers to observations according to consistent rules (operational definitions).
    • Qualitative variables can be coded numerically (e.g., eye color: 0–blue, 1–brown, etc.), but not all numerical operations are meaningful on qualitative data (e.g., averaging eye color codes).
    • Quantitative scales and ordering:
    • Discrete vs. continuous: discrete variables take only separate, countable values, typically whole numbers (e.g., number of cars passing by); continuous variables can take any value within a range (e.g., height).
    • Dichotomous: a special discrete case with only two values (e.g., alive/dead, true/false).
    • Scales of measurement (from simplest to most informative):
    • Nominal: categories with no intrinsic order (e.g., eye color, political party labels). No meaningful magnitude, equal intervals, or true zero.
      • Example: color labels (Yellow=2, Green=4, etc.) are labels; the numbers are identifiers, not magnitudes.
    • Ordinal: order matters, but intervals between values are not necessarily equal (e.g., race placement, level of preference).
      • Example: ranking Smarties by preference: red=1, blue=2, green=3, etc. Order matters, but gaps are not quantified.
    • Interval: order and meaningful equal intervals, but no true zero (e.g., IQ scores, temperature in Celsius).
      • Distances between values are interpretable, but 0°C does not mean 'no temperature.'
    • Ratio: order, meaningful equal intervals, plus a meaningful zero that allows ratio comparisons (e.g., height, weight, Kelvin temperature, age).
      • With a true zero, statements like 'twice as tall' are meaningful.
    • The choice of scale affects allowable statistics and the kinds of claims you can make.
    • Measurement of constructs in psychology requires careful consideration of scale properties and the interpretation of results.
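
As a small illustration of why the scale limits the statistics you can use, the sketch below contrasts a nominal variable with an interval/ratio pair. The eye-color codes and the helper `celsius_to_kelvin` are made up for this example; only the general point (codes are labels, and ratios need a true zero) comes from the notes.

```python
from statistics import mean, mode

# Hypothetical nominal codes for eye color: 0 = blue, 1 = brown, 2 = green.
eye_color_codes = [0, 1, 1, 2, 1, 0, 1]

# The mean of nominal codes can be computed, but it has no interpretation.
meaningless_mean = mean(eye_color_codes)
# The mode (most common category) is a meaningful summary for nominal data.
most_common_code = mode(eye_color_codes)


def celsius_to_kelvin(celsius):
    """Convert an interval-scale temperature (Celsius) to ratio-scale Kelvin."""
    return celsius + 273.15


# 20 degrees C is not "twice as hot" as 10 degrees C: Celsius has no true zero,
# so the ratio is only interpretable after converting to Kelvin.
celsius_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print("Most common eye-color code:", most_common_code)
print("Ratio in Celsius:", celsius_ratio)
print("Ratio in Kelvin:", round(kelvin_ratio, 3))
```

The Celsius ratio comes out as 2, while the Kelvin ratio is only slightly above 1, which is why "twice as warm" claims are reserved for ratio scales.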
  • Reliability and validity: core psychometrics concepts

    • Reliability: stability and consistency of a measure across time, raters, or trials.
    • Test-retest reliability: administer the same test twice; the two sets of scores should be similar (strongly positively correlated) if the underlying trait is stable.
      • Represented visually by a scatter plot of Test 1 vs Test 2 scores; a strong positive correlation indicates reliability.
      • Realistically, perfect identical scores are unlikely due to day-to-day variation (sleep, mood, etc.).
    • Inter-rater reliability: agreement between two or more raters who assess the same data; assessed by correlation between their scores.
      • Acceptable reliability is often around r ≈ 0.60 or higher; higher is better.
    • Validity: the extent to which a measure captures what it is intended to measure.
    • Internal validity: the extent to which observed effects are due to the manipulation rather than confounds; lack of control for confounds reduces internal validity.
    • External validity: generalizability of findings beyond the study sample or setting (e.g., WEIRD samples: Western, Educated, Industrialized, Rich, Democratic).
      • Low external validity means limited generalizability to other populations or cultures.
    • Construct validity: how well a test or measure actually captures the theoretical construct of interest.
      • Example: Beck Depression Inventory (BDI) faced questions of whether some items truly map onto depression vs. anxiety; concerns about construct validity if items overlap with anxiety constructs.
    • Content/Face validity: the intuitive apparent fit of a measure to the construct; what it seems to measure on the surface.
      • Example: a depression measurement that asks about temperature would likely have low face validity despite potential statistical reliability.
    • Predictive validity: extent to which a measure predicts outcomes it should predict (e.g., ATAR predicting university performance).
    • Range effects (floor and ceiling effects): a measure too easy or too hard can fail to discriminate among participants.
    • Ceiling effect: most participants perform at the top end, limiting ability to detect differences.
    • Floor effect: most participants perform at the bottom end.
    • Pilot testing helps calibrate measures to avoid these effects, ensuring sensitivity to differences.
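
The test-retest and inter-rater checks described above both come down to correlating two sets of scores. The sketch below computes a Pearson correlation by hand; the six pairs of scores are invented for illustration, and the 0.60 cut-off is simply the rule of thumb quoted in these notes, not a universal standard.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for six participants tested on two occasions.
test_1 = [12, 15, 9, 20, 17, 11]
test_2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(test_1, test_2)
# Rule of thumb from the notes: r of about 0.60 or higher is acceptable.
acceptable = r >= 0.60

print("r =", round(r, 2), "| acceptable:", acceptable)
```

For inter-rater reliability the same function applies: pass one rater's scores as `x` and the other rater's scores for the same cases as `y`.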
  • Measurement design considerations and pilot testing

    • Pilot testing: iterative testing of the design and stimuli to ensure the task yields usable, discriminating data; helps identify floor/ceiling effects and timing or presentation issues.
    • Pilot testing avoids wasted data-collection time by confirming, before the main study, that the stimulus yields a useful range of responses.
    • Ethical and practical implications: robust measurement improves scientific validity and the efficiency of research; poor measurement wastes resources and could mislead interpretations.
  • Designing studies and addressing variability

    • Study types and randomization: experimental studies, randomized controlled trials, observational studies, quasi-experiments, and correlational designs; randomization helps control for confounds.
    • Confounding variables: factors that co-occur with the IV and can threaten the interpretation of results; strategies include control groups/conditions and counterbalancing.
    • Independent groups design vs. repeated measures design:
    • Independent groups: different participants in each condition; straightforward but may require more participants.
    • Repeated measures: same participants across conditions; more powerful but susceptible to carryover and order effects; counterbalancing mitigates confounds.
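
One way to picture counterbalancing in a repeated measures design is to cycle participants through every possible condition order, so that each order is used equally often. This is a minimal sketch of full counterbalancing; the condition names and participant IDs are hypothetical, and the function name `counterbalanced_orders` is introduced here only for illustration.

```python
from itertools import permutations


def counterbalanced_orders(conditions, participants):
    """Assign each participant one condition order, cycling through all orders."""
    orders = list(permutations(conditions))
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


conditions = ["quiet", "distraction"]      # hypothetical conditions
participants = ["P1", "P2", "P3", "P4"]    # hypothetical participant IDs

assignment = counterbalanced_orders(conditions, participants)
for participant, order in assignment.items():
    print(participant, "->", " then ".join(order))
```

With two conditions there are only two orders, so half the participants get each. The number of orders grows quickly with more conditions, which is one reason full counterbalancing is usually reserved for small designs.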
  • Data organization, exploration, and visualization (the second half of the lecture)

    • Purpose of displaying data: to tell a story, reveal patterns, detect errors, and support interpretation beyond text.
    • Data quality reality: psychology data are often messy due to human factors; data exploration helps identify anomalies, missing values, and transcription errors.
    • Data cleaning: removing or correcting erroneous data, filtering noise, handling missing values, and preparing data for analysis.
    • From raw matrices to interpretable summaries: moving from a raw matrix (e.g., 100 students × 10 questions) to interpretable summaries such as frequency distributions and plots.
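
A first cleaning pass often just separates usable values from missing or impossible ones. The sketch below assumes a made-up 1 to 5 rating item, with `None` standing in for a missing response; the data, the valid range, and the function name `screen_scores` are all illustrative assumptions rather than anything from the lecture.

```python
# Hypothetical raw responses on a 1-5 rating scale; None marks a missing value.
raw_scores = [3, 5, None, 4, 55, 2, 0, 1]

VALID_MIN, VALID_MAX = 1, 5  # assumed valid range for this made-up item


def screen_scores(scores, low, high):
    """Split scores into valid values, a missing count, and out-of-range values."""
    valid, out_of_range, missing = [], [], 0
    for score in scores:
        if score is None:
            missing += 1
        elif low <= score <= high:
            valid.append(score)
        else:
            # Likely a transcription error (e.g., 55 typed instead of 5).
            out_of_range.append(score)
    return valid, missing, out_of_range


valid, missing, suspicious = screen_scores(raw_scores, VALID_MIN, VALID_MAX)
print("valid:", valid)
print("missing:", missing)
print("check by hand:", suspicious)
```

Out-of-range values are flagged for checking rather than silently dropped, since whether to correct or remove them is a judgment call.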
  • Frequency distributions and data display options

    • Frequency table: tallies the number of observations per score or category; useful for qualitative data and small ranges.
    • Relative frequency: the proportion of observations in each category, computed as $\text{relative frequency} = \frac{\text{frequency}}{N}$, where $N$ is the total sample size.
    • Cumulative frequency: the total number of observations up to and including a given category; used to compute percentiles.
    • Intervals (bins) for continuous data: group observations into non-overlapping bins (e.g., 50–54, 55–59, etc.). Practical guidance: aim for around 10–20 bins; avoid overlaps; choose bins to enable proper polygon plotting and to support meaningful interpretation.
    • Why start bins with an underflow bin (e.g., 45–49) even if empty: to ensure the frequency polygon can start at zero and hit the x-axis cleanly.
    • Frequency polygon: a line plot connecting bin midpoints with heights corresponding to frequencies; useful for visualizing distributions, especially when comparing multiple groups.
    • Bar graphs: good for qualitative (nominal) data; bars should not touch to reflect discrete categories.
    • Histograms: bar plots with touching bars; appropriate for continuous or binned data to reflect the continuity of the scale.
    • Box-and-whisker plots: convey median, interquartile range (IQR), and extremes; useful for showing central tendency and dispersion in one figure; box spans the central 50% of data (IQR); median shown inside the box; whiskers extend to the min and max or to some percentile bounds.
    • Frequency histograms vs. frequency polygons vs. box plots: each has strengths for different data types and storytelling goals; choice depends on the data and the story you want to tell.
    • Example storytelling with plots: male vs. female weights, actual vs. ideal weights; using frequency polygons to compare distributions and dot plots to show cross-group comparisons.
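
The frequency, relative frequency, and cumulative frequency columns described above can be built in a few lines. This is a sketch under stated assumptions: the weights are invented whole-number values in kg, the bins are 5 kg wide, and the first bin (45-49) is an intentionally empty underflow bin as suggested in the notes. The helper `frequency_table` is a hypothetical name.

```python
def frequency_table(values, start, width, n_bins):
    """Return rows of (bin label, frequency, relative frequency, cumulative frequency).

    Assumes integer values; anything outside the binned range is ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    n = sum(counts)
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        low = start + i * width
        cumulative += count
        rows.append((f"{low}-{low + width - 1}", count, count / n, cumulative))
    return rows


# Hypothetical weights in kg for 15 students.
weights = [52, 58, 61, 63, 64, 66, 67, 67, 68, 69, 71, 73, 74, 78, 83]

table = frequency_table(weights, start=45, width=5, n_bins=9)
print("   bin   f    rel  cum")
for label, freq, rel, cum in table:
    print(f"{label:>6} {freq:>3} {rel:>6.2f} {cum:>3}")
```

The final cumulative frequency always equals the number of binned scores, which is a quick check that no observation was dropped or counted twice.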
  • Percentiles and percentile calculations (core quantitative concept)

    • Percentile: the value at or below which a specified percentage of scores fall; the percentile rank of a score is the percentage of scores at or below it.
    • Fundamental formula:
    • Percentile rank of a score: $P = \frac{CF}{N} \times 100$, where $CF$ is the cumulative frequency up to that score and $N$ is the total number of scores.
    • Inverse calculation (finding the score at a given percentile):
    • Cumulative frequency target: $CF = \frac{P}{100} \times N$. Then locate the smallest score whose cumulative frequency is at least $CF$.
    • Practical example from the transcript:
    • Suppose a distribution with total $N = 20$ and a score of 23 has a cumulative frequency of 7. The percentile would be:
      • $P = \frac{CF}{N} \times 100 = \frac{7}{20} \times 100 = 35\%$
      • So a score of 23 is in the 35th percentile.
    • To find the score at the 85th percentile for the same data:
      • Target $CF = \frac{85}{100} \times 20 = 17$.
      • Look for the score with cumulative frequency 17; the example in the transcript found that to be a score of 25, so you’d need a score of 25 or higher to beat at least 85% of the class.
    • Relative vs. cumulative frequency recap:
    • Relative frequency: frequency/N.
    • Cumulative frequency: sum of frequencies up to and including a given score.
    • Example with larger data: TV-watching hours (259 students) – calculating percentile for 7 hours from a grouped distribution and summarizing with a frequency polygon to visualize distribution around the 63rd percentile.
    • Practical interpretation: percentile ranks convey how an individual compares to the distribution (e.g., “at the 35th percentile” means scoring as well as or better than 35% of the group).
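
Both percentile calculations above can be checked with a short script. The full score distribution from the lecture is not reproduced in these notes, so the frequencies below are hypothetical, constructed only to match the quoted figures (N = 20, cumulative frequency 7 at a score of 23, and 17 at a score of 25). The function names are illustrative.

```python
# Hypothetical distribution (score -> frequency), chosen so that N = 20,
# the cumulative frequency at 23 is 7, and at 25 it is 17.
frequencies = {20: 1, 21: 2, 22: 2, 23: 2, 24: 5, 25: 5, 26: 2, 27: 1}


def cumulative_frequencies(freqs):
    """Map each score to the number of scores at or below it."""
    running, result = 0, {}
    for score in sorted(freqs):
        running += freqs[score]
        result[score] = running
    return result


def percentile_rank(freqs, score):
    """P = CF / N * 100, where CF is the cumulative frequency at `score`."""
    cf = cumulative_frequencies(freqs)
    n = sum(freqs.values())
    return cf[score] * 100 / n


def score_at_percentile(freqs, percentile):
    """Smallest score whose cumulative frequency reaches P / 100 * N."""
    cf = cumulative_frequencies(freqs)
    target = percentile * sum(freqs.values()) / 100
    for score in sorted(cf):
        if cf[score] >= target:
            return score
    return None


print("Percentile rank of 23:", percentile_rank(frequencies, 23))
print("Score at the 85th percentile:", score_at_percentile(frequencies, 85))
```

The multiplication is done before the division in both functions so that the intermediate values stay exact for whole-number inputs like these.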
  • Illustrative data examples used in the lecture

    • Imitation operational definition example (in infants): demonstrated coding challenges and inter-rater reliability concerns when judging whether an infant imitates tongue protrusion.
    • Weight data example (72 male students): discussion of wide weight range, use of bins (e.g., 60–64 kg), and how to interpret a 65–69 kg peak.
    • TV-watching hours example (259 students): determination of a typical amount and identification of an extreme outlier (e.g., 40 hours/week).
    • Male vs. female weight comparisons using frequency polygons and ideal weights to illustrate storytelling with plots.
  • Practical implications for data analysis and reporting

    • Choose graphs that tell the story clearly and faithfully; the reader should grasp the message at a glance.
    • Use appropriate data displays for different data types:
    • Qualitative data: bar graphs (nominal categories, non-touching bars to emphasize discreteness).
    • Quantitative data: histograms, frequency polygons, box plots; consider 10–20 bins for histograms.
    • Data quality and preparation: removing or correcting errors, identifying outliers, and ensuring the data meet the assumptions of planned analyses.
    • Inferential testing readiness: well-plotted data facilitate checking assumptions (normality, homogeneity of variance) and improve interpretability of statistical tests.
    • Reporting and publication: visuals should support the written narrative and help convey the study’s claims without excessive text.
  • Links to upcoming and related content

    • Next lecture focus: central tendency (mean, median, mode) and variability (how scores move around the center).
    • Mathematical prerequisites for upcoming topics: basic calculator skills (add/subtract/multiply/divide, square, square root).
    • Symbols and notation to know:
    • Sigma for summation: $\sum x$
    • Inequalities and their counterparts (>, <, ≥, ≤).
    • Positive and negative values: +/− signs.
    • Readings and practice materials:
    • Aaron textbook, Chapter 1; UQ Extend Module 4.
    • For next week: Aaron Chapter 2; UQ Extend Module 5.
    • Assessment:
    • Quiz for the week opens in 1 hour and closes Monday.
  • Ethical, philosophical, and practical implications raised

    • Open science and construct validity: the need for robust constructs and transparent operational definitions to enable replication and critique.
    • External validity concerns: most psychology research uses WEIRD populations; explicit caution about generalizability to diverse cultures and settings.
    • The healthy scientific process includes debate over operational definitions and ongoing refinement; disagreements drive methodological improvements and consensus over time.
  • Quick reference formulas and concepts (summary)

    • Percentile rank: $P = \frac{CF}{N} \times 100$
    • Inverse percentile (finding score at percentile P): $CF = \frac{P}{100} \times N$
    • Box-and-whisker plot components: median, interquartile range (IQR), whiskers (min/max or defined bounds).
    • Reliability types: test-retest (consistency over time), inter-rater (consistency across raters).
    • Validity types: internal, external, construct, content/face, predictive.
    • Data-display choices: nominal data → bar graphs with gaps; continuous data → histograms or frequency polygons; distributions → consider 10–20 bins; outliers identified via plots.
    • Range effects: ceiling/floor effects; pilot testing to optimize measurement sensitivity.
  • Final reminders for exam preparation

    • Practice building and interpreting frequency tables, histograms, and frequency polygons.
    • Be comfortable with percentiles, cumulative frequencies, and translating percentile ranks into actionable interpretation.
    • Understand the relationship between reliability, validity, and the conclusions you can draw from data.
    • Review the next set of topics (central tendency and variability) and ensure you can perform basic statistical operations with a calculator.
  • Notes on exam readiness

    • Focus on being able to explain why we choose certain scales and plots for different data types.
    • Be able to articulate the implications of floor/ceiling effects and how pilot testing mitigates them.
    • Be able to discuss external validity concerns in the context of WEIRD samples and cross-cultural generalizability.
  • References to course materials mentioned in the lecture

    • Aaron textbook, Chapter 1 (and Chapter 2 for the next session)
    • UQ Extend Module 4 (and Module 5 for next session)
  • Summary takeaway

    • Measuring psychological constructs requires careful operational definitions and awareness of scale properties.
    • Reliability and validity determine whether our measures can support credible conclusions.
    • Organizing and displaying data thoughtfully helps tell the right story and supports valid inferences for statistical testing.