Measurement, Distributions, and Percentiles – Study Notes

  • Acknowledgement and context

    • Opening and scope: this week covers measurement, frequency distributions, and percentiles; gradual introduction to numbers.
    • Mid-semester exam scope: weeks 1–4 content; practice materials and quizzes recommended; exam date announced on Blackboard (Saturday, September 6).
    • Relevance across degree: data cleaning, exploration, and analysis are common tasks in assignments; honors year in psychology involves a full year of study design, data collection, analysis, and thesis writing – these topics are foundational for that workflow.
  • Big-picture progression of a study in psychology

    • Three stages: design a study, run the study, then analyze the numbers you collect.
    • First data-processing steps: create plots, explore data, clean data.
    • Throughout a degree, you’ll repeatedly clean, explore, and analyze data; in honors, you’ll perform this across a year end-to-end.
  • Core topics of the lecture

    • First half: measurement of psychological constructs, reliability, sensitivity, and related concepts.
    • Second half: data presentation and storytelling with figures; plotting decisions that tell a clear story.
  • Measurement and empirical foundations

    • Constructs vs. observable phenomena: psychological constructs like anxiety or memory are not directly observable; operational definitions are needed to bound what counts as a measure of the construct.
    • Operational definition example: imitation in infants (becoming the stimulus) with a tongue-protrusion paradigm.
    • Coding scheme example (to operationalize imitation):
      • 0 = no response
      • 1 = partial response (e.g., some tongue movement but not clearly imitative)
      • 2 = full response (clear, unambiguous tongue protrusion)
    • Researchers often train coders, use multiple trials, and rely on agreed-upon criteria to improve reliability and validity of these judgments.
    • Empiricism and objectivity: measurement should capture observable phenomena that can be checked and verified by others; openness and replication are healthy for scientific progress.
  • Variables and measurement scales (types and implications)

    • Variable: a characteristic of interest for each individual in a population or sample (e.g., memory capacity, distraction condition).
    • Qualitative (categorical) vs. quantitative (numerical) attributes:
    • Qualitative: categories without intrinsic numeric magnitude (e.g., gender, eye color, political affiliation).
    • Quantitative: numeric values with meaningful magnitude (e.g., height, weight, income).
    • Measurement is about assigning numbers to observations according to consistent rules (operational definitions).
    • Qualitative variables can be coded numerically (e.g., eye color: 0–blue, 1–brown, etc.), but not all numerical operations are meaningful on qualitative data (e.g., averaging eye color codes).
    • Quantitative scales and ordering:
    • Discrete vs. continuous: discrete variables take separate, countable values (e.g., number of cars passing by); continuous variables can take any value within a range (e.g., height).
    • Dichotomous: a special discrete case with only two values (e.g., alive/dead, true/false).
    • Scales of measurement (from simplest to most informative):
    • Nominal: categories with no intrinsic order (e.g., eye color, political party labels). No meaningful magnitude, equal intervals, or true zero.
      • Example: color codes (Yellow = 2, Green = 4, etc.) are labels; the numbers are identifiers, not magnitudes.
    • Ordinal: order matters, but intervals between values are not necessarily equal (e.g., race placement, level of preference).
      • Example: ranking Smarties by preference: red=1, blue=2, green=3, etc. Order matters, but gaps are not quantified.
    • Interval: order and meaningful equal intervals, but no true zero (e.g., IQ scores, temperature in Celsius).
      • Distances between values are interpretable, but 0°C does not mean 'no temperature.'
    • Ratio: order, meaningful equal intervals, plus a meaningful zero that allows ratio comparisons (e.g., height, weight, Kelvin temperature, age).
      • With a true zero, statements like 'twice as tall' are meaningful.
    • The choice of scale affects allowable statistics and the kinds of claims you can make.
    • Measurement of constructs in psychology requires careful consideration of scale properties and the interpretation of results.
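To make the point about numeric codes concrete, here is a minimal Python sketch (the eye-color codes and data below are hypothetical): averaging nominal codes produces a number, but that number has no interpretation, whereas counting categories does.

```python
from collections import Counter
from statistics import mean

# Hypothetical nominal coding: 0 = blue, 1 = brown, 2 = green
eye_color_codes = [0, 1, 1, 0, 2, 1, 0, 1]

# Python will happily compute a mean, but an "average eye color"
# is meaningless: the codes are identifiers, not magnitudes.
print(mean(eye_color_codes))

# Counting categories IS meaningful for nominal data (the mode):
print(Counter(eye_color_codes).most_common(1))  # most frequent category and its count
```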
  • Reliability and validity: core psychometrics concepts

    • Reliability: stability and consistency of a measure across time, raters, or trials.
    • Test-retest reliability: administer the same test twice; scores should be similarly related if the underlying trait is stable.
      • Represented visually by a scatter plot of Test 1 vs Test 2 scores; a strong positive correlation indicates reliability.
      • Realistically, perfect identical scores are unlikely due to day-to-day variation (sleep, mood, etc.).
    • Inter-rater reliability: agreement between two or more raters who assess the same data; assessed by correlation between their scores.
      • Acceptable reliability is often taken to be r ≈ .60 or higher; higher is better.
    • Validity: the extent to which a measure captures what it is intended to measure.
    • Internal validity: the extent to which observed effects are due to the manipulation rather than confounds; lack of control for confounds reduces internal validity.
    • External validity: generalizability of findings beyond the study sample or setting (e.g., WEIRD samples: Western, Educated, Industrialized, Rich, Democratic).
      • Low external validity means limited generalizability to other populations or cultures.
    • Construct validity: how well a test or measure actually captures the theoretical construct of interest.
      • Example: Beck Depression Inventory (BDI) faced questions of whether some items truly map onto depression vs. anxiety; concerns about construct validity if items overlap with anxiety constructs.
    • Content/Face validity: the intuitive apparent fit of a measure to the construct; what it seems to measure on the surface.
      • Example: a depression measurement that asks about temperature would likely have low face validity despite potential statistical reliability.
    • Predictive validity: extent to which a measure predicts outcomes it should predict (e.g., ATAR predicting university performance).
    • Range effects (floor and ceiling effects): a measure too easy or too hard can fail to discriminate among participants.
    • Ceiling effect: most participants perform at the top end, limiting ability to detect differences.
    • Floor effect: most participants perform at the bottom end.
    • Pilot testing helps calibrate measures to avoid these effects, ensuring sensitivity to differences.
  • Measurement design considerations and pilot testing

    • Pilot testing: iterative testing of the design and stimuli to ensure the task yields usable, discriminating data; helps identify floor/ceiling effects and timing or presentation issues.
    • The role of pilot testing in avoiding wasted data collection time and ensuring the stimulus yields a useful range of responses.
    • Ethical and practical implications: robust measurement improves scientific validity and the efficiency of research; poor measurement wastes resources and could mislead interpretations.
  • Designing studies and addressing variability

    • Study types and randomization: experimental studies, randomized controlled trials, observational studies, quasi-experiments, and correlational designs; randomization helps control for confounds.
    • Confounding variables: factors that co-occur with the IV and can threaten the interpretation of results; strategies include control groups/conditions and counterbalancing.
    • Independent groups design vs. repeated measures design:
    • Independent groups: different participants in each condition; straightforward but may require more participants.
    • Repeated measures: same participants across conditions; more powerful but susceptible to carryover and order effects; counterbalancing mitigates confounds.
  • Data organization, exploration, and visualization (the second half of the lecture)

    • Purpose of displaying data: to tell a story, reveal patterns, detect errors, and support interpretation beyond text.
    • Data quality reality: psychology data are often messy due to human factors; data exploration helps identify anomalies, missing values, and transcription errors.
    • Data cleaning: removing or correcting erroneous data, filtering noise, handling missing values, and preparing data for analysis.
    • From raw matrices to interpretable summaries: moving from a matrix of 100 students × 10 questions to interpretable summaries such as distributions and summaries.
  • Frequency distributions and data display options

    • Frequency table: tallies the number of observations per score or category; useful for qualitative data and small ranges.
    • Relative frequency: the proportion of observations in each category, computed as relative frequency = frequency / N, where N is the total sample size.
    • Cumulative frequency: the total number of observations up to and including a given category; used to compute percentiles.
    • Intervals (bins) for continuous data: group observations into non-overlapping bins (e.g., 50–54, 55–59, etc.). Practical guidance: aim for around 10–20 bins; avoid overlaps; choose bins to enable proper polygon plotting and to support meaningful interpretation.
    • Why start bins with an underflow bin (e.g., 45–49) even if empty: to ensure the frequency polygon can start at zero and hit the x-axis cleanly.
    • Frequency polygon: a line plot connecting bin midpoints with heights corresponding to frequencies; useful for visualizing distributions, especially when comparing multiple groups.
    • Bar graphs: good for qualitative (nominal) data; bars should not touch to reflect discrete categories.
    • Histograms: bar plots with touching bars; appropriate for continuous or binned data to reflect the continuity of the scale.
    • Box-and-whisker plots: convey median, interquartile range (IQR), and extremes; useful for showing central tendency and dispersion in one figure; box spans the central 50% of data (IQR); median shown inside the box; whiskers extend to the min and max or to some percentile bounds.
    • Frequency histograms vs. frequency polygons vs. box plots: each has strengths for different data types and storytelling goals; choice depends on the data and the story you want to tell.
    • Example storytelling with plots: male vs. female weights, actual vs. ideal weights; using frequency polygons to compare distributions and dot plots to show cross-group comparisons.
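The frequency, relative frequency, and cumulative frequency ideas above can be sketched in a few lines of Python. The 20 scores below are made up, but chosen so the tallies line up with the worked percentile example later in these notes (N = 20, cumulative frequency 7 at a score of 23).

```python
from collections import Counter

# Made-up set of 20 scores (chosen to match the worked percentile example)
scores = [21] + [22] * 2 + [23] * 4 + [24] * 2 + [25] * 8 + [26] * 2 + [27]
N = len(scores)

freq = Counter(scores)          # frequency: tally per score
cumulative = 0
print(f"{'score':>5} {'f':>3} {'rel f':>6} {'cum f':>6}")
for score in sorted(freq):
    f = freq[score]
    cumulative += f             # cumulative frequency: running total up to this score
    print(f"{score:>5} {f:>3} {f / N:>6.2f} {cumulative:>6}")
```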
  • Percentiles and percentile calculations (core quantitative concept)

    • Percentile: the value below which a specified percentage of scores fall; percentile rank: the percentage of scores at or below a given value.
    • Fundamental formula:
    • Percentile rank of a score: P = (CF / N) × 100, where CF is the cumulative frequency up to and including that score, and N is the total number of scores.
    • Inverse calculation (finding the score at a given percentile):
    • Cumulative frequency target: CF = (P / 100) × N. Then locate the smallest score whose cumulative frequency is at least CF.
    • Practical example from the transcript:
    • Suppose a distribution with total N = 20, and a score of 23 has a cumulative frequency of 7. The percentile rank would be:
      • P = (CF / N) × 100 = (7 / 20) × 100 = 35
      • So a score of 23 is at the 35th percentile.
    • To find the score at the 85th percentile for the same data:
      • Target CF = (85 / 100) × 20 = 17.
      • Look for the score with a cumulative frequency of 17; the example in the transcript found that to be a score of 25, so you’d need a score of 25 or higher to beat at least 85% of the class.
    • Relative vs. cumulative frequency recap:
      • Relative frequency: frequency / N.
      • Cumulative frequency: sum of frequencies up to and including a given score.
    • Example with larger data: TV-watching hours (259 students) – calculating the percentile for 7 hours from a grouped distribution and summarizing with a frequency polygon to visualize the distribution around the 63rd percentile.
    • Practical interpretation: percentile ranks convey how an individual compares to the distribution (e.g., “in the 35th percentile” means better than 35% of the group).
  • Illustrative data examples used in the lecture

    • Imitation operational definition example (in infants): demonstrated coding challenges and inter-rater reliability concerns when judging whether an infant imitates tongue protrusion.
    • Weight data example (72 male students): discussion of the wide weight range, use of bins (e.g., 60–64 kg), and how to interpret a 65–69 kg peak.
    • TV-watching hours example (259 students): determination of a typical amount and identification of an extreme outlier (e.g., 40 hours/week).
    • Male vs. female weight comparisons using frequency polygons and ideal weights to illustrate storytelling with plots.
  • Practical implications for data analysis and reporting

    • Choose graphs that tell the story clearly and faithfully; the reader should grasp the message at a glance.
    • Use appropriate data displays for different data types:
      • Qualitative data: bar graphs (nominal categories, non-touching bars to emphasize discreteness).
      • Quantitative data: histograms, frequency polygons, box plots; consider 10–20 bins for histograms.
    • Data quality and preparation: removing or correcting errors, identifying outliers, and ensuring the data meet the assumptions of planned analyses.
    • Inferential testing readiness: well-plotted data facilitate checking assumptions (normality, homogeneity of variance) and improve the interpretability of statistical tests.
    • Reporting and publication: visuals should support the written narrative and help convey the study’s claims without excessive text.
  • Links to upcoming and related content

    • Next lecture focus: central tendency (mean, median, mode) and variability (how scores move around the center).
    • Mathematical prerequisites for upcoming topics: basic calculator skills (add/subtract/multiply/divide, square, square root).
    • Symbols and notation to know:
      • Sigma for summation: Σx
      • Inequalities and their counterparts (>, <, ≥, ≤).
      • Positive and negative values: +/− signs.
    • Readings and practice materials:
      • Aaron textbook, Chapter 1; UQ Extend Module 4.
      • For next week: Aaron Chapter 2; UQ Extend Module 5.
    • Assessment:
      • The quiz for this week opens in 1 hour and closes Monday.
  • Ethical, philosophical, and practical implications raised

    • Open science and construct validity: the need for robust constructs and transparent operational definitions to enable replication and critique.
    • External validity concerns: most psychology research uses WEIRD populations; explicit caution about generalizability to diverse cultures and settings.
    • The healthy scientific process includes debate over operational definitions and ongoing refinement; disagreements drive methodological improvements and consensus over time.
  • Quick reference formulas and concepts (summary)

    • Percentile rank: P = (CF / N) × 100.
    • Inverse percentile (finding the score at percentile P): CF = (P / 100) × N.
    • Box-and-whisker plot components: median, interquartile range (IQR), whiskers (min/max or defined bounds).
    • Reliability types: test-retest (consistency over time), inter-rater (consistency across raters).
    • Validity types: internal, external, construct, content/face, predictive.
    • Data-display choices: nominal data → bar graphs with gaps; continuous data → histograms or frequency polygons; distributions → consider 10–20 bins; outliers identified via plots.
    • Range effects: ceiling/floor effects; pilot testing to optimize measurement sensitivity.
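The two percentile formulas can be turned into small Python helpers. This is a sketch under these notes' definitions (CF counts scores at or below a value), using made-up data chosen to reproduce the worked example from the transcript (a score of 23 at the 35th percentile; 25 as the score at the 85th percentile).

```python
import bisect
import math

def percentile_rank(scores, value):
    """P = (CF / N) * 100, where CF is the count of scores at or below `value`."""
    ordered = sorted(scores)
    cf = bisect.bisect_right(ordered, value)   # cumulative frequency at `value`
    return cf / len(ordered) * 100

def score_at_percentile(scores, p):
    """Smallest score whose cumulative frequency reaches CF = (p / 100) * N."""
    ordered = sorted(scores)
    target_cf = p / 100 * len(ordered)
    index = max(math.ceil(target_cf), 1) - 1   # first position whose CF >= target
    return ordered[index]

# Made-up distribution matching the transcript's numbers (N = 20)
scores = [21] + [22] * 2 + [23] * 4 + [24] * 2 + [25] * 8 + [26] * 2 + [27]
print(percentile_rank(scores, 23))      # 35.0 -> 35th percentile
print(score_at_percentile(scores, 85))  # 25 -> need 25 or higher to beat at least 85%
```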
  • Final reminders for exam preparation

    • Practice building and interpreting frequency tables, histograms, and frequency polygons.
    • Be comfortable with percentiles, cumulative frequencies, and translating percentile ranks into actionable interpretation.
    • Understand the relationship between reliability, validity, and the conclusions you can draw from data.
    • Review the next set of topics (central tendency and variability) and ensure you can perform basic statistical operations with a calculator.
  • Notes on exam readiness

    • Focus on being able to explain why we choose certain scales and plots for different data types.
    • Be able to articulate the implications of floor/ceiling effects and how pilot testing mitigates them.
    • Be able to discuss external validity concerns in the context of WEIRD samples and cross-cultural generalizability.
  • References to course materials mentioned in the lecture

    • Aaron textbook, Chapter 1 (and Chapter 2 for the next session)
    • UQ Extend Module 4 (and Module 5 for the next session)
  • Summary takeaway

    • Measuring psychological constructs requires careful operational definitions and awareness of scale properties.
    • Reliability and validity determine whether our measures can support credible conclusions.
    • Organizing and displaying data thoughtfully helps tell the right story and supports valid inferences for statistical testing.