Statistical Thinking: Key Concepts and Inference in Statistical Investigation

Learning Objectives

  • Define basic elements of a statistical investigation.

  • Describe the role of p-values and confidence intervals in statistical inference.

  • Describe the role of random sampling in generalizing conclusions from a sample to a population.

  • Describe the role of random assignment in drawing cause-and-effect conclusions.

  • Critique statistical studies.

Introduction to Statistical Thinking (Four Studies Emphasized)

  • Society increasingly relies on evidence-based decision making; statistics helps draw valid inferences from data.

  • The module uses four recent research studies to highlight key elements of a statistical investigation.

  • Emphasis on planning, data examination, inference, and drawing conclusions beyond the observed data.

  • Example discussed: coffee consumption and life expectancy from Freedman et al. (2012).

  • Takeaway: Do not rely on anecdote or intuition; use systematic statistical thinking to gain insight from data.

  • Real-world relevance: data are ubiquitous; statistics guides interpretation for decisions and policies.

The Three-Step Method (context for learning)

  • Step 1: Plan the study (develop a testable question and data-collection plan).

  • Step 2: Examine the data (select appropriate graphs, descriptive statistics, patterns, and variability).

  • Step 3: Infer from the data (assess whether observed patterns could be due to random variation; generalize beyond the sample and consider potential causal interpretations when applicable).

  • The method helps organize thinking about how to answer research questions and assess the strength of conclusions.

Elements of a Statistical Investigation

  • Planning the study

    • Formulate a testable research question.

    • Decide how data will be collected (sampling method, measurements, variables collected).

    • Consider study design details: how long the study lasts, recruitment methods, participant demographics (age, smoking, etc.), and any changes imposed (e.g., coffee habit changes).

  • Examining the data

    • Choose appropriate graphs and descriptive statistics to summarize relevant aspects.

    • Look for patterns, variability, reliability, and validity.

    • Compare distributions (e.g., smoker vs. non-smoker groups) rather than relying solely on centers (means/medians).

  • Inferring from the data

    • Apply valid statistical methods to draw inferences beyond the observed data.

    • Assess whether observed effects (e.g., a 10%–15% reduction in risk) could occur by chance alone.

  • Drawing conclusions

    • Determine to whom conclusions apply (external validity: who are the people in the study? ages, health status, location).

    • Consider whether the study supports a cause-and-effect conclusion about treatments or exposures.

    • Recognize that numerical analysis is only one part of the investigation; interpretation and context are crucial.

Distributional Thinking

  • Data vary, and the pattern of variation is crucial to understanding phenomena.

  • Presenting data carefully (distributions) can answer questions and reveal further questions without resorting to overly simplistic summaries.

  • Comparing only centers (e.g., medians) can be misleading; the full distribution provides more insight.

  • Example: cancer pamphlets vs. patient reading levels

    • 63 patients assessed for reading ability; 30 pamphlets assessed for readability (variables: patient reading level, pamphlet readability).

    • Distributions reveal misalignment: many patients have reading levels below the most readable pamphlet (e.g., 17/63 = 27%).

    • Figure comparing distributions shows that medians alone miss important variation.

  • Measurements can have uncertainty due to measurement error, snapshot sampling, or small sample size.

  • Assessment of evidence requires looking at distributional patterns and variability, not just central tendency.
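
As a rough illustration of why centers alone can mislead, the sketch below compares two medians and then counts how many patients fall below the easiest pamphlet. The function name `share_below_easiest` and the grade levels are hypothetical, invented only for this example; they are not the study's data.

```python
from statistics import median

def share_below_easiest(patient_levels, pamphlet_levels):
    """Fraction of patients whose reading level is below the easiest pamphlet."""
    easiest = min(pamphlet_levels)
    below = sum(1 for level in patient_levels if level < easiest)
    return below / len(patient_levels)

# Hypothetical grade levels, made up for illustration (NOT the study data).
patients = [3, 5, 5, 7, 9, 9, 10, 12, 12, 13]
pamphlets = [6, 8, 9, 9, 10, 11, 12, 14]

# The two medians are close, which might suggest a good match...
print("median patient level: ", median(patients))
print("median pamphlet level:", median(pamphlets))

# ...but the full distributions show some patients cannot read any pamphlet.
print("share below easiest:  ", share_below_easiest(patients, pamphlets))
```

With the counts reported in the notes, the same calculation would give 17/63, or about 27%.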

Statistical Significance: Assessing Random Variation

  • Example: Hamlin, Wynn, & Bloom (2007) infants study on helping vs. hindering agents

    • Of 16 infants, 14 chose the helper toy after exposure to helper/hinderer scenarios.

    • Consider alternative explanations (toy color, shapes, handedness, position) and how they were controlled (rotation of conditions to balance potential effects).

    • Random variation remains a possible explanation: a 14-of-16 split could, in principle, arise by chance alone.

  • Probability model for the observed result under the null hypothesis of no preference

    • If each infant is equally likely to choose either toy, each choice is a Bernoulli trial with probability p = 0.5 of choosing the helper, so the number of helper choices X in 16 trials follows a Binomial(16, 0.5) distribution.

    • What is the probability of observing 14 or more helpers in 16 trials?

    • Computed p-value: $P(X \ge 14) = 0.0021$ under the null model.

  • P-value concept

    • The p-value tells how often a random process would yield a result as extreme or more extreme than what was observed, assuming random chance is the only factor.

    • If the p-value is smaller than the chosen significance level, typically $\alpha = 0.05$, we reject the null hypothesis of random chance.

  • Decision rule example

    • With p-value = 0.0021 < 0.05, conclude strong evidence of a genuine preference for the helper toy.

  • Generalizability (external validity) begins here: larger or more representative samples improve generalizability.
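
The p-value for the infant study can be reproduced with a short calculation. The sketch below sums the binomial probabilities for 14, 15, and 16 helper choices under the null model p = 0.5; the helper name `binomial_upper_tail` is an assumption made for this example.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 14 or more helper choices out of 16 infants, assuming no preference (p = 0.5).
p_value = binomial_upper_tail(14, 16)
alpha = 0.05

print(f"p-value = {p_value:.4f}")  # p-value = 0.0021
print("reject the null" if p_value < alpha else "fail to reject the null")
```

Under the null there are 120 + 16 + 1 = 137 favorable outcomes out of 2^16 = 65,536 equally likely ones, which gives the 0.0021 quoted above.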

Generalizability and Sampling

  • Generalizability: results from widely representative samples are more likely to generalize to the population.

  • Limitation: conclusions from a study apply to the specific sample (e.g., the 16 infants) unless sampling is representative.

  • Random sampling is a key method to generalize findings to a larger population.

  • How to sample

    • Simple form: number each member of the population and randomly select a subset.

    • Many real polls use probability-based sampling methods to obtain nationally representative panels.

  • Example: General Social Survey (GSS)

    • Based on a sample of about 2,000 adult Americans.

    • Used to infer population proportions on issues like self-identification as liberal, happiness, and feeling rushed.

  • Margin of error and confidence

    • A probability sample yields a margin of error, typically approximated by $\text{ME} \approx \frac{1}{\sqrt{n}}$ (for large populations and simple random samples).

    • Example: 2004 GSS reported 83.6% feeling rushed (817/977 respondents).

    • We can be 95% confident that the true population value lies within ± ME of the sample percentage; here, ME ≈ 3 percentage points, since $\frac{1}{\sqrt{977}} \approx 0.032 \approx 3\%$.

  • Non-random samples can introduce bias by systematically over- or under-representing segments of the population.

  • Other sources of error (e.g., dishonest responses) are not captured by the margin of error.
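
The sampling recipe and the margin-of-error approximation from this section can be sketched in a few lines. The function names are hypothetical. The normal-approximation formula $z\sqrt{\hat{p}(1-\hat{p})/n}$ is not part of these notes; it is included only as an assumption for comparison. It shows why $1/\sqrt{n}$ is a conservative shortcut: with $z \approx 2$ and the worst case $\hat{p} = 0.5$, it reduces to exactly $1/\sqrt{n}$.

```python
import random
from math import sqrt

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members so that every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def margin_of_error_quick(n):
    """The notes' rule of thumb: ME is roughly 1 / sqrt(n)."""
    return 1 / sqrt(n)

def margin_of_error_normal(p_hat, n, z=1.96):
    """Normal-approximation ME for a proportion (an assumption, not from the notes)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# 2004 GSS example from the notes: 817 of 977 respondents felt rushed.
n = 977
p_hat = 817 / n
me = margin_of_error_quick(n)

print(f"sample proportion  = {p_hat:.3f}")
print(f"quick ME           = {me:.3f}")
print(f"approx 95% interval = ({p_hat - me:.3f}, {p_hat + me:.3f})")
print(f"normal-approx ME   = {margin_of_error_normal(p_hat, n):.3f}")

# Drawing 5 members from a numbered (hypothetical) population of 100.
print(simple_random_sample(list(range(1, 101)), 5, seed=1))
```

Because the observed proportion is far from 0.5, the normal-approximation margin is smaller than the rule-of-thumb value, so the ± 3 points quoted above errs on the cautious side.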

Cause and Effect and Random Assignment

  • Distinguishing between group differences due to treatment vs. group-formation processes.

  • Random assignment helps balance both known and unknown variables across groups, making causal conclusions more plausible.

  • Example 4: intrinsic vs. extrinsic motivation and creativity (Ramsey & Schafer, 2002; Amabile, 1985)

    • 47 experienced creative writers were randomly assigned to an intrinsic-motivation or an extrinsic-motivation group.

    • Observed means: intrinsic = 19.88, extrinsic = 15.74; suggests higher creativity under intrinsic motivation.

    • However, variability within groups matters; distributions overlap substantially (Figure 2).

    • Standard deviations: extrinsic SD = 5.25; intrinsic SD = 4.40.

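    • Because the distributions overlap and the difference in means is modest, we must ask whether the difference reflects the treatment or merely how the groups happened to be formed; random assignment is what makes that question answerable.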

  • What random assignment accomplishes

    • Tends to balance all variables (known and unknown) across groups, making differences more attributable to the treatment.

    • A potential unlucky draw could still exist; we quantify this with a p-value under the null that the treatment has no effect.

  • How to test the assignment effect without assuming different populations

    • Assume, under the null hypothesis, that each writer's score would have been the same in either group; then randomly reassign the observed scores to the two groups many times and record the difference in group means each time.

    • Example: 1,000 hypothetical random assignments; observed difference = 4.14 points (19.88 − 15.74).

    • Only 2 of 1,000 simulated random assignments produced a difference at least as large as the observed one (the source text cites 4.41 at this point rather than the 4.14 computed above), giving an approximate p-value of $\frac{2}{1000} = 0.002$.

    • Result: it is very unlikely that the observed difference arose by chance due to random assignment alone; this supports a causal interpretation that intrinsic motivation increases creativity scores in this sample.

  • Caution on generalization from randomized experiments

    • Generalize cautiously to individuals similar to those in the study (extensive creative writing experience).

    • We need more information about the sampling process to generalize to broader populations.

The Importance of Diversity in Psychological Science

  • Diversity considerations go beyond sex/gender dichotomies and include race, age, geography, socioeconomic status, and more.

  • The field has historically used binary gender categories, which may fail to capture the diversity of identities.

  • Diversity and inclusion are central themes that influence interpretation and generalizability of research findings.

  • The course notes that gender, sex, and related topics will be addressed in later units, highlighting the need to examine these topics carefully.

  • Emphasis on asking questions about how representative the sample is and how findings may generalize across diverse populations.

The Scientific Method and the Role of Randomness in Inference

  • The scientific method in psychology involves: hypothesize → design a study → conduct the study → analyze the data → report results.

  • Statistical thinking requires careful study design, pattern analysis, and conclusions that go beyond the observed data.

  • Random sampling is essential for generalizing results to a population; random assignment is essential for causal conclusions.

  • Probability models help quantify how much random variation to expect and determine whether observed results could occur by chance.

  • Margin of error and confidence levels provide a framework to express uncertainty in estimates.

The Coffee Study Case (Long-Run Evidence and Cautions)

  • The coffee study discussed (Freedman et al., 2012) is a large, 14-year observational study published in a major journal (New England Journal of Medicine).

  • Study design and scope

    • More than 402,000 people aged 50–71 from six states and two metropolitan areas.

    • Excluded individuals with cancer, heart disease, or stroke at baseline.

    • Coffee consumption assessed once at baseline.

  • Key findings

    • About 52,000 deaths occurred during follow-up.

    • Higher coffee consumption was associated with lower death risk; reductions were more pronounced for those drinking six or more cups daily.

    • No clear difference between the effects of caffeinated and decaffeinated coffee.

  • Important interpretation cautions

    • This was an observational study; therefore, no causal conclusions can be drawn about coffee causing increased longevity.

    • Possible confounding factors: people with chronic diseases might avoid coffee, among other potential confounders.

    • Results should be reviewed in the context of similar studies and across study designs to assess consistency and plausibility.

    • Statistical adjustment can address some confounders, but not all; residual confounding remains a concern.

  • Implications for policy and decision making

    • Observational findings can inform hypotheses and guide future focused studies, including randomized experiments where feasible.

Summary and Practical Takeaways

  • A statistical investigation comprises planning, data examination, inference, and drawing cautious conclusions about populations and causal relationships.

  • Distributional thinking emphasizes examining full data distributions, not just centers, to avoid misleading conclusions.

  • P-values quantify how unlikely observed results are under a null hypothesis; small p-values suggest rejecting random chance as an explanation, given a chosen significance level (typically $\alpha = 0.05$).

  • Random sampling supports generalizability to a population; the margin of error quantifies the expected range of variation due to sampling randomness, with the approximate formula $\text{ME} \approx \frac{1}{\sqrt{n}}$ for proportions in large samples.

  • Random assignment supports causal interpretations by balancing confounding variables across groups; observed differences in outcomes under randomization require examination of how often such differences would occur by chance (p-value from permutation or simulation tests).

  • Diversity and inclusivity are essential for the external validity of psychological science; findings may not generalize across all populations if samples lack representativeness.

  • In interpreting studies, distinguish between evidence of association (observational) and evidence of causation (randomized experiments), while considering the broader literature and methodological limitations.

Key Definitions and Concepts (glossary)

  • Population: the entire group of interest from which a sample is drawn.

  • Sample: a subset of the population selected for study.

  • Random sampling: a sampling method in which every member of the population has an equal chance of being chosen; it supports generalizability and guards against sampling bias.

  • Margin of error (ME): the distance within which the sample statistic is expected to fall from the population parameter in repeated sampling; for proportions, approximated by $\text{ME} \approx \frac{1}{\sqrt{n}}$ (in proportion terms).

  • Confidence level: the long-run proportion of repeated samples for which the interval (sample statistic ± margin of error) contains the population parameter (e.g., 95%).

  • P-value: the probability, under the null hypothesis, of obtaining a result as extreme or more extreme than the observed one.

  • Level of significance (alpha): the threshold for deciding whether to reject the null hypothesis (commonly $\alpha = 0.05$).

  • Random assignment: allocating participants to groups by chance to ensure equivalence of groups on average.

  • Observational study: a study where the researcher observes variables without manipulating the study environment; can show associations but not causation.

  • Causation vs. association: causation implies the exposure directly changes the outcome; association indicates a relationship but not necessarily a causal link.

  • Bias: systematic error that leads to incorrect conclusions due to the sampling method, measurement, or other processes.

  • Variability/distribution: how data points spread around a central tendency; understanding the distribution is essential for interpreting patterns.

Notable Numerical References and Equations (LaTeX)

  • Probability of observing 14 or more successes (helper choices) in 16 Bernoulli trials with p = 0.5 under the null: $P(X \ge 14) = 0.0021$

  • Difference in means in Example 4: $\Delta = \bar{x}_{\text{intrinsic}} - \bar{x}_{\text{extrinsic}} = 19.88 - 15.74 = 4.14$

  • Reported standard deviations in creativity scores: extrinsic $\sigma_{E} = 5.25$, intrinsic $\sigma_{I} = 4.40$

  • Large-scale coffee study details: sample size over 402,000; age range 50–71; follow-up duration 14 years; number of deaths about 52,000; results are not broken down here beyond coffee-intake categories.

  • Margin of error example for GSS (2004): around $\pm 3\%$ with 95% confidence, given a sample size of 977; $\text{ME} \approx \frac{1}{\sqrt{977}} \approx 0.032 \approx 3\%$.

  • Causal inference via random assignment: the probability model and simulations show that the observed difference in means is unlikely to occur by chance alone under random assignment (example p-value $\approx 0.002$).
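
The random-reassignment test described under "Cause and Effect and Random Assignment" can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `reassignment_p_value`, the seed, and the score lists are hypothetical, and the actual creativity scores are not reproduced in these notes.

```python
import random
from statistics import mean

def reassignment_p_value(group_a, group_b, n_sim=1000, seed=0):
    """Approximate one-sided p-value for mean(group_a) - mean(group_b).

    Under the null hypothesis each score would be the same in either group,
    so the pooled scores are reshuffled into two groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # one hypothetical random assignment
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            as_extreme += 1
    return observed, as_extreme / n_sim

# Hypothetical scores, invented for illustration (NOT the study data).
intrinsic = [22, 19, 24, 17, 21, 20, 23, 18]
extrinsic = [15, 18, 12, 20, 14, 16, 13, 17]

observed, p_value = reassignment_p_value(intrinsic, extrinsic)
print(f"observed difference = {observed:.2f}, approximate p-value = {p_value:.3f}")
```

The p-value here is simply the fraction of reshuffles that match or exceed the observed difference, which is the same logic as the 2-in-1,000 result reported in the notes.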
