Statistical Investigations: Key Elements, Distributions, and Significance
Statistical Investigations: Key Elements and Thinking
Introduction to Statistical Investigation
- Modern society requires evidence-based decision making, highlighting the importance of drawing valid inferences from data.
- This module emphasizes the key elements of a statistical investigation using recent research studies.
- Example: Coffee Study (Freedman, Park, Abnet, Hollenbeck, & Sinha, 2012)
- Found that men who drank at least a certain number of cups of coffee daily had a lower chance of dying than those who drank none, and so did women.
- Raises the question of whether individuals should increase their coffee habits.
- Illustrates that conducting and interpreting studies well is crucial for making informed decisions.
Learning Objectives
- Define the basic elements of a statistical investigation.
- Describe the role of p-values and confidence intervals in statistical inference.
- Explain the role of random sampling in generalizing conclusions from a sample to a population.
- Describe the role of random assignment in drawing cause-and-effect conclusions.
- Develop the ability to critique statistical studies.
Key Components of a Statistical Investigation
Statistical investigation is a multi-step process, where numerical analysis (or "crunching numbers") is only one part.
1. Planning the Study
- Start by formulating a testable research question.
- Decide on appropriate data collection methods.
- Example: Coffee Study Planning Questions
- What was the study duration?
- How many people were recruited, and by what method?
- From where were participants recruited?
- What were the participants' ages?
- What other variables (e.g., smoking habits, lifestyle) were recorded via questionnaires?
- Were changes made to participants' coffee habits during the study?
2. Examining the Data
- Determine appropriate ways to examine the collected data.
- Graphical Analysis
- Identify relevant graphs (e.g., histograms, scatter plots).
- Interpret what these graphs reveal about the data.
- Descriptive Statistics
- Calculate appropriate descriptive statistics (e.g., mean, median, standard deviation) to summarize relevant data aspects.
- Understand what these statistics reveal (e.g., central tendency, variability).
- Pattern Recognition
- Identify overall patterns within the data.
- Look for individual observations that deviate from the overall pattern (outliers) and consider what they might indicate.
- Example: Coffee Study Data Examination
- Did the proportions of reduced risk differ when comparing smokers to non-smokers?
- Reliability and Validity
- Assess if there is evidence for the reliability (consistency) and validity (accuracy) of the measurements and study design.
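The descriptive statistics and outlier check described above can be sketched in a few lines of Python. This is a minimal illustration, not the coffee study's analysis: the `cups` values are made up, `summarize` is a hypothetical helper name, and the 1.5 x IQR rule is only one common convention for flagging outliers.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics plus values flagged by the 1.5*IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),      # center, sensitive to extreme values
        "median": statistics.median(values),  # center, resistant to extreme values
        "stdev": statistics.stdev(values),    # variability (sample standard deviation)
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical daily coffee intake (cups) for ten participants; NOT study data.
cups = [0, 1, 1, 2, 2, 2, 3, 3, 4, 12]
summary = summarize(cups)
print(summary)
```

In this made-up sample the single heavy drinker pulls the mean above the median, which is the kind of deviation from the overall pattern worth investigating before any inference.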
3. Inferring from the Data
- Use valid statistical methods to draw inferences that extend beyond the specific data collected to a larger population or process.
- Example: Coffee Study Inference
- Is the observed reduction in the risk of death something that could have occurred simply by chance, or is it statistically significant?
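One way to make the "could this be chance?" question concrete is a permutation (random relabeling) simulation. The sketch below is an assumption-laden illustration, not the method used in the coffee study: the group sizes and death counts are invented, and `permutation_p_value` is a hypothetical helper. It shuffles the outcomes between two groups many times and reports how often chance alone produces a risk difference at least as large as the observed one.

```python
import random

def permutation_p_value(deaths_a, n_a, deaths_b, n_b, reps=5000, seed=0):
    """Estimate the one-sided chance of seeing group B's risk exceed group A's
    by at least the observed amount when group labels are assigned at random."""
    rng = random.Random(seed)
    total_deaths = deaths_a + deaths_b
    outcomes = [1] * total_deaths + [0] * (n_a + n_b - total_deaths)
    observed = deaths_b / n_b - deaths_a / n_a
    hits = 0
    for _ in range(reps):
        rng.shuffle(outcomes)             # random relabeling of who is in which group
        sim_a = sum(outcomes[:n_a])       # deaths landing in group A this round
        sim_b = total_deaths - sim_a      # the rest land in group B
        if sim_b / n_b - sim_a / n_a >= observed:
            hits += 1
    return hits / reps

# Hypothetical counts, NOT study data: group A = coffee drinkers, group B = non-drinkers.
p_value = permutation_p_value(deaths_a=30, n_a=200, deaths_b=45, n_b=200)
print(f"Estimated p-value: {p_value:.3f}")
```

A small estimated p-value means random relabeling rarely reproduces a difference that large, so chance alone is an unconvincing explanation. It does not by itself say what caused the difference.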
4. Drawing Conclusions
- Formulate conclusions based on the insights gained from the data analysis.
- Generalizability
- Identify to whom these conclusions apply (e.g., if the coffee study participants were older, healthy, city dwellers, do the conclusions apply to younger, less healthy, rural populations?).
- Cause-and-Effect
- Determine if a cause-and-effect conclusion can be drawn from the treatments (e.g., is coffee drinking definitively the cause of the decreased risk of death, or is it merely an association?).
Distributional Thinking
- When collecting data to answer a question, the first crucial step is to organize and examine the data meaningfully.
- Fundamental Principle of Statistics: Data vary.
- The pattern of this variation, known as the distribution, is critical to capture and understand.
- Often, a careful presentation of the data's distribution can address many research questions directly, sometimes without needing more sophisticated analyses.
- It can also highlight additional questions requiring further examination.
Example 1: Readability of Cancer Pamphlets (Short, Moriarty, & Cooley, 1995)
- Research Question: Are cancer pamphlets written at an appropriate reading level for cancer patients?
- Data Collected:
- Patients' reading levels: reported as frequency counts by grade level, with open-ended categories for the lowest and highest levels, along with the total number of patients (Table 1).
- Pamphlet readability levels: reported as frequency counts by grade level, along with the total number of pamphlets (Table 1).
- Revelations of Statistical Thinking:
- Data Vary: Values of variables (e.g., patient reading level, pamphlet readability level) are not constant.
- Analyzing the Distribution: Examining the pattern of variation (the distribution of the variable) provides insights.
- Addressing the Research Question:
- Requires comparing the two distributions (patient reading levels vs. pamphlet readability levels).
- Naive Comparison: Focusing only on measures of center (like the median, which was the same grade level for both) is insufficient because it ignores the variability and overall shapes of the distributions.
- Comprehensive Approach: Comparing entire distributions, often visually with a graph (Figure 1), is more illuminating.
- Conclusion from Figure 1:
- The two distributions are visibly not well aligned.
- Glaring Discrepancy: A substantial share of patients have reading levels below that of the most readable pamphlet.
- This implies these patients will need assistance to understand the information.
- This conclusion is derived from considering the distributions as a whole, not just specific measures like center or variability, and the graph offers a more immediate contrast than tables.
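The contrast between comparing medians and comparing whole distributions can be sketched as follows. The two frequency tables are invented for illustration and are not the values in Table 1; `median_level` and `share_below` are hypothetical helper names.

```python
def median_level(freq):
    """Median of a {grade_level: count} table (lower middle value if the total is even)."""
    total = sum(freq.values())
    running = 0
    for level in sorted(freq):
        running += freq[level]
        if running >= (total + 1) // 2:
            return level

def share_below(freq, threshold):
    """Fraction of observations with a level strictly below the threshold."""
    total = sum(freq.values())
    return sum(count for level, count in freq.items() if level < threshold) / total

# Hypothetical frequency tables (grade level -> count); NOT the data from Table 1.
patients = {3: 5, 5: 4, 7: 3, 8: 6, 10: 4, 12: 3}
pamphlets = {6: 2, 7: 3, 8: 5, 9: 2, 11: 2, 13: 1}

print(median_level(patients), median_level(pamphlets))  # the medians agree
easiest_pamphlet = min(pamphlets)
gap = share_below(patients, easiest_pamphlet)
print(f"{gap:.0%} of patients read below the easiest pamphlet")
```

In this made-up example the medians match, yet more than a third of the patients read below the level of every pamphlet. A median-only comparison would hide that mismatch.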
Statistical Significance
- Even when patterns are found in data, there is often inherent uncertainty.
- Sources of Uncertainty:
- Potential for measurement errors (e.g., body temperature fluctuates over the course of a day).
- Observations may be a "snapshot" from a longer-term process.
- Data may come from a small subset of the population of interest.
- Core Question of Statistical Significance: How can we determine if patterns observed in a small dataset provide convincing evidence of a systematic phenomenon in the larger process or population, rather than just being due to chance?
Example 2: Infant Social Evaluation Study (Hamlin, Wynn, & Bloom, 2007)
- Investigation: Researchers explored whether pre-verbal infants take into account an individual's actions toward others when evaluating that individual as appealing or aversive.
- Study Component:
- Infants observed a "climber" character failing to get up a hill.
- They were then shown two scenarios alternately:
- A "helper" character pushes the climber to the top.
- A "hinderer" character pushes the climber back down the hill.
- After repeated viewings, infants were presented with two pieces of wood (representing the helper and hinderer characters) and asked to pick one to play with.
- Result: Of the infants who made a clear choice, a majority chose to play with the helper toy.
- This result raises the question of statistical significance: Is the observed preference for the helper a random occurrence, or does it represent a genuine inclination in infants to favor prosocial behavior?
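A standard way to frame this question is to ask how likely a split at least this lopsided would be if each infant picked a toy at random, like a fair coin flip. The sketch below computes that probability exactly from the binomial distribution. The counts are placeholders chosen for illustration, not the study's figures, and `binomial_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) when X counts successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Placeholder counts, NOT the study's data: 16 of 20 infants choosing the helper.
n_infants, n_helper = 20, 16
p_chance = binomial_tail(n_helper, n_infants)
print(f"P(at least {n_helper} of {n_infants} by chance) = {p_chance:.4f}")
```

If this probability is small, random choice is an unconvincing explanation for the observed preference, which is the reasoning behind calling a result statistically significant.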
Additional Context
- This chapter is an Open Access adaptation from the NOBA project.
- Queen's University Psychology Department, in collaboration with Queen's Student Academic Success Services (SASS), developed a "Three-Step Method" for student support, available at https://sass.queensu.ca/psyc100/.