Notes on Phrenology, Cherry-Picking, and Statistical Reasoning

Phrenology, Skull Measurement, and the Problems of Data Interpretation

  • Critique of phrenology: the idea that head shape reveals intellect is overly simplistic; measurements such as skull dimensions carry no clear, distinct meaning on their own.

  • Modern neuroscience relies on noisy data: even when you collect EEG data, you don’t automatically get clean, distinct interpretations; extracting meaning requires heavy statistical work and sophisticated analysis.

  • Historical limitation: earlier methods (calipers, measuring the biggest skull dimension, front-to-back length, etc.) were imperfect and biased by conceptual assumptions.

Cherry-picking, preregistration, and principled research

  • Cherry-picking vs preregistration: cherry-picking is the unfair, post-hoc selection of analyses after seeing data; preregistering hypotheses and analysis plans before data collection helps prevent this and keeps the process fair.

  • A principled approach is not cherry-picking: you declare your analysis plan ahead of time and then execute it, reducing bias.

  • The “story about responsible science”: describing a project as fair and preplanned is offered as a contrast to unprincipled data tinkering.

  • Beware of “cooking” the data: without preregistration, you could switch hypotheses (e.g., blaming a different skull feature, such as back-to-front rather than front-to-back length) to fit the data after the fact.

The dangers of post-hoc reasoning in statistics

  • Anecdote about making up a variable after seeing results: redefining categories (e.g., what counts as a “true Frenchman”) to fit a desired conclusion leads to tautologies and invalid inferences.

  • Post-hoc rationalizations undermine scientific integrity when you say, after the fact, that a specific label (e.g., identity group) explains a result while ignoring counterexamples.

  • A counterexample can always be produced if you only require results to fit a preconceived narrative, highlighting the need for pre-specified criteria.

Sampling, correlations, and the limits of inference

  • If you survey a classroom, you’ll find typical patterns for many traits (e.g., red as the most common favorite color, followed by blue), with some individuals deviating.

  • Across a broader population, you’ll find correlations that seem meaningful but may be spurious (e.g., more people from a certain background in a class), with no causal link to the trait of interest.

  • The key warning: many correlations exist that are statistically detectable, but they do not imply scientifically valuable relationships.

  • This danger is described as an “indulgent, self-flattering scientific maneuver.”
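The warning about statistically detectable but meaningless correlations can be made concrete with a small simulation (all names and numbers here are hypothetical, not from the notes): when pure noise variables are compared against the same outcome many times, some pairing will look “detectable” by chance alone.

```python
# Sketch: with enough comparisons, pure noise yields "detectable"
# correlations. Every variable below is random, so any pattern
# found is spurious by construction.
import math
import random

random.seed(0)

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

n = 50  # one classroom-sized sample
outcome = [random.gauss(0, 1) for _ in range(n)]

# Correlate 100 unrelated "traits" against the same outcome.
rs = [pearson_r([random.gauss(0, 1) for _ in range(n)], outcome)
      for _ in range(100)]

strongest = max(rs, key=abs)
print(f"strongest |r| among 100 noise traits: {abs(strongest):.2f}")
```

Without pre-specified criteria, a researcher could report only the strongest of those 100 correlations and tell a plausible-sounding story about it.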

Honest scientific practice and admitting error

  • The ethical move is to acknowledge mistakes: be explicit about being wrong when evidence doesn’t support your claim.

  • Recounting an embarrassing moment from historical research emphasizes the importance of humility and self-correction in science.

Brain measurement history and its ethical implications

  • Early researchers attempted to map brain properties to race or intellect, sometimes using harmful labeling and categorization (e.g., sorting skulls by race; labeling skulls 1–50; separating children into age-based groups for remediation).

  • Proposed (hypothetical) classroom experiment: divide kids into age groups and place them in different classrooms for remediation, not because of inherent ability, but to optimize learning outcomes; this illustrates how seemingly benevolent organizational strategies can be misused to justify biased conclusions.

  • This reflects a broader caution: age-based or racialized group labels were historically used to sort people, often in service of racist or discriminatory agendas.

The irony of measuring intelligence numerically

  • The central critique: you cannot capture the full complexity of intelligence with a single number or metric; deciding what to measure and what to ignore shapes conclusions.

  • The example contrasts Einstein’s exceptional intellect with a person who bullies others: neither can be fully captured by a single number like “IQ.”

  • The claim that intelligence cannot be fully reduced to a single value is highlighted as a fundamental challenge to simplistic numeric representations.

Key statistical concepts referenced (formulas and ideas)

  • Correlation coefficient (r): measures linear association between two variables.

    • r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}

    • Note: correlation does not imply causation; a nonzero sample r can arise by chance, and in large samples even tiny correlations become statistically significant.
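A quick numeric check of the formula, on hypothetical data (using population-style n-denominators for both the covariance and the standard deviations, so the n’s cancel consistently):

```python
# Sketch: Pearson r computed directly as cov(X, Y) / (sigma_X * sigma_Y)
# on hypothetical data.
import math

x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 5.0, 9.0]
n = len(x)

mx = sum(x) / n
my = sum(y) / n

cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
sigma_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sigma_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)

r = cov / (sigma_x * sigma_y)
print(f"r = {r:.3f}")  # near 1: x and y rise together almost linearly
```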

  • t-test (comparing two means) for independent samples (example form):

    • t = \frac{\overline{X}_1 - \overline{X}_2}{s_p \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} }}

    • Pooled standard deviation: s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}
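A minimal worked example of the pooled two-sample t, following the formulas above on two small hypothetical groups:

```python
# Sketch: independent two-sample t with pooled standard deviation.
# Data are hypothetical measurements for two groups.
import math

g1 = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8]
g2 = [5.6, 5.8, 5.5, 6.0, 5.7, 5.9]
n1, n2 = len(g1), len(g2)

m1 = sum(g1) / n1
m2 = sum(g2) / n2

# Sample variances (n - 1 denominator), as used in the pooled formula.
s1_sq = sum((v - m1) ** 2 for v in g1) / (n1 - 1)
s2_sq = sum((v - m2) ** 2 for v in g2) / (n2 - 1)

sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
sp = math.sqrt(sp_sq)

t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(f"t = {t:.2f} with {n1 + n2 - 2} degrees of freedom")
```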

  • Analysis of Variance (ANOVA) for comparing more than two groups (conceptual):

    • F-statistic: F = \frac{MS_{\text{between}}}{MS_{\text{within}}}

    • Used to detect whether at least one group mean differs from the others.
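The F-statistic can be computed directly from its definitions; this sketch uses three hypothetical groups:

```python
# Sketch: one-way ANOVA F = MS_between / MS_within, computed from
# the sums of squares and their degrees of freedom. Data are hypothetical.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [4.5, 5.5, 6.5],
]
k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total observations
grand = sum(sum(g) for g in groups) / N

# Between-group variation: group means around the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
# Within-group variation: observations around their own group mean.
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

ms_between = ss_between / (k - 1)    # df = k - 1
ms_within = ss_within / (N - k)      # df = N - k
F = ms_between / ms_within
print(f"F = {F:.2f}")
```

A large F suggests at least one group mean differs; here the second group’s higher mean drives the between-group variation.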

  • Sample size reference: n = 50 (a hypothetical example indicating a sample large enough to distinguish differences for some analyses).

  • General caution: when exploring data, small samples can fail to distinguish groups; large samples can reveal statistically significant but practically meaningless differences.
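The sample-size caution can be demonstrated with synthetic data (the effect size and group sizes below are assumptions for illustration): the same practically negligible mean difference produces an unremarkable t at small n and an enormous t at large n.

```python
# Sketch: a tiny true mean difference (0.05 standard deviations) is
# invisible in small samples but "highly significant" in huge ones.
import math
import random

random.seed(1)

def pooled_t(g1, g2):
    """Two-sample t with pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    s1 = sum((v - m1) ** 2 for v in g1) / (n1 - 1)
    s2 = sum((v - m2) ** 2 for v in g2) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (m2 - m1) / (sp * math.sqrt(1 / n1 + 1 / n2))

def sample(mean, n):
    return [random.gauss(mean, 1.0) for _ in range(n)]

t_small = pooled_t(sample(0.0, 20), sample(0.05, 20))
t_large = pooled_t(sample(0.0, 100_000), sample(0.05, 100_000))

print(f"n=20 per group:      t = {t_small:+.2f}")
print(f"n=100,000 per group: t = {t_large:+.2f}")
```

Statistical significance at large n says the difference is real, not that it matters; the effect size still has to be judged on its own.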

Connections to broader themes

  • Emphasis on preregistration connects to foundational principles of scientific methodology and replicability.

  • The critique of cherry-picking aligns with efforts to improve research transparency, reduce bias, and avoid data-dredging practices.

  • Historical misuses (phrenology, racialized skull measurements) illustrate how scientific methods can be misapplied to justify social biases, underscoring the ethical responsibilities of researchers.

  • The discussion of IQ and intelligence encourages nuanced views of cognitive ability, recognizing multiple intelligences and limitations of single-number summaries.

Ethical, philosophical, and practical implications

  • Ethical: Avoid pseudoscience that dehumanizes or categorizes people based on biased measurements.

  • Philosophical: The nature of intelligence and the limits of reductionism; caution against conflating correlation with causation.

  • Practical: Design research with preregistration, transparent methods, and explicit criteria to guard against post-hoc rationalizations.

  • Societal relevance: Understanding how biases enter research can improve education, policy, and fairness in real-world applications.

Summary takeaways

  • Be wary of simplistic interpretations of physical measurements as proxies for complex traits like intelligence.

  • Preregister hypotheses and analysis plans to reduce cherry-picking and improve fairness.

  • Recognize that correlations can arise by chance and do not imply meaningful causal relationships.

  • Historical misuse of scientific methods highlights the importance of ethics and critical thinking in research design.

  • Accept that intelligence cannot be fully captured by a single metric; humility and openness to revision are central to scientific progress.

Notable examples to remember

  • The “true Frenchman” tautology example: any post-hoc definition to fit a claim collapses under counterexamples.

  • The classroom color-preferences example: general patterns exist, but outliers or group-level correlations can arise without implying a real substantive link.

  • The remediation-group idea: even well-intentioned organizational decisions can be co-opted to support biased or discriminatory conclusions if not carefully designed and justified.

Final reflection prompt

  • If you were to design a modern study addressing brain measures and cognitive traits, how would you ensure preregistration, guard against cherry-picking, and interpret results without overclaiming?