Notes on Phrenology, Cherry-Picking, and Statistical Reasoning
Phrenology, Skull Measurement, and the Problems of Data Interpretation
Critique of phrenology: the idea that head shape reveals intellect is rejected as overly simplistic; skull measurements such as overall dimensions carry no clear, distinct meaning about mental faculties.
Modern neuroscience relies on noisy data: even if you collect EEG data, you don’t automatically get clean, distinct interpretations; it requires heavy statistical work and sophisticated analysis.
Historical limitation: earlier methods (calipers, measuring the biggest skull dimension, front-to-back length, etc.) were imperfect and biased by conceptual assumptions.
Cherry-picking, preregistration, and principled research
Cherry-picking vs preregistration: cherry-picking is the unfair, post-hoc selection of analyses after seeing data; preregistering hypotheses and analysis plans before data collection helps prevent this and keeps the process fair.
A principled approach is not cherry-picking: you declare your analysis plan ahead of time and then execute it, reducing bias.
The “story about responsible science”: describing a project as fair and preplanned is offered as a contrast to unprincipled data tinkering.
Beware of cooking the data: without preregistration, you could switch hypotheses after the fact (e.g., blaming back-of-skull length instead of front-to-back length) to fit the results.
The dangers of post-hoc reasoning in statistics
Anecdote about making up a variable after seeing results: redefining categories (e.g., what counts as a “true Frenchman”) to fit a desired conclusion leads to tautologies and invalid inferences.
Post-hoc rationalizations undermine scientific integrity when you say, after the fact, that a specific label (e.g., identity group) explains a result while ignoring counterexamples.
A counterexample can always be produced if you only require results to fit a preconceived narrative, highlighting the need for pre-specified criteria.
Sampling, correlations, and the limits of inference
If you survey a classroom, you’ll get typical averages for many traits (e.g., favorite colors like red, then blue) with some individuals deviating.
Across a broader population, you’ll find correlations that seem meaningful but may be spurious (e.g., more people from a certain background in a class), with no causal link to the trait of interest.
The key warning: many correlations exist that are statistically detectable, but they do not imply scientifically valuable relationships.
This danger is described as an “indulgent, self-flattering scientific maneuver.”
Honest scientific practice and admitting error
The ethical move is to acknowledge mistakes: be explicit about being wrong when evidence doesn’t support your claim.
Recounting an embarrassing moment from historical research emphasizes the importance of humility and self-correction in science.
Brain measurement history and its ethical implications
Early researchers attempted to map brain properties to race or intellect, sometimes using harmful labeling and categorization (e.g., sorting skulls by race; labeling skulls 1–50; separating children into age-based groups for remediation).
Proposed (hypothetical) classroom experiment: divide kids into age groups and place them in different classrooms for remediation, not because of inherent ability but to optimize learning outcomes; this illustrates how a seemingly benevolent organizational strategy can be misused to justify biased conclusions.
This reflects a broader caution: age-based or racialized group labels were historically used to sort people, often in service of racist or discriminatory agendas.
The irony of measuring intelligence numerically
The central critique: you cannot capture the full complexity of intelligence with a single number or metric; deciding what to measure and what to ignore shapes conclusions.
The example contrasts Einstein’s exceptional intellect with a person who bullies others: one cannot be fully captured by a single number like “IQ.”
The claim that intelligence cannot be fully reduced to a single value is highlighted as a fundamental challenge to simplistic numeric representations.
Key statistical concepts referenced (formulas and ideas)
Correlation coefficient (r): measures linear association between two variables.
r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}
Note: correlation does not imply causation; a nonzero r can occur by chance in large samples.
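The chance-correlation warning above can be sketched in Python (a minimal illustration, assuming numpy and scipy are available): two independent random variables still yield a nonzero sample r, and at a 0.05 threshold roughly 5% of independent pairs come out “significant” by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two independent variables: any sample correlation between them is noise.
x = rng.normal(size=50)
y = rng.normal(size=50)
r, p = stats.pearsonr(x, y)
print(f"single pair: r = {r:.3f}, p = {p:.3f}")

# Run many independent pairs: at alpha = 0.05, about 5% of them
# test "significant" even though no real relationship exists.
trials = 1000
false_hits = 0
for _ in range(trials):
    a = rng.normal(size=50)
    b = rng.normal(size=50)
    _, p_i = stats.pearsonr(a, b)
    if p_i < 0.05:
        false_hits += 1
print(f"false positives: {false_hits}/{trials}")
```

This is exactly why pre-specified hypotheses matter: if you test enough pairings after the fact, some will always look “detectable.”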
t-test (comparing two means) for independent samples (example form):
t = \frac{\overline{X}_1 - \overline{X}_2}{s_p \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} }}
Pooled standard deviation: s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}
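The pooled two-sample t formula can be checked numerically (a sketch using numpy and scipy; the groups and seed are made up for illustration): computing s_p and t by hand should agree with scipy's equal-variance t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(loc=0.0, scale=1.0, size=50)
g2 = rng.normal(loc=0.5, scale=1.0, size=50)

n1, n2 = len(g1), len(g2)
s1_sq, s2_sq = g1.var(ddof=1), g2.var(ddof=1)

# Pooled variance, per the formula above.
sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t_manual = (g1.mean() - g2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# scipy's equal-variance t-test uses the same pooled estimator.
t_scipy, p = stats.ttest_ind(g1, g2, equal_var=True)
print(t_manual, t_scipy, p)
```

The manual and library values should match to floating-point precision, confirming the formula as written.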
Analysis of Variance (ANOVA) for comparing more than two groups (conceptual):
F-statistic: F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
Used to detect whether at least one group mean differs from the others.
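The F ratio above can likewise be built up from sums of squares and compared to scipy's one-way ANOVA (a sketch with three invented groups, one of which has a shifted mean):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.0, 1.0, 50)
c = rng.normal(0.8, 1.0, 50)  # one group mean deliberately shifted

groups = [a, b, c]
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group mean squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

# scipy's one-way ANOVA computes the same ratio.
f_scipy, p = stats.f_oneway(a, b, c)
print(f_manual, f_scipy, p)
```

A large F (small p) only says that at least one group mean differs; it does not say which one, or why.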
Sample size reference: n = 50 (a hypothetical example indicating a sample large enough to distinguish differences for some analyses).
General caution: when exploring data, small samples can fail to distinguish groups; large samples can reveal statistically significant but practically meaningless differences.
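The sample-size caution can be made concrete (a sketch with an invented, practically negligible true difference of 0.05 standard deviations): a small sample usually fails to detect it, while a very large sample flags it as highly significant even though it remains trivially small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Same tiny true effect (0.05 sd) at two very different sample sizes.
small_a = rng.normal(0.00, 1.0, 30)
small_b = rng.normal(0.05, 1.0, 30)
big_a = rng.normal(0.00, 1.0, 100_000)
big_b = rng.normal(0.05, 1.0, 100_000)

_, p_small = stats.ttest_ind(small_a, small_b)
_, p_big = stats.ttest_ind(big_a, big_b)
print(f"n=30: p = {p_small:.3f};  n=100000: p = {p_big:.2e}")
```

Statistical significance here tracks sample size, not importance, which is why effect sizes matter alongside p-values.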
Connections to broader themes
Emphasis on preregistration connects to foundational principles of scientific methodology and replicability.
The critique of cherry-picking aligns with efforts to improve research transparency, reduce bias, and avoid data-dredging practices.
Historical misuses (phrenology, racialized skull measurements) illustrate how scientific methods can be misapplied to justify social biases, underscoring the ethical responsibilities of researchers.
The discussion of IQ and intelligence encourages nuanced views of cognitive ability, recognizing multiple intelligences and limitations of single-number summaries.
Ethical, philosophical, and practical implications
Ethical: Avoid pseudoscience that dehumanizes or categorizes people based on biased measurements.
Philosophical: The nature of intelligence and the limits of reductionism; caution against conflating correlation with causation.
Practical: Design research with preregistration, transparent methods, and explicit criteria to guard against post-hoc rationalizations.
Societal relevance: Understanding how biases enter research can improve education, policy, and fairness in real-world applications.
Summary takeaways
Be wary of simplistic interpretations of physical measurements as proxies for complex traits like intelligence.
Preregister hypotheses and analysis plans to reduce cherry-picking and improve fairness.
Recognize that correlations can arise by chance and do not imply meaningful causal relationships.
Historical misuse of scientific methods highlights the importance of ethics and critical thinking in research design.
Accept that intelligence cannot be fully captured by a single metric; humility and openness to revision are central to scientific progress.
Notable examples to remember
The “true Frenchman” tautology example: any post-hoc definition to fit a claim collapses under counterexamples.
The classroom color-preferences example: general patterns exist, but outliers or group-level correlations can arise without implying a real substantive link.
The remediation-group idea: even well-intentioned organizational decisions can be co-opted to support biased or discriminatory conclusions if not carefully designed and justified.
Final reflection prompt
If you were to design a modern study addressing brain measures and cognitive traits, how would you ensure preregistration, guard against cherry-picking, and interpret results without overclaiming?