Data analysis planning and reproducibility in science
We start with planning how to analyze data rather than throwing together analyses based on preferences; documentation and writing are essential for clarity.
Experimentation involves manipulating one or two specific factors in a situation and carefully observing and reporting how other factors change in response.
Reproducibility is central: scientists aim to communicate both their results and, crucially, how those results were obtained so others can double-check the work.
Replication is the process of applying the original methodology (or a close variation) to see if the same results emerge, reinforcing the validity of conclusions.
Papers typically include sections detailing the questions asked, data sources, and the steps undertaken so others can reproduce the study.
Science is a deeply collaborative enterprise across disciplines, with progress built on the cumulative work of many researchers rather than single “great minds.”
The scientific method in psychology mirrors other sciences: theory → hypothesis → data → analysis → interpretation.
A theory is a broad, well-supported explanation backed by extensive observation; it’s more rigorous than colloquial uses of the word "theory".
From a theory, we derive a hypothesis to test in specific situations.
Data are collected, analyzed, and interpreted to evaluate the hypothesis; this process rests on inductive reasoning from samples.
This lecture outlines a sequence: samples, inductive reasoning, hypotheses, data collection, data analysis, and interpretation.
The scientific method: theory, hypothesis, data, analysis, interpretation
Theory: overarching idea supported by extensive testing and observation; aims to say something broad about a phenomenon.
Hypothesis: a specific, testable prediction about the relationship between two or more variables.
Data: collected to test hypotheses; can be quantitative (numbers) or qualitative (narratives).
Analysis and interpretation: researchers analyze data to draw conclusions about whether the hypothesis is supported or not.
Inductive reasoning: reasoning from specific observations or samples to general conclusions about a population; results are probabilistic and come with uncertainty.
Distinction from deductive reasoning: deductive reasoning starts with a general rule to deduce specifics; inductive reasoning generalizes from specifics to broader claims.
Example of inductive inference: predicting that a Syracuse winter will have snow based on a subset of observed winters; any such inference from past observations carries some uncertainty about future outcomes.
Uncertainty and margins of error are inherent in inductive conclusions; even strong historical patterns do not guarantee future events.
Researchers often report a margin of error or probability indicating how likely it is that the observed effect reflects a true relationship rather than random variation.
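A minimal sketch of how a margin of error can be computed for a sample proportion, using the standard normal approximation (the 60% employment figure and the 300-person sample size are hypothetical illustrations, not values from the notes):

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion (z = 1.96)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical example: 60% of a 300-person sample reports being employed.
moe = margin_of_error(0.60, 300)
print(f"60% +/- {moe * 100:.1f} percentage points")
```

The key point is the 1/sqrt(n) behavior: quadrupling the sample size only halves the margin of error, which is why modest samples can still yield usefully precise estimates.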
Samples, populations, and representativeness
When asking about human thoughts, emotions, or behavior, researchers aim to generalize to larger populations (e.g., humans, Americans, English speakers, students at Syracuse University).
It is generally impractical to survey or test everyone; instead, researchers study a sample.
Sample: a subset of the population that is studied to make inferences about the population. Example: asking about employment with a sample of 300 people rather than 2,000,000.
Population vs. sample: Population is the entire group of interest; sample is a smaller subset studied to draw conclusions.
Representativeness: a sample should have a demographic mix similar to the population of interest; otherwise, generalizations may be biased.
College students as a sample can differ from the general population in age, employment status, income, and other demographics.
Implications: findings from college student samples may not fully generalize to the broader population; researchers must consider how representative their sample is and discuss limitations.
Example concerns: younger age distribution among students affects questions about political views, health concerns, sleep, exercise, and diet; employment status can affect answers about work-related questions.
The census is one way governments collect data from a very large population, illustrating the scale needed for broad generalizations.
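A small sketch of the sample-versus-population idea: drawing a simple random sample of 300 from a hypothetical population of 2,000,000 and checking how well the sample rate tracks the population rate (the 35% part-time figure is an invented value for illustration):

```python
import random

random.seed(42)  # fixed seed so the demonstration is repeatable

# Hypothetical population of 2,000,000 people, 35% of whom work part-time.
population = ["part-time"] * 700_000 + ["other"] * 1_300_000

# A simple random sample of 300 should approximate the population rate.
sample = random.sample(population, 300)
rate = sample.count("part-time") / len(sample)
print(f"Sample estimate: {rate:.1%} (true population rate: 35.0%)")
```

Random selection is what makes the sample representative on average; a convenience sample (e.g., only college students) would not have this property.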
Inductive vs. deductive reasoning
Deductive reasoning (Sherlock Holmes style): start with broad observations or rules and deduce specific conclusions from them.
Inductive reasoning: start with specific observations (often from a sample) and infer general conclusions about the population.
In science, conclusions are drawn from sampled data to generalize about populations, with acknowledged uncertainty.
Example of inductive inference: “It will snow in Syracuse this winter” based on observed winters and lived experience; the conclusion is probabilistic, not certain.
Important caveat: conclusions are always conditional on the sample and data; they are educated guesses subject to revision with new information.
Hypotheses: testable and falsifiable
A hypothesis is a specific, testable prediction about the relationship between two or more variables that can be measured or categorized.
Measurable variables: anything that can be quantified or categorized (events, conditions, characteristics, behaviors).
Testable: the hypothesis must be measurable in a way that yields numerical or categorical data.
Non-ideal hypotheses: statements about metaphysical or intangible notions are difficult to test and are poor scientific hypotheses.
Falsifiability: a hypothesis must have a defined failure state—clear criteria for what would count as disproving it.
Example: “There is a relationship between how late it is in the evening and how sleepy people report being.” A falsifying outcome would be finding no relationship between time of evening and sleepiness.
Some claims are difficult to falsify (e.g., certain conspiracy theories) because there is no agreed-upon way to disprove them; this is a mark of pseudoscience.
Distinguishing science from pseudoscience:
Scientific claims are testable, potentially falsifiable, and open to replication and scrutiny by others.
Pseudoscience often resists criticism, relies on anecdotes, and claims that cannot be disproven or updated with new data.
How to evaluate scientific claims:
Look for falsifiability and controlled testing (e.g., randomized controlled trials).
Compare claims that rely on controlled experiments versus anecdotes or selective data.
Be cautious of claims that rely on selective data or that cannot be tested or challenged.
Illustrative practice questions (testable and falsifiable or not):
The moon is made of green cheese: Testable and falsifiable via moon rocks and spectroscopy.
Depression caused by invisible gremlins in the brain: Not testable in a way that can be falsified; not a robust scientific claim.
Cupcakes are delicious: Not a scientific hypothesis (subjective, value-laden).
We should eat cupcakes: While related questions (health effects) can be studied, the statement itself is not a testable hypothesis.
Classical music increases intelligence: Testable and falsifiable via controlled comparisons of groups exposed to classical music vs. no music.
When constructing hypotheses, researchers start from a theory and derive a specific, testable statement about a particular situation that can be studied with measurable outcomes.
Descriptive, correlational, and experimental research
Descriptive research: describes “what’s going on” by asking people questions, collecting surveys, and summarizing data (e.g., average happiness on a scale).
Correlational research: investigates how one variable changes as another changes; assesses relationships but cannot establish causation.
Correlations can be positive (both variables increase together) or negative (one increases while the other decreases).
Used to predict how changes in one variable relate to changes in another, but not to establish mechanism.
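A short sketch of what correlational analysis computes: the Pearson correlation coefficient, which runs from +1 (perfect positive) through 0 (no linear relationship) to -1 (perfect negative). The data below are invented to echo the evening-sleepiness hypothesis mentioned later in the notes:

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: covariance of xs and ys scaled by their spreads."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: hours after 8pm vs. self-reported sleepiness (1-10).
hours = [0, 1, 2, 3, 4]
sleepiness = [3, 4, 6, 7, 9]
print(f"r = {pearson_r(hours, sleepiness):.2f}")  # strong positive correlation
```

Note that even an r near 1 only describes how the variables move together; it says nothing about which one (if either) causes the other.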
Experimental research: establishes cause-and-effect by manipulating one or more independent variables and holding other variables constant.
Key goal: determine how changing a variable causes changes in another variable.
Methods for data collection: surveys, performance tests, observations, physiological measures (heart rate, blood pressure, hormones), or group-level observations.
Data types in research:
Quantitative data: numerical measurements (e.g., height in cm, weight, test scores, happiness on a 1–10 scale). Variables are defined in advance and measured consistently across participants.
Qualitative data: descriptive, narrative data that capture the quality or experience (e.g., interview transcripts, descriptions of feelings, themes emerging from responses).
Methodological complementarity:
Qualitative data can illuminate phenomena that are hard to measure quantitatively and can guide later quantitative research.
Quantitative data provide precise, comparable measurements and statistical analyses.
Often, researchers start with descriptive or qualitative work to understand a phenomenon and then move to quantitative methods to test hypotheses more rigorously.
Data types: quantitative vs qualitative
Quantitative methods:
Involve measurements that yield numbers (e.g., heights, scores, counts).
Require clearly defined variables and predefined measurement procedures for consistency across participants.
Produce numerical results and enable statistical analysis.
Qualitative methods:
Focus on the quality or nature of experiences, not reducible to simple numbers.
Produce narratives, descriptions, and themes (e.g., interviews about concert experiences).
Variables may emerge from the data rather than being predefined; analysis focuses on identifying patterns and themes.
Relationship between the two:
They are complementary; many studies use a mixed-methods approach, starting with qualitative data to identify themes and then collecting quantitative data to test those themes.
Experimental design: controlled manipulation and measurement
Illustration of an experiment (candy and happiness):
A small demonstration with volunteers to illustrate manipulation and measurement.
Independent variable: candy vs. no candy (the presence or absence of candy).
Dependent variable: happiness level (self-reported on a scale of 1 to 10).
Design issues in the demonstration:
Sample: 10 volunteers (n = 10).
Group assignment: 5 in the candy group and 5 in the no-candy group.
Random assignment: Ideally, participants would be assigned to groups by randomization (e.g., flipping a coin); in the demonstration it was not strictly random, described as a convenience/ad hoc setup.
Baseline measurement: Happiness was measured before manipulation to establish a baseline for each participant.
Baseline results: Participants showed varying baseline happiness (e.g., some reported around 6, others higher or lower), illustrating why a baseline helps interpret changes due to manipulation.
Post-manipulation measurement: Happiness was assessed after the manipulation to determine the effect of candy.
Potential confounds and limitations:
Volunteer bias: volunteers may be more motivated or have different preferences (e.g., some dislike candy).
Stage presence and the experimental setting may influence responses (demand characteristics, social desirability, excitement on stage).
Not all participants may respond the same way to candy due to individual differences.
The lack of strict randomization and other controls means the design is not a rigorous controlled experiment.
Core teaching point: A well-designed experiment carefully controls conditions, randomly assigns participants to groups, and manipulates one or more independent variables to observe effects on a dependent variable.
Key question in this example: What did we manipulate between conditions, and what variable did we observe changing?
Manipulated variable: whether a group received candy or not.
Observed variable: happiness level.
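The candy demonstration can be sketched as code, with the random assignment the notes say was missing actually carried out. The participant labels and post-manipulation happiness scores below are hypothetical stand-ins, not the scores collected in class:

```python
import random

random.seed(0)  # fixed seed so the demonstration is repeatable

# n = 10 volunteers, randomly assigned to two groups of 5.
participants = [f"P{i}" for i in range(1, 11)]
random.shuffle(participants)
candy_group, no_candy_group = participants[:5], participants[5:]

# Hypothetical self-reported happiness (1-10) after the manipulation.
post_scores = {"candy": [7, 8, 6, 9, 8], "no_candy": [6, 5, 7, 6, 5]}
diff = (sum(post_scores["candy"]) / 5) - (sum(post_scores["no_candy"]) / 5)
print(f"Mean difference (candy - no candy): {diff:.1f}")
```

Random assignment (the shuffle step) is what distinguishes this from the in-class convenience setup: it spreads individual differences in baseline happiness evenly across conditions, so a group difference can more plausibly be attributed to the candy.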
Practical guidance for evaluating scientific versus pseudoscientific claims
In scientific claims, researchers aim for falsifiability and replication; data are published with openness to review and criticism.
Pseudoscience often relies on anecdotes, selective data, and resistance to falsification or replication.
A strong scientific claim typically involves:
A testable hypothesis with defined measures.
A design that allows for potential falsification.
Methods and data that enable replication and verification by others.
Be mindful of how data are presented in media and online: look for whether claims reference controlled experiments, peer review, and whether alternative explanations have been considered.
Connections to foundational principles and real-world relevance
The process of scientific inquiry is cumulative and collaborative, building on prior work rather than relying on a single breakthrough.
Representativeness is crucial for external validity: conclusions drawn from a sample are most trustworthy when the sample closely matches the population of interest.
Inductive reasoning and uncertainty are inherent in scientific claims; researchers quantify the likelihood that results reflect true effects rather than chance.
Distinguishing descriptive, correlational, and experimental approaches helps researchers answer different kinds of questions:
Descriptive: What is happening?
Correlational: How are variables related?
Experimental: What causes what?
Ethical and practical implications include transparency, openness to replication, careful interpretation of generalizability, and avoidance of overclaiming from limited data.