Notes on the Process of Science (Lecture 2)

Process of Science: Overview

The process of science (also called the scientific method) is nonlinear. The simplified version shown here is a model that can loop back on itself; it emphasizes core ideas rather than a strict linear sequence.
Core idea: scientists ask questions about the natural world, form hypotheses, test predictions with experiments, analyze data, and share findings so others can reproduce or challenge them. The cycle can loop back to revise observations, hypotheses, or methods.
The simplified eight-step model (nonlinear in practice):
- 1) Make an observation of the natural world (not supernatural).
- 2) Form a question from the observation.
- 3) Do background research to learn what is already known about the question.
- 4) Form a hypothesis (a testable statement).
- 5) Make a prediction based on the hypothesis (what you would expect to see if the hypothesis is correct).
- 6) Test the predictions by performing experiments and collecting data.
- 7) Analyze the data to determine whether the evidence supports or falsifies the hypothesis.
- 8) Share findings with others (start with collaborators, expand to broader communities, and ultimately publish in peer-reviewed venues).
In practice, scientists often revisit and revise steps. A hypothesis may be rewritten or rejected, leading to new questions and experiments.
The distinction between hypothesis and theory:
- A hypothesis is a testable, specific, and falsifiable statement or educated guess about a particular scenario or mechanism.
- A theory is a broad, well-supported framework that integrates many tests and observations, has strong predictive value, and explains a wide range of phenomena. Theories carry great weight in science and are considered robust explanations, though not “proven” in an absolute sense; our understanding can change with new evidence.
Why scientists use inductive and deductive reasoning:
- Inductive reasoning (descriptive or discovery science) builds hypotheses from patterns and observations.
- Deductive reasoning (experimental science) applies general principles to specific situations to generate testable predictions.
Everyday illustration (flashlight scenario): during a monsoon storm or power outage, the flashlight won’t turn on. That observation leads to hypotheses about why. Each hypothesis generates a prediction, the prediction is tested with a simple experiment, and a conclusion is drawn from the result.
The nature of predictions and testability: a hypothesis must be testable and falsifiable; it should be possible to show it is wrong through data.
Science deals with the natural world; supernatural explanations are not testable by scientific methods.

Key Concepts: Hypothesis, Prediction, and Theory

Hypothesis (testable statement): a tentative explanation or educated guess about what might be happening.
- Must be testable and falsifiable.
- Example: "Batteries are dead" or "Bulb is burned out" in a flashlight scenario.
Prediction: a specific, testable consequence that follows from a hypothesis.
- Example: "If I replace the batteries, the light will turn on."
Theory: a broad explanation built on many tested hypotheses and observations with strong predictive power.
- Theories accumulate evidence from many tests and observations over time.
- They are not declared proven in an absolute sense; new data can refine or revise them.
Relationship among concepts: hypotheses lead to predictions that are tested; if data support the predictions, the hypothesis is supported; if not, the hypothesis is revised or rejected; theories arise from multiple well-supported hypotheses.

Inductive vs Deductive Reasoning

Inductive reasoning (descriptive/discovery science): infer general principles from specific observations.
Deductive reasoning (experimental science): apply a general principle to a specific situation to make predictions.
Practical example of deductive reasoning: if the general principle is "If it sounds too good to be true, it isn’t true," you apply it to a suspicious offer, predict that the offer is not genuine, and reject it.
Hypotheses can be formed by both approaches, but lab experiments commonly emphasize deductive, testable predictions.

Hypotheses in Practice: Testable and Falsifiable Statements

A hypothesis must be testable in the natural world. Statements invoking supernatural causes are not testable scientifically.
Example scenarios for testing hypotheses about why a flashlight won’t work:
- Hypothesis 1: Batteries are dead. Prediction: If I replace the batteries, the light will turn on.
- Hypothesis 2: The bulb is burned out. Prediction: If I replace the bulb, the light will turn on.
- If a prediction fails (e.g., replacing the batteries doesn’t fix the light), that hypothesis is falsified, and you move on to the next hypothesis.
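The flashlight testing sequence above can be sketched in a few lines of Python. This is only an illustration: the set `faulty_parts` and the helper `light_turns_on_after_replacing` are hypothetical names made up for the sketch, and the "true" fault is fixed in advance only so the example can run.

```python
# Hypothetical model of the flashlight: which parts are actually faulty.
# In a real situation this is unknown; it is fixed here only so the sketch runs.
faulty_parts = {"bulb"}


def light_turns_on_after_replacing(part, faulty):
    """Simulate one experiment: replace `part`, then check whether the light works."""
    remaining_faults = faulty - {part}
    return len(remaining_faults) == 0


# Each hypothesis is paired with the part whose replacement tests its prediction.
hypotheses = [
    ("Batteries are dead", "batteries"),
    ("Bulb is burned out", "bulb"),
]

results = []
for statement, part in hypotheses:
    prediction_held = light_turns_on_after_replacing(part, faulty_parts)
    # The wording is "supported", never "proven".
    verdict = "supported" if prediction_held else "falsified"
    results.append((statement, verdict))
    print(f"{statement}: {verdict}")
    if prediction_held:
        break  # stop once one hypothesis is supported by the data
```

Changing `faulty_parts` to `{"batteries"}` makes the first hypothesis the supported one, which mirrors how the same procedure gives different conclusions depending on what the data show.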
The distinction between “prove” and “support”: scientists avoid saying a hypothesis is proven true; they say it is supported by the data. If new evidence contradicts it, the hypothesis can be revised or rejected.
Example of a falsifiable, specific hypothesis: "Temperature affects the rate at which sugar dissolves in water" (a testable, specific scenario with measurable outcome).
Classifying a statement as a hypothesis, a question, or a theory:
- A specific, testable statement about a ready-to-test scenario is a hypothesis.
- An open, curiosity-driven question (e.g., "Why is the sky blue?") is a question, not a hypothesis.
- A broad, well-supported, predictive framework about many phenomena is a theory.

Experimental Design: Variables and Controls

In lab experiments, researchers control the environment and compare two or more groups:
- Experimental groups: receive the variable being tested (the independent variable).
- Control group: does not receive the experimental variable (or receives a baseline level).
- All other factors should be kept constant to avoid confounding variables.
Variables defined:
- Independent variable (experimental variable): the factor deliberately changed between groups. Also called the predictor or input; example: amount of water given to plants.
- Dependent variable (response variable): the outcome measured to assess the effect of the independent variable. Also called the outcome; example: plant growth (biomass).
- Controlled variables (constants): factors kept the same across all groups (plant type, soil type, pot size, temperature, light, etc.).
Concept of confounding variables: uncontrolled differences that could influence the dependent variable, making it hard to attribute effects to the independent variable.
Experimental designs often use a graph with:
- Independent variable on the x-axis (e.g., amount of water: low, medium, high).
- Dependent variable on the y-axis (e.g., plant biomass).
- Data points representing measurements, with a mean (average) denoted as a representative point.
Example: testing the effect of water amount on plant growth
- Independent variable (x): amount of water per day.
- Dependent variable (y): plant growth (biomass).
- Control variables: plant species, soil type, pot size, sunlight, temperature.
Independent vs dependent variables in the sugar dissolving experiment:
- Independent variable: temperature (°C).
- Dependent variable: rate of sugar dissolving, measured as time to dissolve (seconds); a shorter time means a faster rate.
- Controlled variables: type of sugar, stirring, container size, surface area, etc.
Replication and sample size:
- Replication: repeating the entire experiment to verify consistency of results.
- Sample size: the number of individual trials per condition (e.g., the number of measurements at each temperature).
- Larger sample size (e.g., 5000) yields greater confidence in the observed pattern than a small sample size (e.g., 5).
- In the sugar example, data points are individual trials; the filled circle represents the average time to dissolve at that temperature.
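One common way to quantify why a larger sample gives more confidence is the standard error of the mean, which shrinks with the square root of the sample size. The sketch below is illustrative only: the dissolve times are invented, and it assumes the spread of the measurements would stay the same with 5000 trials.

```python
import math
import statistics

# Hypothetical dissolve times (seconds) at one temperature; not data from the lecture.
trials = [41.0, 38.5, 44.0, 40.5, 36.0]

mean_time = statistics.mean(trials)   # the "filled circle" on the graph
spread = statistics.stdev(trials)     # sample standard deviation of the trials


def standard_error(sd, n):
    """Standard error of the mean: typical uncertainty in the average of n trials."""
    return sd / math.sqrt(n)


se_small = standard_error(spread, 5)
se_large = standard_error(spread, 5000)  # assumes the same spread with 5000 trials

print(f"mean = {mean_time:.1f} s, sd = {spread:.2f} s")
print(f"standard error with n = 5:    {se_small:.3f} s")
print(f"standard error with n = 5000: {se_large:.3f} s")
```

Because the standard error scales as one over the square root of n, going from 5 to 5000 trials narrows the uncertainty in the mean by a factor of about 32, not 1000.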
Data interpretation:
- Does the data support the hypothesis? If yes, replicate to ensure consistency; if no, revise the hypothesis and test again.

Data, Measurement, and Graphs

Data collection uses precise, unambiguous observations and measurements, typically in the metric system (SI units).
Measurements should be recorded with sufficient precision to permit replication by others.
Example visualization (temperature vs. dissolution rate):
- x-axis: temperature (independent variable).
- y-axis: rate of dissolution (dependent variable).
- Each temperature yields multiple data points; the average is shown as a filled circle.
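As a rough sketch of how the plotted averages are obtained, the trials can be grouped by temperature and averaged. The temperatures match the examples used elsewhere in these notes, but the dissolve times are hypothetical values chosen for illustration, and the name `dissolve_times` is made up for this sketch.

```python
import statistics

# Hypothetical dissolve times in seconds, keyed by water temperature in degrees Celsius.
dissolve_times = {
    23: [62.0, 58.0, 60.0],
    37: [41.0, 39.0, 40.0],
    100: [11.0, 9.0, 10.0],
}

# One (x, y) pair per temperature: x is the independent variable,
# y is the mean of the dependent variable across trials.
mean_points = [
    (temp, statistics.mean(times)) for temp, times in sorted(dissolve_times.items())
]

for temp, mean_time in mean_points:
    print(f"{temp:>3} C: mean time to dissolve = {mean_time:.1f} s")
```

Each pair in `mean_points` corresponds to one filled circle on the graph; the individual values in each list are the open data points.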
Measurement units used in examples: seconds for dissolution time; kilograms per hectare for fertilizer application; degrees Celsius for temperature; biomass for yield.
Reproducibility is facilitated by:
- Clearly defined procedures.
- Precise measurement techniques.
- Detailed reporting of sample size and replication.

Case Study 1: Nitrogen Fixation in Legumes and Wheat Yield (Field/Natural Experiment)

Background knowledge:
- Legumes (peas, beans) have nitrogen-fixing nodules that convert atmospheric nitrogen into forms plants can use. This enriches soil nitrogen, aiding growth of subsequent crops.
- Nitrogen is needed to build proteins and other biomolecules; nitrogen availability influences plant health.
- Artificial fertilizers are often derived from petroleum; their use has environmental and economic costs (soil damage, drinking-water contamination, aquatic life impact, fossil-fuel use, and potential contributions to global warming).
Research question: Can beans (lima/pigeon peas) provide enough organic nitrogen to grow winter wheat without artificial fertilizer?
Hypothesis and prediction:
- Hypothesis: A crop of summer peas will provide enough organic nitrogen to grow winter wheat without fertilizer.
- Prediction: If peas are grown and then incorporated into the soil, winter wheat planted afterward without added artificial fertilizer will yield similar or greater biomass than wheat grown in fertilizer-treated soil.
Experimental design (summary): four groups in clay pots/fields with similar conditions (outside, same rain, sunlight, temperature):
- Control: winter wheat grown in soil with no peas or artificial fertilizer.
- Group 1: winter wheat with artificial fertilizer at 45 kg/ha.
- Group 2: winter wheat with artificial fertilizer at 90 kg/ha (double the Group 1 rate).
- Group 3: winter wheat grown in soil into which ground-up summer peas had been incorporated (no artificial fertilizer).
Implementation details:
- Ground peas were incorporated into soil before planting wheat.
- Same soil type, pot size, plant density, sunlight, weather exposure, and timing across all pots.
- Measurements focused on wheat yield via biomass (dry mass) after a fixed growth period.
Measurements and variables:
- Independent variable: nitrogen source and level (no addition, artificial fertilizer at 45 or 90 kg/ha, or incorporated peas). This can be viewed as a set of multiple treatments.
- Dependent variable: wheat yield (biomass) after drying; yield is used instead of height to reflect total growth.
- Controlled variables: pot size, soil type, sunlight, temperature, watering, growth period.
Results (year 1):
- The control (no peas, no fertilizer) produced the baseline yield (exact values are on the slide).
- 45 kg/ha fertilizer gave a higher yield than the control.
- 90 kg/ha fertilizer gave the highest yield among the fertilizer treatments.
- The pea incorporation group gave the lowest yield, even lower than the no-fertilizer control.
Interpretation and next steps:
- The pea-based treatment did not support the initial hypothesis; the artificial fertilizer treatments outperformed peas in year 1.
- Hypothesis revised: a sustained crop rotation of summer peas followed by winter wheat could eventually provide enough organic nitrogen to grow wheat without fertilizer.
- The researchers then ran a multi-year rotation study: soil conditions were kept the same, the pea-wheat rotation was repeated each year, and the fertilizer groups continued to receive their fertilizer additions. The aim was to test whether long-term rotation accumulates enough nitrogen to eliminate the need for fertilizer.
Notes on concept: replication across multiple years and an adequate sample size are essential for validating results; they help distinguish real effects from year-to-year variability.

Case Study 2: Observational Science vs Experimental Science

Observational science:
- The researcher does not manipulate the environment; instead, they observe existing conditions and look for patterns or correlations.
- Examples of observational work: saguaros grow better on the south side of a slope than on the north; warmer ocean temperatures correlate with decreased species diversity near the Great Barrier Reef; the Human Genome Project maps gene locations.
- Key point: correlations can be found, but they do not establish causation.
Why not study everything in a lab?
- Some cases are impractical or impossible to reproduce in a controlled laboratory setting (e.g., long-lived organisms like saguaros; large-scale environmental phenomena like ocean temperature effects on reef biodiversity).
- Field observations or observational studies provide the data when lab experimentation is not feasible.
Correlation vs causation:
- Observational studies can reveal correlations (relationships between two variables), but they cannot prove causation because of potential confounding variables and alternative explanations.
- Example given: ice cream sales and crime rates both rise in the summer; this is a correlation, not evidence that ice cream causes crime.
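The ice cream example can be made concrete with a small calculation. In the sketch below, all numbers are hypothetical, and temperature is assumed to be the confounding variable that drives both ice cream sales and crime reports. The Pearson correlation coefficient is computed by hand so the example needs only the standard library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)


# Hypothetical monthly values; none of these numbers come from the lecture.
temperature = [5, 10, 15, 20, 25, 30]        # degrees Celsius (the confounder)
ice_cream_sales = [20, 35, 50, 70, 85, 100]  # rises with temperature
crime_reports = [30, 34, 37, 43, 46, 50]     # also rises with temperature

r_sales_crime = pearson_r(ice_cream_sales, crime_reports)
r_temp_sales = pearson_r(temperature, ice_cream_sales)
r_temp_crime = pearson_r(temperature, crime_reports)

print(f"ice cream vs crime:       r = {r_sales_crime:.3f}")
print(f"temperature vs ice cream: r = {r_temp_sales:.3f}")
print(f"temperature vs crime:     r = {r_temp_crime:.3f}")
```

Sales and crime are strongly correlated here even though neither was constructed to depend on the other; both simply track temperature. The correlation coefficient alone cannot reveal that.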
Statistical reasoning in both correlational and experimental contexts:
- Statistics help determine whether observed differences are likely due to chance or reflect a real effect.
- In experimental designs, statistics can support causal inferences about the effect of the independent variable.
- In correlational studies, statistics quantify the strength and direction of associations but cannot prove causation.
Significance and hypothesis testing:
- A key standard is the 95% confidence criterion: if the null hypothesis (no real effect) were true, a result at least as extreme as the one observed should occur less than 5% of the time.
- Expressed as p < 0.05 for statistical significance.
- If results are significant, researchers may reject the null hypothesis; if not, they fail to reject it (that is, they did not find evidence for a difference).
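The lecture does not name a specific statistical test. As one possible illustration of the p < 0.05 logic, the sketch below uses an exact permutation test on two small groups: it asks how often a difference in means at least as large as the observed one would arise if the group labels were assigned at random (the null hypothesis). The biomass values and variable names are hypothetical.

```python
from itertools import combinations

# Hypothetical dry biomass (g) for two groups; not data from the lecture.
control = [10, 11, 9, 10]
treated = [14, 15, 13, 16]


def mean(values):
    return sum(values) / len(values)


observed = mean(treated) - mean(control)

# Under the null hypothesis the labels are arbitrary, so every way of
# splitting the pooled values into two groups of this size is equally likely.
pooled = control + treated
total = 0
extreme = 0
for idx in combinations(range(len(pooled)), len(treated)):
    chosen = set(idx)
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = mean(group_a) - mean(group_b)
    total += 1
    # Two-sided: count splits at least as extreme as observed, in either direction.
    if abs(diff) >= abs(observed) - 1e-12:
        extreme += 1

p_value = extreme / total
print(f"observed difference in means: {observed:.2f} g")
print(f"p = {extreme}/{total} = {p_value:.4f}")
print("reject the null hypothesis" if p_value < 0.05 else "fail to reject the null hypothesis")
```

With only four measurements per group there are 70 possible splits, so the smallest two-sided p-value this design can produce is 2/70. This is one concrete reason that very small samples limit how strong a conclusion can be.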

Case Study: Everyday and Field Implications

Practical and ethical implications discussed in the lecture:
- Artificial fertilizers are derived from petroleum (crude oil) with broader environmental impacts.
- Potential soil damage, contamination of drinking water, negative effects on aquatic life, and increased fossil fuel use contributing to global warming.
- Agricultural practices such as crop rotation (e.g., peas then wheat) may offer sustainable approaches to nitrogen management without relying on synthetic fertilizers.
Real-world relevance:
- The lecture connects the process of science to sustainable agriculture, soil health, and environmental stewardship.
- It emphasizes the iterative nature of experimentation and the need to consider ecological and economic factors in applying scientific findings.

Quick Reference: Terms and Concepts

Observation: careful, unbiased description of natural phenomena using the senses and tools; requires unambiguous, metric measurements.
Question: a query arising from an observation.
Background research: review of existing knowledge relevant to the question.
Hypothesis: a testable and falsifiable statement about the natural world.
Prediction: a specific expectation derived from a hypothesis.
Independent variable (experimental variable): the factor deliberately changed between groups; on the x-axis in graphs.
Dependent variable (response variable): the outcome measured; on the y-axis in graphs.
Controlled variables: factors kept constant across all groups to avoid confounding influences.
Replication: repeating the entire experiment to verify results.
Sample size: number of experimental units per condition; larger sample sizes improve confidence.
Observational science: studies that do not manipulate variables; rely on correlations.
Experimental science: studies that manipulate an independent variable and observe its effect on a dependent variable; uses control groups.
Correlation vs causation: correlation indicates association; causation implies a direct cause-and-effect relationship, which requires controlled experimentation to establish.
Significance: a statistical measure of whether results are unlikely to be due to chance; commonly, p < 0.05 denotes statistical significance.
Notion of proof in science: scientists typically say hypotheses are supported or refuted, not proven.

Equations, Units, and Graphical Conventions (LaTeX)

Temperature examples:
- Room temperature: $T_{room} \,\approx\, 23^{\circ}\mathrm{C}$
- Warm/test temperature: $T_{warm} = 37^{\circ}\mathrm{C}$
- Boiling temperature: $T_{boil} = 100^{\circ}\mathrm{C}$
Fertilizer application (experimental groups):
- $45\ \mathrm{kg\,ha^{-1}}$
- $90\ \mathrm{kg\,ha^{-1}}$
Variables and axes:
- Independent variable (x-axis): $x = \text{amount of fertilizer or type of nitrogen source}$
- Dependent variable (y-axis): $y = \text{wheat yield (biomass, e.g., dry mass in g or kg)}$
Significance: $p < 0.05$ indicates results are unlikely to be due to chance at the 5% level.
Data representation:
- Individual measurements: data points (circles).
- Mean/average: filled data point (e.g., solid circle) representing the average across trials.
Yield measurement notation: $\text{Yield} = \text{mass of dry wheat (kg or g)}$ after harvest and drying.

Takeaways and Study Notes

The process of science is iterative and nonlinear; scientists continuously refine observations, hypotheses, and experiments based on data.
A hypothesis must be testable and falsifiable; data can support but not prove it definitively.
A theory is a broad, well-supported framework that integrates many lines of evidence and makes powerful predictions.
Distinguish between experimental (causal) studies and observational (correlational) studies; correlations do not imply causation.
Use rigorous experimental design with clear definitions of independent, dependent, and controlled variables; replication and adequate sample size are critical for reliable conclusions.
Real-world problems (e.g., fertilizer use and soil health) illustrate how science informs sustainable practices, ethical considerations, and policy implications.

Quick Recap: Key Definitions to Memorize

Hypothesis: a testable statement about what will happen under specified conditions.
Theory: a broad, well-supported explanation that integrates many observations and experiments.
Independent variable: the factor deliberately changed between groups (x-axis).
Dependent variable: the outcome measured (y-axis).
Controlled variables: factors kept constant across all conditions.
Replication: repeating the entire experiment to ensure results are consistent.
Sample size: number of observations per condition.
Correlation: association between two variables observed in data.
Causation: a cause-and-effect relationship established through controlled experimentation.
Significance: a statistical judgment that observed effects are unlikely to be due to chance alone; commonly p < 0.05 is used as the threshold.