Process Science Notes (Chapter 1)
Page 1: What is science and the process of discovery
- Science is described as a way to answer questions and a way of knowing. It’s about discovering new things through observation and experimentation.
- The process is driven by curiosity: somebody out there is figuring something out, which leads to new technologies and discoveries (e.g., why new iPhones appear).
- Science relies on evidence and a method for obtaining evidence.
- Two broad types of evidence discussed:
- Anecdotal evidence: based on personal experience, individual observations, or stories. Often shared on social media and through word of mouth.
- Scientific evidence: backed by data, measurements, and systematic study; involves systematic data collection and rigorous analysis.
- Question for reflection: which type of evidence is more reliable for informing real-world conclusions? The transcript argues for scientific evidence, while acknowledging that anecdotal evidence can inspire questions.
Page 2: From observation to questions: forming the scientific questions and evidence
- Anecdotal evidence can inspire us to ask scientific questions because it provides observations that raise curiosity.
- In science, questions lead to hypotheses that can be tested with evidence.
- A key distinction: scientific evidence requires data and testing, not just personal experience.
- Social media anecdotes illustrate how information can spread without verification; verification requires deeper analysis and expert review.
Page 3: Hypothesis and the testing requirement
- After observations, scientists formulate a hypothesis (a testable question or educated guess).
- A hypothesis must be testable; if it cannot be tested, it is not a valid scientific hypothesis.
- Scientific evidence requires collecting hundreds of data points and accumulating data to support or refute a hypothesis, not just a single observation.
- Repetition and large sample sizes are essential for reliable conclusions; statistics are used to analyze the data.
Page 4: Experimental design basics: variables and groups
- Two main things in an experiment: a manipulated variable (the independent variable) and the measured outcome (the dependent variable).
- Experimental design requires two groups:
- Experimental group: receives the treatment or manipulation.
- Control group: receives no treatment or a baseline condition.
- The control group provides a standard for comparison to determine the effect of the manipulation.
- A third concept often used is the placebo (a fake treatment that mimics the experimental condition) to control for the placebo effect.
- The proper design requires parallel groups to ensure a fair comparison.
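The two-group design above can be sketched in code. This is a minimal Python simulation with made-up numbers (the baseline of 10, noise of 2, and treatment effect of 3 are all hypothetical), showing how an experimental group and a control group differ only in the manipulated variable:

```python
import random

random.seed(42)  # fixed seed so this illustration is reproducible

def run_trial(received_treatment: bool) -> float:
    """Simulate one subject's measured outcome (the dependent variable).

    Both groups share the same baseline and noise (conditions held
    constant); only the treatment (the independent variable) differs.
    """
    baseline = random.gauss(10.0, 2.0)           # shared, controlled conditions
    effect = 3.0 if received_treatment else 0.0  # hypothetical treatment effect
    return baseline + effect

# Experimental group receives the manipulation; control group does not.
experimental = [run_trial(True) for _ in range(50)]
control = [run_trial(False) for _ in range(50)]

mean_exp = sum(experimental) / len(experimental)
mean_ctl = sum(control) / len(control)
print(f"experimental mean: {mean_exp:.2f}")
print(f"control mean:      {mean_ctl:.2f}")
print(f"observed effect:   {mean_exp - mean_ctl:.2f}")
```

Because everything except the treatment is identical between the groups, the difference in group means estimates the effect of the manipulation alone.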
Page 5: Control variables and how to set up the experiment
- Control variables are constants kept the same across both the control and experimental groups to ensure that any observed effect is due to the independent variable.
- Example given: when using rats in an experiment, ensure the same age, the same number of males/females, similar activity levels, similar health, and the same treatment duration for both groups.
- Purpose: to eliminate alternative explanations and isolate the effect of the independent variable.
Page 6: Independent vs dependent variables; placebo, and measurement
- Independent variable: the factor deliberately changed by the experimenter. In the example, cell phone usage (presence vs. absence).
- Dependent variable: the measured outcome. In the cell phone example, the incidence of cancer.
- The independent variable must be able to stand alone and be manipulated; the dependent variable depends on the independent variable.
- Placebo: a fake treatment given to the control group to mimic the experience of the experimental group and control for psychological effects.
- Put simply: if you give a treatment to one group but not the other, differences in outcomes can be attributed to the treatment if other variables are controlled.
Page 7: A concrete example: cell phone exposure in rats
- Experimental setup: rats exposed to cell phone radiation in cycles of ten minutes on, ten minutes off, for nine hours per day.
- Outcome: incidence of specific cancers; results showed sex-specific effects:
- Male rats had a higher incidence of certain cancers in the exposed group compared to controls.
- Female rats did not show the same increase in cancer incidence.
- Conclusion in the example: exposure affected males more than females for certain cancers; this leads to questions about biological reasons for sex differences and motivates further experiments.
- The importance of replication: repeating the experiment by different scientists increases confidence in the conclusions.
Page 8: Replication, sample size, and statistical significance
- Replication (doing the experiment again and again) increases confidence in conclusions.
- A larger sample size improves statistical reliability and helps determine whether results are likely due to chance.
- The phrase: “the more data points, the more reliable” captures the idea that large n yields more dependable estimates.
- Statistical significance is used to decide whether observed effects are likely real and not due to random variation.
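The "more data points, more reliable" idea can be demonstrated directly. This is a small Python sketch (the fair-coin setup and sample sizes are arbitrary choices for illustration): the same estimation experiment is repeated many times at different sample sizes, and the scatter of the estimates shrinks as n grows:

```python
import random
import statistics

random.seed(0)

TRUE_RATE = 0.5  # a fair coin: the "real" value each experiment tries to estimate

def estimate(n: int) -> float:
    """Estimate the heads rate from a sample of n coin flips."""
    return sum(random.random() < TRUE_RATE for _ in range(n)) / n

# Repeat each experiment 200 times and measure how widely the
# estimates scatter around the true value: more data, less scatter.
spreads = {}
for n in (10, 100, 1000):
    estimates = [estimate(n) for _ in range(200)]
    spreads[n] = statistics.stdev(estimates)
    print(f"n={n:5d}  spread of estimates: {spreads[n]:.3f}")
```

A hundredfold increase in sample size cuts the scatter by roughly a factor of ten, which is exactly why small studies are prone to chance findings.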
Page 9: From data to publication: peer review and journals
- After conducting experiments, scientists publish results in journals.
- Peer review involves experts in the specific field evaluating the study’s methods, data, and conclusions.
- The goal is to validate quality and integrity; peers help determine if the work is credible or junk.
- The process helps the scientific community build on ideas and encourages further research.
Page 10: Epidemiology and patterns: when controlled experiments aren’t possible
- Some scientific questions cannot be answered with controlled experiments; researchers turn to epidemiology and patterns.
- Epidemiologists study correlations and patterns in diseases across populations to identify possible risk factors.
- Example: during the COVID-19 outbreak, immunocompromised individuals faced higher risk, showing a correlation between immune status and disease risk.
- Correlation does not imply causation: patterns can point to associations but may involve confounding factors.
- Population-level factors (e.g., dietary patterns, genetics, environment) influence disease risk and can complicate causal inferences.
Page 11: Population patterns, factors, and the complexity of disease
- The transcript highlights a specific population (e.g., Black populations with dietary patterns high in fried foods and salt) to illustrate how patterns emerge in epidemiology.
- Important caveat: a correlation in a population does not prove a direct causal link; multiple factors may contribute.
- Complexity of disease means that even with correlations, establishing causation requires careful consideration and, where possible, controlled studies.
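The confounding caveat can be made concrete with a toy simulation. In this hypothetical Python sketch, a hidden third factor (say, overall lifestyle) drives both a dietary-habit score and a disease-risk score, so the two observed variables are strongly correlated even though neither causes the other:

```python
import random

random.seed(1)

# Hypothetical data: a hidden confounder influences BOTH observed variables.
confounder = [random.gauss(0, 1) for _ in range(500)]
diet_score = [c + random.gauss(0, 0.5) for c in confounder]
risk_score = [c + random.gauss(0, 0.5) for c in confounder]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(diet_score, risk_score)
print(f"correlation between diet and risk: r = {r:.2f}")
```

An epidemiologist seeing only `diet_score` and `risk_score` would observe a strong association, yet by construction there is no causal link between them; this is why correlational findings call for controlled follow-up studies.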
Page 12: Randomized controlled trials and study design in practice
- When a known correlation exists in a population, researchers design controlled studies by defining criteria and randomly assigning participants to groups.
- Randomization helps prevent selection bias and ensures comparable groups.
- Example described: African Americans aged 18–25 recruited and randomly assigned to control or experimental groups in a clinical trial.
- The duration and conditions for control groups are described (e.g., limits on cell phone use); long-term abstinence is often impractical, so time-bound controls are used.
- The goal is to observe long enough to compare the incidence of disease between groups.
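Random assignment itself is simple to implement. Here is a minimal Python sketch (participant IDs and group sizes are hypothetical): shuffle the eligible participants, then split the list, so every participant has the same chance of landing in either arm:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Hypothetical participant IDs meeting the study's inclusion criteria.
participants = [f"P{i:03d}" for i in range(1, 41)]

# Shuffle, then split in half: assignment is independent of any
# participant characteristic, which guards against selection bias.
random.shuffle(participants)
half = len(participants) // 2
control_group = participants[:half]
experimental_group = participants[half:]

print(f"control:      {len(control_group)} participants")
print(f"experimental: {len(experimental_group)} participants")
```

Because the shuffle ignores every attribute of the participants, differences in age, health, or habits tend to balance out across the two arms as the sample grows.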
Page 13: Interpreting data for the public and media literacy
- Scientific data can be technically dense and hard for non-scientists to interpret.
- The media often presents headlines that are sensational or misleading (e.g., “Landmark study links cell phone radiation to cancer”).
- Readers should be prepared to read beyond headlines and consult full texts or summaries to understand methodology and limitations.
- Misleading headlines can create fear or misperceptions about risk.
Page 14: Epidemiology caveats: complexity and misinterpretation
- Epistemic limits: patterns and correlations observed at the population level do not necessarily reveal causal mechanisms.
- Important to consider confounders, sample size, and context when interpreting results.
- The reliability of conclusions increases with reproducibility, rigorous methods, and independent verification.
Page 15: Theory, hypotheses, and the progression of scientific knowledge
- Distinction between hypothesis and theory:
- A hypothesis is a testable statement.
- A theory is a hypothesis that has withstood extensive testing and rigorous validation over many years; it provides a well-substantiated explanation.
- The transcript distinguishes between a well-supported theory (e.g., the theory of evolution) and everyday or less-established “theories.”
- A scientific theory is not a guess; it is a robust framework that consistently explains and predicts phenomena.
- Note: The example mentions general relativity as another scientific theory (a physical science context).
Page 16: From hypothesis to conclusion: publishing and continuing inquiry
- Once results are obtained and vetted, they are published in peer-reviewed journals for the scientific community to examine and use.
- The cycle continues: new observations lead to new questions, which lead to new hypotheses and experiments.
- The transcript emphasizes an iterative, collaborative process where ideas are refined and expanded through replication and cross-disciplinary testing.
Page 17: Key terms glossary (selected terms from the transcript)
- Observation: using senses or scientific instruments to notice and measure phenomena.
- Hypothesis: a testable, falsifiable statement used to guide experiments.
- Independent variable: the factor intentionally changed by the experimenter.
- Dependent variable: the measured outcome influenced by the independent variable.
- Control variable: factors kept constant to isolate the effect of the independent variable.
- Experimental group: the group receiving the treatment or manipulation.
- Placebo: a fake treatment used to control for placebo effects.
- Replication: repeating an experiment to confirm results.
- Sample size: the number of data points or subjects in a study; larger sizes generally increase reliability.
- Statistical significance: a measure of whether observed effects are unlikely due to chance; often expressed via a p-value.
- P-value: probability value used to assess statistical significance; common thresholds include p < 0.05 and p < 0.01.
- Correlation vs. causation: correlation indicates a relationship, but does not prove that one variable causes another.
- Epidemiology: study of disease patterns in populations and factors associated with health outcomes.
- Theory (scientific): a well-tested and widely accepted explanation that has stood up to scrutiny over many years.
- Literature review: surveying existing research to inform current work; emphasized as using credible sources (e.g., journals, not arbitrary internet sources).
Page 18: Quick wrap-up: the workflow of scientific inquiry (summary)
- Start with careful observations (using scientific equipment when needed).
- Ask a specific, testable question.
- Review existing literature and gather relevant evidence.
- Formulate a testable hypothesis.
- Design an experiment with independent, dependent, and control variables; include a control group and, if appropriate, a placebo.
- Plan for replication and collect a large sample size.
- Analyze data with statistics; assess significance using p-values.
- Submit findings for peer review and publication so they can be validated by experts.
- Consider broader patterns using epidemiology when controlled experiments aren’t possible.
- Distinguish between correlation and causation; interpret results within a broader physical and biological context.
- Develop theories only after extensive, repeated testing; continue to test and refine.
Page 19: Practical implications and real-world relevance
- The scientific method informs technology development, public health policies, and our understanding of the natural world.
- Ethical considerations include responsible data collection, avoiding misinformation, and transparency about methodologies and limitations.
- Practical advice for students: verify sources, prefer peer-reviewed literature, and critically evaluate headlines vs. full articles.
Page 20: Notable equations and values cited in the transcript (LaTeX-formatted)
- Statistical significance thresholds (examples):
- p < 0.05
- p < 0.01
- General form of a test statistic (example for means comparison; illustrative):
- $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
- where $s_p$ is the pooled standard deviation.
- Conceptual relation for sample size and reliability: the standard error of a mean shrinks as the sample grows, $SE = \sigma / \sqrt{n}$.
- Notation reminders:
- Independent variable: $X$ (manipulated, stands alone)
- Dependent variable: $Y$ (measured outcome)
- Control variable: constants kept the same across groups.
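The pooled two-sample t statistic listed above can be computed with a few lines of Python. This is an illustrative sketch with toy, made-up measurements; `pooled_t` mirrors the formula, with $s_p$ the pooled standard deviation:

```python
import statistics

def pooled_t(sample1, sample2):
    """Two-sample t statistic with a pooled standard deviation s_p."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled standard deviation: weighted combination of the two sample variances.
    sp = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / (sp * (1 / n1 + 1 / n2) ** 0.5)

# Toy data (hypothetical): two small groups with similar spread, shifted means.
a = [12.1, 11.8, 12.4, 12.0, 12.3]
b = [10.9, 11.2, 10.7, 11.0, 11.1]
t_value = pooled_t(a, b)
print(f"t = {t_value:.2f}")
```

A large |t| relative to the relevant t distribution corresponds to a small p-value, which is how thresholds like p < 0.05 are applied in practice.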