
Statistics – Comprehensive Study Notes

Definitions

  • Statistics (general science)
    • Science of obtaining, synthesizing, predicting, and drawing inferences from data.
    • Effective use of numerical data that describe groups of individuals or experimental results.
    • Encompasses collection, analysis, interpretation, and communication of data-driven findings.
  • Measures of Central Tendency
    • Mean (Average): \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
    • Median: Midpoint when values are ordered from smallest to largest.
    • Mode: Most frequently occurring value.
  • Standard Deviation (SD): Indicates dispersion—how tightly data cluster around the mean.
    • Formula: SD = \sqrt{\text{Variance}}
    • Narrower, taller bell curve → smaller SD; wider, flatter curve → larger SD.
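
The definitions above can be checked with Python's standard `statistics` module. This is a minimal sketch: the `scores` list is made-up illustration data, and the population forms (`pvariance`, `pstdev`, dividing by n) are an assumption; the sample forms (`variance`, `stdev`) divide by n - 1 instead.

```python
import math
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [70, 72, 75, 75, 78, 80]

mean = statistics.mean(scores)            # sum of values divided by n
median = statistics.median(scores)        # midpoint of the ordered values
mode = statistics.mode(scores)            # most frequently occurring value
variance = statistics.pvariance(scores)   # population variance (divides by n)
sd = statistics.pstdev(scores)            # population SD = sqrt(variance)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, SD={sd:.2f}")

# SD is, by definition, the square root of the variance.
assert math.isclose(sd, math.sqrt(variance))
```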

Why Learn & Use Statistics?

  • Establishes reasonable certainty that research findings reflect real effects rather than chance variation.
  • Enables critical appraisal of other researchers’ data, methods, and conclusions.
  • Essential for evidence-based decision-making in health care, public policy, business, etc.

Distribution Shapes & the Empirical Rule

  • Normal distribution: Symmetrical, bell-shaped; majority of observations cluster near the mean.
  • Multiples of SD capture predictable proportions of the data:
    • \pm 1\;SD \Rightarrow \approx 68\%
    • \pm 2\;SD \Rightarrow \approx 95\%
    • \pm 3\;SD \Rightarrow \approx 99.7\%
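
These proportions can be recovered from the normal distribution itself. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is chosen here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```

The rule only applies to data that are approximately normal; skewed distributions can depart from these figures considerably.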

Interpreting SD: School Test-Score Example

  • School A mean > School B mean does not guarantee better performance.
  • Large SD at School A ⇒ more extreme highs & lows; small SD at School B ⇒ most students cluster near mean.
  • Always examine both the center (mean/median) and spread (SD).
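
The comparison can be made concrete with two small score lists. The numbers below are invented for illustration, and the 70-point cutoff is likewise an assumption; the point is only that the school with the higher mean can also have more students at the low extreme.

```python
from statistics import mean, pstdev

# Hypothetical scores: School A has the higher mean but a much wider spread.
school_a = [50, 60, 80, 100, 100]
school_b = [72, 74, 75, 76, 78]

for name, scores in (("A", school_a), ("B", school_b)):
    below_70 = sum(score < 70 for score in scores)  # count of low scorers
    print(
        f"School {name}: mean={mean(scores)}, "
        f"SD={pstdev(scores):.1f}, students below 70: {below_70}"
    )
```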

Confidence Intervals (CI) & Margin of Error

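  • Researchers conventionally report a 95\% CI: the estimate \pm about 2 standard errors (SE), where SE = SD/\sqrt{n} for a sample mean.
    • Example: Mean = 75, SE = 3
    • CI = 75 \pm (2 \times 3) = [69, 81]
    • Interpretation: We are \approx 95\% confident the true population mean falls in [69, 81]. (Mean \pm 2 SD instead describes the range holding about 95\% of individual observations, not the uncertainty in the mean.)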
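
The interval arithmetic can be sketched in Python. This is a minimal sketch under stated assumptions: a hypothetical sample of n = 36 with mean 75 and SD 18 (so the standard error SD/√n is 3), and the normal critical value of about 1.96 in place of the rounded 2. The function name `mean_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation CI for a mean: estimate ± z * (SD / sqrt(n))."""
    standard_error = sample_sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error                     # the margin of error
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: mean 75, SD 18, n = 36, so SE = 3.
lower, upper = mean_confidence_interval(75, 18, 36)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # 95% CI: [69.12, 80.88]
```

For small samples a t critical value is typically used instead of z, which widens the interval somewhat.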

Descriptive vs Inferential Statistics

  • Descriptive: Summarize sample data exactly as observed (e.g., frequencies, means, batting average = .333).
  • Inferential: Generalize from sample to population, test hypotheses, estimate parameters, model causality.

Research Design & Causality

  • Core question: “Does doing A cause B?”
    • Example: Does Drug X lower blood pressure?

11-Step Experimental-Design Blueprint

  1. Select a problem (who, what, when, why, how).
    • “State” problem → single snapshot; “Process” problem → change over time.
  2. Identify dependent variables (outcomes measured).
  3. Identify independent variables (manipulated factors).
  4. Set levels of each independent variable (number of experimental conditions).
  5. Evaluate combinations among independent variables.
  6. Determine observations (sample size, number of measurements).
  7. Redesign after steps 1–6 if necessary.
  8. Randomize assignments to eliminate systematic bias.
  9. Meet ethical & legal standards
    • Institutional Review Board (IRB) approval.
    • Informed consent.
    • Historical cautionary tales:
      • Tuskegee Syphilis Study.
      • Milgram Obedience Experiment.
  10. Develop mathematical/statistical model.
  11. Collect data, then proceed to:
      1. Data reduction (coding, summarization).
      2. Data verification (quality checks).
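
Step 8 (randomization) can be sketched as a shuffle followed by round-robin assignment. This is one simple scheme among several, not a prescribed method; the function name `randomize`, the group labels, and the participant IDs are all hypothetical.

```python
import random

def randomize(participants, groups=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them into groups in turn."""
    rng = random.Random(seed)       # a seed makes the allocation reproducible
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, person in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(person)
    return assignment

# Hypothetical participant IDs.
participants = [f"P{i:02d}" for i in range(1, 11)]
assignment = randomize(participants, seed=2024)
for group, members in assignment.items():
    print(group, members)
```

Because chance alone decides who lands in which group, no characteristic of the participants can systematically line up with the treatment.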

Study Types

  • Experimental Study: Measure → manipulate → re-measure.
    • Hawthorne Illumination Study: Productivity rose because workers knew they were observed (Hawthorne Effect), not because light levels changed.
  • Observational Study: Observe variables without manipulation; identify correlations.
    • Smoking vs lung-cancer incidence comparison.

Statistical Significance & Hypothesis Testing

  • “Statistically significant” ⇒ A result at least this extreme would be unlikely if H_0 were true and only chance were at work.
  • Hypotheses
    • H_0 (null): No effect / defendant innocent.
    • H_1 (alternative): Effect present / defendant guilty.
  • Errors
    • Type I (False Positive): Reject H_0 when true (convict an innocent).
    • Type II (False Negative): Fail to reject H_0 when false (acquit a guilty).
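
A permutation test is one way to make "unlikely due to chance alone" concrete. The sketch below is an illustration, not the only valid test: the blood-pressure readings are invented, the function name `permutation_p_value` is hypothetical, and the 0.05 threshold is the conventional choice for the tolerated Type I error rate.

```python
import random

def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Under H_0 the group labels are interchangeable, so we reshuffle them
    and count how often a difference at least as large as the observed
    one arises by chance alone.
    """
    rng = random.Random(seed)
    n_treated = len(treated)
    n_control = len(control)
    observed = abs(sum(treated) / n_treated - sum(control) / n_control)
    pooled = list(treated) + list(control)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(
            sum(pooled[:n_treated]) / n_treated
            - sum(pooled[n_treated:]) / n_control
        )
        if diff >= observed:
            extreme += 1
    return extreme / n_resamples

# Hypothetical systolic blood pressure readings (mmHg).
drug_x = [118, 122, 125, 120, 119, 121]
placebo = [130, 135, 128, 132, 129, 134]

alpha = 0.05  # tolerated probability of a Type I error
p_value = permutation_p_value(drug_x, placebo)
print("reject H_0" if p_value < alpha else "fail to reject H_0")
```

Lowering `alpha` makes Type I errors rarer but, for a fixed sample size, makes Type II errors more likely.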

Statistical Literacy: Critical Reading Checklist

  1. Source: Who collected data? Credentials? Funding?
  2. Peer review: Has the work been critiqued by experts?
  3. Methodology: How were data collected/processed?
  4. Skepticism of comparisons: Correlation ≠ causation (e.g., stork migration & birth rate coincidence).
  5. Context: Beware of isolated numbers or percentages.
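
Item 4 can be illustrated with a simulation in which a third factor drives two otherwise unrelated quantities. Everything here is an assumption made for illustration: the confounder (`rural_area`), the noise levels, and the idea that both stork counts and births scale with it. Neither simulated variable influences the other, yet they correlate strongly.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(42)

# Hypothetical confounder: how rural each of 200 districts is.
rural_area = [rng.uniform(0, 100) for _ in range(200)]

# Both quantities depend on the confounder plus independent noise;
# storks never enter the formula for births, or vice versa.
storks = [area + rng.gauss(0, 10) for area in rural_area]
births = [area + rng.gauss(0, 10) for area in rural_area]

r = pearson_r(storks, births)
print(f"correlation between storks and births: r = {r:.2f}")
```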

Clinical Trials & Pilot Studies

  • Pilot experiment: Small-scale preliminary study that guides full trial design.
  • Effectiveness: Real-world performance; Efficacy: Performance under controlled trial conditions.
  • Typical objectives:
    1. Test safety/efficacy of new drug/device.
    2. Test different dose levels.
    3. Test existing product for new indication.
    4. Compare new product vs standard of care.
    5. Compare two or more approved interventions.
  • Best practice: Double-blinded design (subjects & investigators unaware of assignments).
  • Informed consent must state the purpose, duration, procedures, risks, benefits, contact information, and the right to withdraw at any time.

Sampling Concepts

  • Theoretical population: Group to which you want to generalize.
  • Accessible (study) population: Group actually reachable.
  • Sampling frame: Complete list of accessible population members.
  • Sample: Selected subset invited to participate.
  • Subsample: Participants who actually take part.
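
The chain from sampling frame to subsample can be sketched with simple random sampling. The frame contents, the sample size of 20, and the 70% participation rate are all hypothetical values chosen for illustration.

```python
import random

rng = random.Random(7)

# Sampling frame: a complete list of the accessible population (hypothetical IDs).
sampling_frame = [f"student_{i:03d}" for i in range(1, 201)]

# Sample: members drawn without replacement and invited to participate.
sample = rng.sample(sampling_frame, k=20)

# Subsample: invitees who actually take part. Here each invitee is assumed
# to respond independently with probability 0.7.
subsample = [person for person in sample if rng.random() < 0.7]

print(f"frame={len(sampling_frame)}, invited={len(sample)}, took part={len(subsample)}")
```

If those who decline differ systematically from those who take part, the subsample may no longer represent the frame even though the original draw was random.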

Data, Variables & Attributes

  • Quantitative data: Numeric (e.g., age, test score).
  • Qualitative data: Non-numeric (e.g., gender, color).
  • Variable: Any characteristic with varying values (sex, age, agreement).
  • Attribute: Specific value of a variable (male/female).
  • Independent variable: Manipulated cause.
  • Dependent variable: Observed effect.

Time Dimensions in Research

  • Cross-sectional study: Single time-point snapshot.
  • Longitudinal study: Repeated measures over time (≥2 waves).

Descriptive vs Inferential Recap

  • Descriptive statistics summarize the sample (e.g., a baseball batting average of .333 indicates about 1 hit in every 3 at-bats).
  • Inferential statistics ask whether such performance can generalize to broader groups (e.g., hitters with similar height/weight).

Fallacies & Reasoning Errors

  • Fallacy: Logical error stemming from false assumptions or faulty methodology.
  • Always scrutinize premises, data integrity, and inferential leaps to avoid drawing erroneous conclusions.