Topic 1) Design of Experiments
What Is Statistics
Statistics = the science of data — collecting, analysing, and interpreting data to solve problems.
Data scientists use statistics to find insights and communicate results.
Ethics and Privacy in Data (MIP W)
Data provenance:
→ Know where the data came from and document its source.
Data management:
→ Have a transparent plan for:
Collection
Storage
Sharing and access
Reporting
Keeping data non-identifiable (especially personal data).
Data wrangling:
→ Set clear rules for fixing errors or missing values, since these choices can affect results later.
Indigenous data:
→ Follow CARE principles:
Collective benefit, Authority to control, Responsibility, Ethics.
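The data-wrangling point above says that rules for fixing missing values can change results later. Below is a minimal Python sketch of that effect. The two rules, the function names `drop_missing` and `fill_with_zero`, and the data are all hypothetical and made up for illustration. The same raw data gives a noticeably different mean depending on which documented rule is applied.

```python
from statistics import mean

# Hypothetical raw measurements; None marks a missing value.
raw = [12.0, 15.0, None, 11.0, None, 14.0]

def drop_missing(values):
    """Rule A (hypothetical): discard records with missing values."""
    return [v for v in values if v is not None]

def fill_with_zero(values):
    """Rule B (hypothetical): replace missing values with 0."""
    return [0.0 if v is None else v for v in values]

mean_a = mean(drop_missing(raw))    # (12 + 15 + 11 + 14) / 4
mean_b = mean(fill_with_zero(raw))  # (12 + 15 + 0 + 11 + 0 + 14) / 6

print(f"Rule A mean: {mean_a:.2f}")  # Rule A mean: 13.00
print(f"Rule B mean: {mean_b:.2f}")  # Rule B mean: 8.67
```

Writing the chosen rule down before analysis, and reporting it, is what makes the result reproducible.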
💾 Big Data
“Big Data” = extremely large, complex, fast-moving datasets.
Defined by the 4 Vs:
Volume – huge amount of data
Variety – many types (text, images, etc.)
Velocity – collected quickly and continuously
Veracity – accuracy and reliability
Sometimes extended with volatility (how quickly the data changes) and value (the useful insights it provides).
🧠 Domain Knowledge
Understanding the context behind the data.
Data scientists need:
Curiosity about the topic.
Collaboration skills with field experts.
Sometimes they become domain experts themselves.
RANDOMISED CONTROLLED TRIALS (RCTs)
Definition:
An experiment where participants are randomly assigned to groups to fairly test a treatment or intervention.
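Random allocation can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple 1:1 split into one treatment group and one control group. The function name `randomly_allocate` and the participant IDs are hypothetical. Chance alone decides who goes where, so no one can choose "healthier" people for the treatment group. Fixing a seed makes the allocation reproducible and easy to audit.

```python
import random

def randomly_allocate(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)      # independent generator; seed makes it reproducible
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

people = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical IDs P01..P10
groups = randomly_allocate(people, seed=2024)
print("Treatment:", groups["treatment"])
print("Control:  ", groups["control"])
```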
🧾 Types of Evidence
Type | Reliability | Notes |
|---|---|---|
Personal testimony | ❌ Weak | Based on observation, not verified; media often misuses these. |
Reputable research | ✅ Strong | Published in peer-reviewed journals; must be reproducible and transparent. |
🚫 Avoiding Bias
Bias = any systematic error that makes results less accurate or less fair.
Type of Bias | Description | Solution |
|---|---|---|
Selection bias | Some people are more likely than others to be chosen for a group, so the groups differ systematically. | Random allocation. |
Observer bias | Researchers or participants know who is in which group, which can influence how outcomes are measured or reported. | Double-blind setup. |
Placebo effect | Participants change just because they believe they are being treated. | Use a placebo (fake treatment). |
Confounding variable | A third factor influences results (e.g. age, diet). | Randomisation or control groups. |
Consent bias | Only certain people agree to take part. | Careful design; ethics approval. |
Survivor bias | Results only include those who finished the study. | Track dropouts carefully. |
Adherer bias | People who stick to the treatment differ from those who don’t. | Control for adherence. |
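Survivor bias from the table above can be shown with a small Python sketch. The scores are entirely made up, and the sketch assumes the outcomes of people who dropped out could still be recorded at follow-up, which is what "track dropouts carefully" makes possible. Analysing only those who finished makes the treatment look much better than it is across everyone who started.

```python
from statistics import mean

# Hypothetical data: (improvement score, finished the study?)
records = [
    (8, True), (7, True), (9, True), (6, True),
    (1, False), (0, False), (2, False),  # dropouts, measured at follow-up
]

completers_only = [score for score, finished in records if finished]
everyone = [score for score, _ in records]

print(f"Completers only:  {mean(completers_only):.2f}")  # Completers only:  7.50
print(f"All participants: {mean(everyone):.2f}")         # All participants: 4.71
```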
⭐ Gold Standard Experiment
Double-blind Randomised Controlled Trial (RCT):
Participants and investigators don’t know who receives the real treatment.
Helps remove bias from both sides.
Use a placebo that looks like the treatment.
Often difficult to do (ethical, financial, or practical limits).
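One way to picture double-blinding is with coded kits. The Python sketch below is illustrative only and assumes this workflow: each participant receives an identical-looking kit with a code, and only an independent third party holds the key linking codes to "drug" or "placebo". The function name `blind_allocation` and the `KIT-` label format are hypothetical.

```python
import random

def blind_allocation(participants, seed=None):
    """Randomly assign participants to 'drug' or 'placebo' (as evenly as possible).

    Returns:
        kits: participant -> coded kit label (all participants and investigators see)
        key:  kit label -> arm (held by an independent third party until unblinding)
    """
    rng = random.Random(seed)
    # Alternate arms so the split is balanced, then shuffle their order.
    arms = ["drug" if i % 2 == 0 else "placebo" for i in range(len(participants))]
    rng.shuffle(arms)
    kits, key = {}, {}
    for number, (person, arm) in enumerate(zip(participants, arms), start=1):
        label = f"KIT-{number:03d}"  # hypothetical label; the kits themselves look identical
        kits[person] = label
        key[label] = arm
    return kits, key

people = ["P01", "P02", "P03", "P04"]
kits, key = blind_allocation(people, seed=7)
print(kits)  # {'P01': 'KIT-001', 'P02': 'KIT-002', 'P03': 'KIT-003', 'P04': 'KIT-004'}
```

The printed mapping carries no information about who receives the real treatment. That information lives only in `key`.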
⚠ Limitations & Precautions
No Causation (Only Association)
Observational studies show links, not cause-and-effect.
Use words like “associated with” or “linked to” instead of causal words like “causes” or “increases.”
Only RCTs can prove causation.
Can Look Like an RCT
Sometimes studies seem like RCTs but aren’t truly randomised.
Confounding variables (like time or environment) can mislead results.
Example:
New drug group (current patients) vs. old drug group (past patients).
→ Time is a confounding variable.
To fix this: use contemporaneous controls (studied at the same time).
Controlling for Confounders
Divide participants into subgroups based on the possible confounder (e.g. heavy, medium, light drinkers).
Compare within each subgroup only (e.g. smokers vs non-smokers among heavy drinkers).
Helps isolate the true effect of the variable being studied.
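The subgroup comparison above can be sketched in Python. All counts are hypothetical, and "cases" stands for whatever outcome is being studied. Smokers and non-smokers are compared only within the same drinking level, so drinking cannot explain a difference seen inside a subgroup. The function name `rates_by_subgroup` is hypothetical.

```python
# Hypothetical counts: (drinking level, smoker?) -> (cases, people in subgroup)
counts = {
    ("heavy", True): (30, 100), ("heavy", False): (10, 100),
    ("medium", True): (24, 120), ("medium", False): (6, 120),
    ("light", True): (15, 100), ("light", False): (5, 100),
}

def rates_by_subgroup(counts):
    """Return {level: (smoker rate, non-smoker rate)}, computed within each level."""
    rates = {}
    for (level, smoker), (cases, n) in counts.items():
        pair = rates.setdefault(level, [None, None])
        pair[0 if smoker else 1] = cases / n
    return {level: tuple(pair) for level, pair in rates.items()}

for level, (smoker_rate, non_smoker_rate) in rates_by_subgroup(counts).items():
    print(f"{level:>6} drinkers: smokers {smoker_rate:.0%}, non-smokers {non_smoker_rate:.0%}")
```

If smokers have the higher rate in every subgroup, as in these made-up numbers, the difference is not explained by drinking level.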
🔄 Simpson’s Paradox
A trend seen in individual groups reverses when the groups are combined.
Happens when a confounding variable distorts results.
Example:
Each department at a university admits women at a higher rate than men, but overall the university admits a lower proportion of women, because more women applied to harder-to-enter departments.
Key idea:
Confounding variables can flip the overall conclusion if not properly controlled.
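The admissions example can be checked with a small Python sketch. The numbers are invented to reproduce the pattern and are not real university data. The department applied to is the confounding variable.

```python
# Hypothetical admissions data: department -> {group: (admitted, applied)}
admissions = {
    "Dept A (easier)": {"women": (18, 20), "men": (80, 100)},
    "Dept B (harder)": {"women": (30, 100), "men": (5, 20)},
}

def rate(admitted, applied):
    """Admission rate as a proportion."""
    return admitted / applied

totals = {"women": [0, 0], "men": [0, 0]}  # group -> [admitted, applied]
for dept, groups in admissions.items():
    for group, (admitted, applied) in groups.items():
        totals[group][0] += admitted
        totals[group][1] += applied
    print(f"{dept}: women {rate(*groups['women']):.0%}, men {rate(*groups['men']):.0%}")

# Combining the departments reverses the comparison.
print(f"Overall: women {rate(*totals['women']):.0%}, men {rate(*totals['men']):.0%}")
```

Running it prints women 90% vs men 80% in Dept A and 30% vs 25% in Dept B, but 40% vs 71% overall. Most women applied to the harder department, which drags their combined rate down.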