Module 1) Exploring Data

Topic 1) Design of Experiments

What Is Statistics

Statistics = the science of data — collecting, analysing, and interpreting data to solve problems.
Data scientists use statistics to find insights and communicate results.

Ethics and Privacy in Data (MIP W)

Data provenance:
→ Know where the data came from and document its source.
Data management:
→ Have a transparent plan for:
- Collection
- Storage
- Sharing and access
- Reporting
- Keeping data non-identifiable (especially personal data).
Data wrangling:
→ Set clear rules for fixing errors or missing values, since these choices can affect results later.
Indigenous data:
→ Follow CARE principles:
Collective benefit, Authority to control, Responsibility, Ethics.
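One way to make a wrangling rule clear and transparent is to write it as code and log every value it changes. This is only a minimal sketch: the records, the field names, and the rule itself (replace missing or impossible ages with the median of the valid ages) are hypothetical and chosen for illustration.

```python
from statistics import median

# Hypothetical records: None marks a missing value, 999 an entry error.
records = [
    {"id": "p1", "age": 34},
    {"id": "p2", "age": None},
    {"id": "p3", "age": 999},
    {"id": "p4", "age": 52},
]

def is_valid_age(age):
    # Assumed rule: a valid age is a number between 0 and 120.
    return age is not None and 0 <= age <= 120

def clean_ages(rows):
    """Apply one documented rule and return (cleaned rows, change log)."""
    valid = [r["age"] for r in rows if is_valid_age(r["age"])]
    fill = median(valid)  # assumed fill value: median of the valid ages
    cleaned, log = [], []
    for r in rows:
        if is_valid_age(r["age"]):
            cleaned.append(dict(r))
        else:
            cleaned.append({**r, "age": fill})
            log.append((r["id"], r["age"], fill))  # (id, old value, new value)
    return cleaned, log

cleaned, log = clean_ages(records)
for entry in log:
    print("changed:", entry)
```

The original records are left untouched and every change is logged, so a different rule (for example, dropping the rows instead) can be tried later and its effect on the results compared.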

💾 Big Data

“Big Data” = extremely large, complex, fast-moving datasets.
Defined by the 4 Vs:
- Volume – huge amount of data
- Variety – many types (text, images, etc.)
- Velocity – collected quickly and continuously
- Veracity – accuracy and reliability

Two further Vs are sometimes added: volatility (the data changes quickly) and value (the data yields useful insights).

🧠 Domain Knowledge

Understanding the context behind the data.
Data scientists need:
- Curiosity about the topic.
- Collaboration skills with field experts.

Sometimes they become domain experts themselves.

RANDOMISED CONTROLLED TRIALS (RCTs)

Definition:
An experiment where participants are randomly assigned to groups to fairly test a treatment or intervention.
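Random assignment can be sketched in a few lines. This is an illustrative example only: the function name `randomly_assign`, the participant IDs, and the two-group split are assumptions, and real trials often use more elaborate schemes.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups whose sizes differ by at most one."""
    rng = random.Random(seed)   # a seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)       # chance alone decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs P01..P20
groups = randomly_assign([f"P{i:02d}" for i in range(1, 21)], seed=1)
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides who goes where, neither the researcher nor the participant can influence group membership.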

🧾 Types of Evidence

Type	Reliability	Notes
Personal testimony	❌ Weak	Based on individual observation, not verified; the media often misuse these.
Reputable research	✅ Strong	Published in peer-reviewed journals; must be reproducible and transparent.

🚫 Avoiding Bias

Bias = any systematic error that pushes results away from the truth.

Type of Bias	Description	Solution
Selection bias	Some people are more likely to be chosen than others.	Random allocation.
Observer bias	Researchers or participants know who’s in which group.	Double-blind setup.
Placebo effect	Participant expects a change just because they believe they’re being treated.	Use a placebo (fake treatment).
Confounding variable	A third factor influences results (e.g. age, diet).	Randomisation or control groups.
Consent bias	Only certain people agree to take part.	Careful design; ethics approval.
Survivor bias	Results only include those who finished the study.	Track dropouts carefully.
Adherer bias	People who stick to the treatment differ from those who don’t.	Control for adherence.

⭐ Gold Standard Experiment

Double-blind Randomised Controlled Trial (RCT):
- Participants and investigators don’t know who receives the real treatment.
- Helps remove bias from both sides.
- Use a placebo that looks like the treatment.

Often difficult to do (ethical, financial, or practical limits).

⚠ Limitations & Precautions

No Causation (Only Association)
- Observational studies show links, not cause-and-effect.
- Use words like “associated with” or “linked to” instead of causal words like “causes” or “increases.”
- A well-designed RCT is the most reliable way to establish causation.
Can Look Like an RCT
- Sometimes studies seem like RCTs but aren’t truly randomised.
- Confounding variables (like time or environment) can mislead results.
- Example:
  - New drug group (current patients) vs. old drug group (past patients).
  - → Time is a confounding variable.
- To fix this: use contemporaneous controls (studied at the same time).
Controlling for Confounders
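- Divide participants into subgroups by the suspected confounder (e.g. heavy, medium, light drinkers when alcohol might confound the effect of smoking).
- Compare the groups of interest within each subgroup (smokers vs non-smokers).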
- Helps isolate the true effect of the variable being studied.
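The subgroup comparison above can be sketched in code. The data here is made up for illustration, and it assumes smoking is the variable being studied, drinking level is the suspected confounder, and the outcome is a yes/no disease flag.

```python
from collections import defaultdict

# Hypothetical participants: (drinking level, smoker?, has disease?)
data = [
    ("light", True, True), ("light", True, False),
    ("light", False, False), ("light", False, False),
    ("heavy", True, True), ("heavy", True, True),
    ("heavy", False, True), ("heavy", False, False),
]

def rates_by_stratum(rows):
    """Return {stratum: {smoker_flag: disease rate}}."""
    counts = defaultdict(lambda: [0, 0])  # (stratum, smoker) -> [cases, total]
    for stratum, smoker, diseased in rows:
        counts[(stratum, smoker)][0] += int(diseased)
        counts[(stratum, smoker)][1] += 1
    result = defaultdict(dict)
    for (stratum, smoker), (cases, total) in counts.items():
        result[stratum][smoker] = cases / total
    return dict(result)

rates = rates_by_stratum(data)
for stratum, by_smoking in rates.items():
    # Smokers are only compared with non-smokers who drink a similar amount.
    print(stratum, "smokers:", by_smoking[True], "non-smokers:", by_smoking[False])
```

Comparing smokers with non-smokers inside each drinking level means differences in drinking cannot explain a gap within a subgroup.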

🔄 Simpson’s Paradox

A trend seen in individual groups reverses when the groups are combined.
Happens when a confounding variable distorts results.
Example:
Each department at a university admits women at a higher rate than men, yet the overall admission rate for women is lower, because more women applied to the harder-to-enter departments.

Key idea:
Confounding variables can flip the overall conclusion if not properly controlled.
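A small worked example makes the reversal concrete. The numbers below are invented for illustration (they are not real admissions data), and the two department labels are assumptions.

```python
# Hypothetical admissions data: department -> group -> (admitted, applied)
admissions = {
    "A (easier to enter)": {"women": (14, 20), "men": (48, 80)},
    "B (harder to enter)": {"women": (24, 80), "men": (4, 20)},
}

def rate(admitted, applied):
    return admitted / applied

def overall_rate(group):
    """Pool every department before computing the admission rate."""
    admitted = sum(dept[group][0] for dept in admissions.values())
    applied = sum(dept[group][1] for dept in admissions.values())
    return rate(admitted, applied)

for name, dept in admissions.items():
    print(name, "women:", rate(*dept["women"]), "men:", rate(*dept["men"]))
print("Overall women:", overall_rate("women"), "men:", overall_rate("men"))
```

In this made-up data women have the higher admission rate in both departments (70% vs 60% and 30% vs 20%), but the pooled rate is 38% for women against 52% for men, because 80 of the 100 women applied to the harder department.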