What is Experimental Design in data science?
The process of testing hypotheses and gathering meaningful, unbiased data for data-driven decisions.
What is the main goal of experimental design?
To collect reliable data efficiently while minimizing time, cost, bias, and mistakes.
In data science, what does experimental design help answer?
Which option or decision optimizes an objective or goal function (e.g., maximizing click-through rate, CTR)
Why is defining the research question important?
To make sure the experiment is properly planned and addresses what we need it to address
What should most data science problems ultimately aim to do?
Predict future outcomes or identify the most optimal solution
What are the two main types of variables in an experiment?
Independent and Dependent
Example of independent and dependent variables?
IV: a website’s button color; DV: click-through rate
What defines a control group vs. a treatment group?
The control group gets no change, while the treatment group gets the change
Why identify the population or sample?
It clarifies who we’re representing in the study
What is a hypothesis?
An educated guess about the relationship between variables that we can test
How do you test a hypothesis?
Conduct an experiment and check whether the data support the hypothesis
Sequence for stating a hypothesis?
If X happens (a change in the IV), then Y will happen (a change in the DV)
Which variable is manipulated and which is measured?
The IV is manipulated and the DV is measured
What is a confounder?
A third variable that influences both the IV and the DV, thereby distorting the actual IV-to-DV relationship
Examples of confounders?
Prior knowledge of the subject area, socioeconomic status, user demographics
Why are confounders typically considered bad for an experiment?
They cause biased or invalid results, leading us to incorrect conclusions
List the DESIGN STAGE strategies to handle confounders!
Randomization, Restriction, Matching, and Replication
List the ANALYSIS-STAGE strategies!
Replication and multivariable regression
What is multivariable regression?
A statistical model that estimates how one outcome is affected by several factors at once
Restate this question in multivariable-regression terms: “How does exercise affect weight?”
How do exercise, age, and diet together affect weight?
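A minimal sketch of that restated question, assuming NumPy is available. The numbers are made up for illustration, and the weights are generated from a known linear rule so the fit can be checked; real data would be noisy.

```python
import numpy as np

# Synthetic, made-up data purely for illustration.
exercise = np.array([0, 2, 4, 6, 1, 3], dtype=float)   # hours per week
age = np.array([30, 25, 40, 35, 50, 45], dtype=float)  # years
diet = np.array([2.5, 2.0, 2.2, 1.8, 3.0, 2.4])        # thousand kcal per day

# Weight is built from a known linear rule (an assumption of this sketch).
weight = 60.0 - 1.0 * exercise + 0.2 * age + 5.0 * diet

# Design matrix: an intercept column plus one column per factor.
X = np.column_stack([np.ones_like(exercise), exercise, age, diet])

# Ordinary least squares: each coefficient is the effect of one factor
# with the other factors held fixed.
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)
intercept, b_exercise, b_age, b_diet = coef

print(f"exercise: {b_exercise:.2f}, age: {b_age:.2f}, diet: {b_diet:.2f}")
```

The exercise coefficient here is the change in weight per extra hour of exercise after accounting for age and diet, which is what separates it from a one-variable comparison.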
What happens during RANDOMIZATION?
We assign participants to groups randomly to balance confounding variables and minimize bias.
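A minimal sketch of random assignment using only the standard library. The function name `randomize` and the participant labels are hypothetical, chosen for this example.

```python
import random

def randomize(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

people = [f"p{i}" for i in range(10)]  # hypothetical participant IDs
groups = randomize(people, seed=0)
print(groups)
```

Because chance alone decides who lands in each group, confounders tend to be spread evenly across groups as the sample grows.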
What is restriction?
Limiting the sample to one level of a confounder (e.g., only non-smokers) so it cannot vary
Disadvantage of restriction?
Reduces generalizability of results
Define MATCHING
Pairing cases and controls with similar confounder values (e.g., same age & sex) to reduce bias
Limitations of matching?
Difficult when many confounders exist
Why replicate an experiment?
To confirm the results and strengthen confidence in them
In “study time → exam score” example, what confounder exists?
Prior knowledge
In “study time → exam score” example, how can prior knowledge be controlled using RANDOM ASSIGNMENT (RCT)?
Randomly assign people to control/treatment groups
In “study time → exam score” example, how can prior knowledge be controlled using STRATIFIED RANDOMIZATION?
Group people by their level of prior knowledge, then randomly assign within each group
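A minimal sketch of stratified randomization, assuming each student already has a prior-knowledge label. The labels `"low"` and `"high"`, the student IDs, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assign(participants, seed=None):
    """participants: list of (name, prior_knowledge_level) tuples."""
    rng = random.Random(seed)

    # Step 1: group people by their level of prior knowledge.
    strata = defaultdict(list)
    for name, level in participants:
        strata[level].append(name)

    # Step 2: randomize separately inside each stratum.
    groups = {"control": [], "treatment": []}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        groups["control"] += members[:half]
        groups["treatment"] += members[half:]
    return groups

students = [("s1", "low"), ("s2", "low"), ("s3", "low"), ("s4", "low"),
            ("s5", "high"), ("s6", "high"), ("s7", "high"), ("s8", "high")]
strat_groups = stratified_assign(students, seed=0)
print(strat_groups)
```

With even-sized strata, each group ends up with the same number of low and high prior-knowledge students, so prior knowledge cannot differ between groups by chance.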
In “study time → exam score” example, how can prior knowledge be controlled using BLOCK DESIGN (matched pairs)?
Pair people with similar prior knowledge, then randomly assign one member of each pair to each group
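A minimal sketch of matched-pairs assignment, assuming a pretest score stands in for prior knowledge. The names, scores, and function name are hypothetical.

```python
import random

def matched_pairs_assign(scores, seed=None):
    """scores: dict mapping name -> pretest score (proxy for prior knowledge)."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # neighbors have similar scores

    groups = {"control": [], "treatment": []}
    # Walk the ranking two at a time; with an odd count the last person
    # stays unassigned in this sketch.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)                    # coin flip within the pair
        groups["control"].append(pair[0])
        groups["treatment"].append(pair[1])
    return groups

pretest = {"Ana": 55, "Ben": 58, "Cy": 70, "Di": 72, "Eli": 90, "Fay": 93}
pair_groups = matched_pairs_assign(pretest, seed=1)
print(pair_groups)
```

Each pair contributes one person to each group, so the two groups have nearly identical prior-knowledge profiles.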
What are the four key data-collection methods?
Observational studies, Surveys, Experiments, Simulations
Define an observational study.
A study in which you observe outcomes without manipulating any variable
List the 3 types of observational studies
Cross-sectional, Retrospective, Prospective
One line explanation of a CROSS-SECTIONAL study?
Snapshot at a specific point in time
One line explanation of a RETROSPECTIVE study?
Looking back at past data
Example of a RETROSPECTIVE study?
Case-control study
What happens during case-control?
Comparing people with a specific outcome (cases) to those without it (controls) to find factors linked to that outcome.
Example of a case-control study: you want to find out whether smoking is linked to lung cancer. Who are the cases and who are the controls? (You would look at their past to see how many had smoked.)
Cases: people who have lung cancer; Controls: people who don’t have lung cancer
One line explanation of a PROSPECTIVE study?
Following a group of people over time
Key design concern of surveys?
Bias due to wording of the questions
What differentiates EXPERIMENTS from OBSERVATIONAL studies?
In experiments, we change the IV
What is the placebo effect?
Symptoms improve because participants think they’re receiving treatment (even though it’s fake)
What is blinding used for?
To minimize bias by keeping people unaware of who is getting the real treatment
What is a single-blind?
One side is unaware (usually the participants)
What is a double-blind?
Both sides are unaware (participants and researchers)
What is the fundamental rule of data collection?
The data must actually represent the population we’re testing
Can we eliminate all bias from experiments?
No!
Best data‑collection method for studying effects of a severe earthquake?
Simulation
Best data‑collection method for testing a coupon’s influence on catalog purchase rates?
A/B test experiment
Best data‑collection method for studying if smoking affects heart disease?
Observational study (case-control)
Best data‑collection method for finding average household income in a city?
Survey