Experimental Design

51 Terms

What is Experimental Design in data science?

The process of testing hypotheses and gathering meaningful, unbiased data for data-driven decisions.

What is the main goal of experimental design?

To collect reliable data efficiently while minimizing time, cost, bias, and mistakes.

In data science, what does experimental design help answer?

Which option or decision optimizes an objective or goal function (e.g., maximizing CTR)

Why is defining the research question important?

To make sure the experiment is properly planned and addresses what we need it to address

What should most data science problems ultimately aim to do?

Predict future outcomes or identify the most optimal solution

What are the two main types of variables in an experiment?

Independent and Dependent

Example of independent and dependent vars?

IV: website’s button color; DV: click-through rate

What defines a control group vs. a treatment group?

Control group gets no change while the treatment group gets the change

Why identify the population or sample?

It clarifies who we’re representing in the study

What is a hypothesis?

An educated guess about the relationship between variables that we can test

How do you test a hypothesis?

Conduct an experiment and check whether the data support the hypothesis

Format for stating a hypothesis?

If [the IV changes in this way], then [the DV will change in this way]

Which variable is manipulated and which is measured?

The IV is manipulated and the DV is measured

What is a confounder?

A third variable that influences both the IV and the DV, thereby distorting the actual IV-to-DV relationship

Examples of confounders?

Prior knowledge in area, socioeconomic status, user demographics

Why are confounders typically considered bad for an experiment?

They cause biased or invalid results, leading us to incorrect conclusions

List the DESIGN STAGE strategies to handle confounders!

Randomization, Restriction, Matching, and Replication

List the ANALYSIS-STAGE strategies!

Replication and multivariable regression

What is multivariable regression?

A statistical model that estimates how one outcome is affected by several factors at once, so each factor's effect is adjusted for the others

Restate this question in multivariable-regression terms: “How does exercise affect weight?”

How do exercise, age, and diet together affect weight?
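
A minimal sketch of that idea, assuming ordinary least squares fitted with plain Python. The data, the made-up rule used to generate the weights, and the helper names `solve` and `fit_ols` are all hypothetical and serve only as an illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: (exercise hours/week, age in years, daily calories / 100)
rows = [(1, 30, 25), (3, 45, 22), (5, 25, 20), (2, 50, 28), (4, 35, 24), (6, 40, 21)]
# Weights come from a made-up rule so the fit can be checked:
# weight = 50 - 2*exercise + 0.3*age + 1.0*calories
weight = [50 - 2 * e + 0.3 * a + 1.0 * c for e, a, c in rows]

intercept, b_exercise, b_age, b_diet = fit_ols(rows, weight)
```

Each coefficient describes the effect of one factor with the other two held fixed, which is what separates the exercise effect from age and diet.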

What happens during RANDOMIZATION?

We assign participants to groups randomly to balance confounding variables and minimize bias.
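
One way this could look in code, as a sketch only. The function name `randomize` and the participant IDs are hypothetical.

```python
import random


def randomize(participants, seed=None):
    """Shuffle participants and split them into control and treatment groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


participants = [f"user_{i}" for i in range(10)]  # hypothetical IDs
groups = randomize(participants, seed=42)
```

Because chance alone decides the group, any confounder (known or unknown) tends to be spread evenly across both groups as the sample grows.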

What is restriction?

Limiting the sample to one level of a confounder (e.g., only non-smokers) so it cannot vary

Disadvantage of restriction?

Reduces generalizability of results

Define MATCHING

Pairing cases and controls with similar confounder values (e.g., same age & sex) to reduce bias
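
A rough sketch of exact matching on age and sex, assuming a simple greedy strategy (real studies often use more flexible methods). The records and the function name `match_cases_to_controls` are hypothetical.

```python
def match_cases_to_controls(cases, controls, keys=("age", "sex")):
    """Greedy exact matching: pair each case with one unused control
    that has the same values for every key."""
    pool = {}
    for control in controls:
        pool.setdefault(tuple(control[k] for k in keys), []).append(control)
    pairs, unmatched = [], []
    for case in cases:
        candidates = pool.get(tuple(case[k] for k in keys), [])
        if candidates:
            pairs.append((case, candidates.pop(0)))  # each control is used once
        else:
            unmatched.append(case)
    return pairs, unmatched


# Hypothetical records
cases = [
    {"id": "c1", "age": 40, "sex": "F"},
    {"id": "c2", "age": 55, "sex": "M"},
    {"id": "c3", "age": 30, "sex": "F"},
]
controls = [
    {"id": "k1", "age": 55, "sex": "M"},
    {"id": "k2", "age": 40, "sex": "F"},
    {"id": "k3", "age": 40, "sex": "M"},
]
pairs, unmatched = match_cases_to_controls(cases, controls)
```

Here `c3` ends up unmatched because no control shares its age and sex. Every additional matching key makes this more likely.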

Limitations of matching?

Difficult when many confounders exist

Why replicate an experiment?

To confirm results and strengthen confidence in the results

In “study time → exam score” example, what confounder exists?

Prior knowledge

In “study time → exam score” example, how can prior knowledge be controlled using RANDOM ASSIGNMENT (RCT)?

Randomly assign people to control/treatment groups

In “study time → exam score” example, how can prior knowledge be controlled using STRATIFIED RANDOMIZATION?

Group people by their level of prior knowledge, then randomly assign within each group
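
A sketch of how this might be coded, assuming each student already carries a prior-knowledge label. The student records and the function name `stratified_assign` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assign(people, stratum_of, seed=None):
    """Randomly assign people to control/treatment separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # randomize order inside the stratum only
        half = len(members) // 2
        for person in members[:half]:
            assignment[person["name"]] = "control"
        for person in members[half:]:
            assignment[person["name"]] = "treatment"
    return assignment


# Hypothetical students labelled by prior knowledge
students = [
    {"name": "s1", "prior": "low"}, {"name": "s2", "prior": "low"},
    {"name": "s3", "prior": "low"}, {"name": "s4", "prior": "low"},
    {"name": "s5", "prior": "high"}, {"name": "s6", "prior": "high"},
    {"name": "s7", "prior": "high"}, {"name": "s8", "prior": "high"},
]
assignment = stratified_assign(students, lambda s: s["prior"], seed=1)
```

Both groups end up with the same mix of low and high prior knowledge, which plain randomization only achieves on average.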

In “study time → exam score” example, how can prior knowledge be controlled using BLOCK DESIGN (matched pairs)?

Pair people with similar prior knowledge, then randomly assign one member of each pair to each group
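
A possible sketch, assuming prior knowledge is measured by a pretest score and pairs are formed from neighbors in the ranked list. The names, scores, and the function name `matched_pairs_assign` are hypothetical.

```python
import random


def matched_pairs_assign(students, seed=None):
    """Pair students with adjacent pretest scores, then randomly split each pair."""
    rng = random.Random(seed)
    ranked = sorted(students, key=lambda s: s["pretest"])
    assignment = {}
    # With an odd number of students, the last one is left unassigned.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides who gets the treatment
        assignment[pair[0]["name"]] = "control"
        assignment[pair[1]["name"]] = "treatment"
    return assignment


# Hypothetical pretest scores
students = [
    {"name": "Ana", "pretest": 55}, {"name": "Ben", "pretest": 90},
    {"name": "Cy", "pretest": 58}, {"name": "Dee", "pretest": 88},
    {"name": "Eli", "pretest": 72}, {"name": "Fay", "pretest": 70},
]
assignment = matched_pairs_assign(students, seed=7)
```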

What are the four key data-collection methods?

Observational studies, Surveys, Experiments, Simulations

Define an observational study.

A study in which you observe and record outcomes without manipulating any variable

List the 3 types of observational studies

Cross-sectional, Retrospective, Prospective

One line explanation of a CROSS-SECTIONAL study?

Snapshot at a single point in time

One line explanation of a RETROSPECTIVE study?

Looking back at past data

Example of a RETROSPECTIVE study?

Case-control study

What happens during case-control?

Comparing people with a specific outcome (cases) to those without it (controls) to find factors linked to that outcome.

Example of a case-control study: you want to find out whether smoking is linked to lung cancer. Who are the cases and who are the controls? (You would look back at each group’s history to see how many had smoked.)

Cases: people who have lung cancer; Controls: people who don’t have lung cancer
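
As an illustration of what the comparison could produce, here is a sketch that computes an odds ratio, a common summary for case-control data (the cards themselves do not name it). All counts are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts: 100 cases (80 had smoked), 100 controls (40 had smoked)
ratio = odds_ratio(
    exposed_cases=80, unexposed_cases=20,
    exposed_controls=40, unexposed_controls=60,
)
```

A ratio above 1 means past smoking is more common among cases than among controls. With these made-up counts the ratio is 6.0, which suggests a link but does not by itself rule out confounders.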

One line explanation of a PROSPECTIVE study?

Following a group of people over time

Key design concern of surveys?

Bias due to wording of the questions

What differentiates EXPERIMENTS from OBSERVATIONAL studies?

In experiments, we deliberately manipulate the IV; in observational studies, we only observe

What is the placebo effect?

Symptoms improve because participants think they’re receiving treatment (even though it’s fake)

What is blinding used for?

To minimize bias by having subjects not know who is getting the real treatment

What is a single-blind?

One side is unaware (usually the participants)

What is a double-blind?

Both sides are unaware (participants and researchers)

What is the fundamental rule of data collection?

The data must actually represent the population we’re studying

Can we eliminate all bias from experiments?

No!

Best data‑collection method for studying effects of a severe earthquake?

Simulation

Best data‑collection method for testing a coupon’s influence on catalog purchase rates?

A/B test experiment
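
A sketch of how the results of such an A/B test might be analyzed, assuming a two-proportion z-test is appropriate (the cards do not specify a test). The purchase counts and the function name `two_proportion_z_test` are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical results: 1000 catalogs without a coupon (50 purchases),
# 1000 catalogs with a coupon (80 purchases)
z, p_value = two_proportion_z_test(conv_a=50, n_a=1000, conv_b=80, n_b=1000)
```

A small p-value suggests the difference in purchase rates is unlikely to be due to chance alone. Random assignment of coupons is what allows that difference to be attributed to the coupon.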

Best data‑collection method for studying if smoking affects heart disease?

Observational study (case-control)

Best data‑collection method for finding average household income in a city?

Survey