Experimental Design


51 Terms

1

What is Experimental Design in data science?

The process of testing hypotheses and gathering meaningful, unbiased data for data-driven decisions.

2

What is the main goal of experimental design?

To collect reliable data efficiently while minimizing time, cost, bias, and mistakes.

3

In data science, what does experimental design help answer?

Which option or decision optimizes an objective or goal function (e.g., maximizing click-through rate, CTR)

4

Why is defining the research question important?

To make sure the experiment is properly planned and addresses what we need it to address

5

What should most data science problems ultimately aim to do?

Predict future outcomes or identify the optimal solution

6

What are the two main types of variables in an experiment?

Independent and Dependent

7

Example of independent and dependent variables?

IV: a website’s button color. DV: click-through rate.

8

What defines a control group vs. a treatment group?

The control group gets no change, while the treatment group gets the change

9

Why identify the population or sample?

It clarifies who we’re representing in the study

10

What is a hypothesis?

An educated guess about the relationship between variables that we can test

11

How do you test a hypothesis?

Conduct experiments and check whether the data support the hypothesis

12

Standard form for stating a hypothesis?

If X happens (a change in the IV), then Y will happen (a change in the DV)

13

Which variable is manipulated and which is measured?

The IV is manipulated and the DV is measured

14

What is a confounder?

A third variable that influences both the IV and the DV, thereby distorting the apparent IV-to-DV relationship

15

Examples of confounders?

Prior knowledge of the subject area, socioeconomic status, user demographics

16

Why are confounders typically considered bad for an experiment?

They cause biased or invalid results, leading us to incorrect conclusions

17

List the DESIGN STAGE strategies to handle confounders!

Randomization, Restriction, Matching, and Replication

18

List the ANALYSIS-STAGE strategies!

Replication and multivariable regression

19

What is multivariable regression?

A statistical model that estimates how one outcome is affected by several factors at once

20

Restate this question as a multivariable regression question: “How does exercise affect weight?”

How do exercise, age, and diet together affect weight?
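
To make the idea concrete, here is a minimal sketch of a multivariable regression fitted by ordinary least squares using only the Python standard library. The data are made up and noise-free (weight is generated from coefficients chosen for illustration), and the helper names `solve` and `fit_ols` are hypothetical. In practice a statistics library would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical people: (exercise hours/week, age in years, diet in 1000 kcal/day)
rows = [(2, 25, 2.0), (5, 30, 2.2), (1, 45, 2.5),
        (3, 35, 1.8), (6, 50, 2.1), (0, 40, 2.7)]
# Illustrative assumption: weight depends on all three factors at once.
weights = [70 - 1.5 * e + 0.2 * a + 5 * d for e, a, d in rows]

coefs = fit_ols(rows, weights)
print([round(c, 3) for c in coefs])  # intercept, exercise, age, diet
```

Because age and diet are in the model, the exercise coefficient describes the effect of exercise with those two factors held fixed, which is how regression adjusts for confounders at the analysis stage.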

21

What happens during RANDOMIZATION?

We assign participants to groups randomly to balance confounding variables and minimize bias.
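
A minimal sketch of simple randomization, assuming participants are identified by name. The function name `randomize` and the participant IDs are hypothetical.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into a control and a treatment group."""
    rng = random.Random(seed)      # a seed makes the split reproducible
    shuffled = list(participants)  # copy so the input is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


participants = [f"user_{i}" for i in range(10)]
groups = randomize(participants, seed=42)
print(groups["control"])
print(groups["treatment"])
```

Because chance alone decides who lands in which group, confounders such as prior knowledge tend to balance out across the groups as the sample grows.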

22

What is restriction?

Limiting the sample to one level of a confounder (e.g., only non-smokers) so it cannot vary

23

Disadvantage of restriction?

Reduces generalizability of results

24

Define MATCHING

Pairing cases and controls with similar confounder values (e.g., same age & sex) to reduce bias

25

Limitations of matching?

Difficult when many confounders exist

26

Why replicate an experiment?

To confirm the results and strengthen confidence in them

27

In “study time → exam score” example, what confounder exists?

Prior knowledge

28

In “study time → exam score” example, how can prior knowledge be controlled using RANDOM ASSIGNMENT (RCT)?

Randomly assign people to control/treatment groups

29

In “study time → exam score” example, how can prior knowledge be controlled using STRATIFIED RANDOMIZATION?

Group people by their level of prior knowledge, then randomly assign within each group
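
A minimal sketch of stratified randomization for this example, assuming each student has already been labeled with a prior-knowledge level. The function name `stratified_assign` and the student data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assign(levels, seed=None):
    """levels maps each person to a stratum label; returns person -> group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person, level in levels.items():
        strata[level].append(person)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # randomize within the stratum only
        half = len(members) // 2
        for person in members[:half]:
            assignment[person] = "control"
        for person in members[half:]:
            assignment[person] = "treatment"
    return assignment


# Hypothetical students labeled by prior knowledge
prior_knowledge = {
    "Ana": "low", "Ben": "low", "Cy": "low", "Di": "low",
    "Eli": "high", "Fay": "high", "Gus": "high", "Hal": "high",
}
assignment = stratified_assign(prior_knowledge, seed=0)
print(assignment)
```

Each knowledge level ends up split evenly between control and treatment, so prior knowledge cannot differ systematically between the two groups.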

30

In “study time → exam score” example, how can prior knowledge be controlled using BLOCK DESIGN (matched pairs)?

Pair people with similar prior knowledge, then randomly assign one member of each pair to each group
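
A minimal sketch of a matched-pairs design, assuming prior knowledge is measured as a numeric pretest score. The function name `matched_pairs` and the scores are hypothetical.

```python
import random


def matched_pairs(scores, seed=None):
    """Pair people with adjacent scores, then randomize within each pair."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # names ordered by score
    pairs = []
    # Step through the ranking two at a time; with an odd count,
    # the last person is left unpaired.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # a coin flip decides who gets the treatment
        pairs.append({"control": pair[0], "treatment": pair[1]})
    return pairs


# Hypothetical pretest scores
scores = {"Ana": 55, "Ben": 58, "Cy": 70, "Di": 72, "Eli": 90, "Fay": 93}
pairs = matched_pairs(scores, seed=1)
for pair in pairs:
    print(pair)
```

Within every pair the two people have nearly the same prior knowledge, so a difference in exam scores is less likely to be caused by it.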

31

What are the four key data-collection methods?

Observational studies, Surveys, Experiments, Simulations

32

Define an observational study.

A study in which you observe outcomes as they occur, without manipulating any variable

33

List the 3 types of observational studies

Cross-sectional, Retrospective, Prospective

34

One line explanation of a CROSS-SECTIONAL study?

A snapshot at a single point in time

35

One line explanation of a RETROSPECTIVE study?

Looking back at past data

36

Example of a RETROSPECTIVE study?

Case-control study

37

What happens in a case-control study?

Comparing people with a specific outcome (cases) to those without it (controls) to find factors linked to that outcome.

38

Example of a case-control study: say you want to find out whether smoking is linked to lung cancer. Who are the cases and who are the controls? (You would look at their past to see how many have smoked.)

Cases: people who have lung cancer. Controls: people who don’t have lung cancer.
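
A minimal sketch of the comparison a case-control study makes: how common past smoking is among cases versus controls. The records below are invented toy data for illustration only, and `exposure_rate` is a hypothetical helper.

```python
def exposure_rate(group):
    """Fraction of people in the group who smoked in the past."""
    return sum(person["smoked"] for person in group) / len(group)


# Toy data: 10 people with lung cancer (cases), 10 without (controls)
cases = [{"smoked": True}] * 8 + [{"smoked": False}] * 2
controls = [{"smoked": True}] * 3 + [{"smoked": False}] * 7

print(exposure_rate(cases))     # 0.8
print(exposure_rate(controls))  # 0.3
```

A higher smoking rate among cases than among controls suggests a link between smoking and the outcome, but confounders still need to be handled before drawing conclusions.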

39

One line explanation of a PROSPECTIVE study?

Following a group of people over time

40

Key design concern of surveys?

Bias due to wording of the questions

41

What differentiates EXPERIMENTS from OBSERVATIONAL studies?

In experiments, we deliberately change (manipulate) the IV; in observational studies, we only observe

42

What is the placebo effect?

Participants’ symptoms improve because they think they’re receiving treatment (even though it’s fake)

43

What is blinding used for?

To minimize bias by keeping participants (and sometimes researchers) unaware of who is getting the real treatment

44

What is a single-blind study?

One side is unaware (usually the participants)

45

What is a double-blind study?

Both sides are unaware (participants and researchers)

46

What is the fundamental rule of data collection?

The data must actually represent the population we’re studying

47

Can we eliminate all bias from experiments?

No!

48

Best data‑collection method for studying effects of a severe earthquake?

Simulation

49

Best data‑collection method for testing a coupon’s influence on catalog purchase rates?

A/B test experiment

50

Best data‑collection method for studying if smoking affects heart disease?

Observational study (case-control)

51

Best data‑collection method for finding average household income in a city?

Survey