HD & DS Lecture 3 Regression II

27 Terms

1

What is regression a powerful tool for?

epidemiology

2

What are Directed Acyclic Graphs (DAGs), and what are these graphs made of?

A simple tool to communicate causal relationships; they are made of nodes and edges.

3

nodes

variables we are studying

4

edges

represent a causal relationship between two variables

5

What CAN and CANNOT DAGs have?

They can be made arbitrarily complex but cannot contain a cycle.

6

Why can’t DAGs contain a cycle?

You're often modeling something like cause and effect. One event can lead to another, but it can't eventually lead back to the first event. This keeps the direction in which data and processes flow clear.
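A minimal sketch of checking the "no cycle" rule in code, assuming a DAG is written as a list of (cause, effect) pairs. The helper name `is_dag` and the example edges are illustrative and not from the lecture; `graphlib` is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter, CycleError


def is_dag(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    predecessors = {}
    for cause, effect in edges:
        # TopologicalSorter expects a mapping: node -> nodes that point into it.
        predecessors.setdefault(effect, set()).add(cause)
        predecessors.setdefault(cause, set())
    try:
        # Consuming static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


# Hypothetical edges: smoking points to both other nodes, nothing points back.
valid = [("smoking", "lung cancer"), ("smoking", "owning a lighter")]
# Adding an edge back to smoking closes a loop, so this is no longer a DAG.
cyclic = valid + [("lung cancer", "smoking")]

print(is_dag(valid), is_dag(cyclic))
```

A graph passes the check only if its nodes can be put in an order where every cause comes before its effects, which is exactly what a cycle makes impossible.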

7

how do we quantify the strength of a causal relationship in a DAG?

regression and the weight coefficient

8

What is the difference between a regression and DAG?

DAGs help us visualize a simple relationship and a regression model quantifies it

9

what does the weight coefficient tell us?

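the direction and size of the input's association with the outcome; together with its p-value, it tells us whether the input is significantly associated with the outcome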
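A rough sketch of where a weight coefficient comes from, using ordinary least squares on synthetic data. The data, the assumed true slope of 2.0, and all variable names are made up for illustration; the t-statistic is the usual coefficient divided by its standard error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (assumption): outcome = 1.0 + 2.0 * input + noise.
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column, then the least-squares fit.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, weight = coef

# Standard error of the weight and its t-statistic (two fitted parameters).
residuals = y - X @ coef
sigma2 = residuals @ residuals / (n - 2)
std_err = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = weight / std_err

print(f"weight={weight:.3f}, std_err={std_err:.3f}, t={t_stat:.1f}")
```

The sign and size of `weight` describe the association; a large absolute t-statistic (equivalently, a small p-value) is what supports calling it significant.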

10

main effect

causal relationship that is of primary interest

11

Covariates

variable that is not the main effect but may have an effect on the outcome (does the link between smoking and lung cancer still exist when we take gender into account?)

12

Mediators

intermediate variable that explains the process by which an exposure leads to an outcome (why and how an effect happens). Example: the impact of caloric intake (X) on weight (Y) happens through metabolism (M).

13

confounders

third variable that affects both the main variable and the outcome, which can mess up the real connection between them. (if we’re studying whether owning a lighter is linked to lung cancer, smoking is a confounder.)
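The lighter example can be simulated as a sketch. Everything numeric here is an assumption chosen for illustration: smoking drives both lighter ownership and a made-up continuous "risk score", and lighters have no direct effect. The helper `ols` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Assumed data-generating process: smoking is the confounder.
smoking = rng.binomial(1, 0.3, size=n).astype(float)
lighter = rng.binomial(1, 0.1 + 0.7 * smoking).astype(float)  # depends on smoking only
risk = 1.0 * smoking + rng.normal(scale=1.0, size=n)          # depends on smoking only


def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Coefficient on lighter ownership without and with smoking in the model.
unadjusted = ols([lighter], risk)[1]
adjusted = ols([lighter, smoking], risk)[1]

print(f"unadjusted={unadjusted:.2f}, adjusted={adjusted:.2f}")
```

Under these assumptions the unadjusted coefficient is clearly positive, while the coefficient adjusted for smoking sits near zero, which is the pattern a confounder produces.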

14

effect modification

a variable that modifies the causal relationship between the main effect and the outcome; the main effect has a different impact in different circumstances. (we might explore whether education’s impact on income varies by race.)
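In a regression, effect modification is commonly checked with an interaction term. The sketch below uses synthetic data with hypothetical names (`education`, `group`, `income`) and an assumed extra slope of 1.5 in one group; none of these numbers come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000

education = rng.normal(size=n)                       # standardized main effect (hypothetical)
group = rng.binomial(1, 0.5, size=n).astype(float)   # hypothetical binary modifier

# Assumed truth: slope of 2.0 in group 0 and 2.0 + 1.5 in group 1.
income = (1.0 + 2.0 * education + 0.5 * group
          + 1.5 * education * group + rng.normal(size=n))

# The product column education * group is the interaction term.
X = np.column_stack([np.ones(n), education, group, education * group])
coef, *_ = np.linalg.lstsq(X, income, rcond=None)

slope_group0 = coef[1]
slope_group1 = coef[1] + coef[3]

print(f"slope in group 0: {slope_group0:.2f}, slope in group 1: {slope_group1:.2f}")
```

An interaction coefficient (`coef[3]`) far from zero means the main effect has a different impact in the two groups.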

15

How does the regression model examine how far off our predictions were?

Bias (how far off our predictions are on average)

Variance (how much the model’s predictions change when we use different data sets)

Random error (natural unpredictability)
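For squared-error loss, these three pieces are usually written as one decomposition. The notation is assumed here rather than taken from the lecture: the outcome is $y = f(x) + \varepsilon$ with $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the fitted model, with expectations taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{random error}}
```

Only the first two terms depend on the model; the random error term stays no matter how the model is chosen.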

16

Reducing bias ____ variance

increases

17

high bias leads to

underfitting

18

underfitting

when the model looks smooth but is too simple to capture true patterns

19

high variance and low bias leads to

overfitting

20

overfitting

The model is too complex: it fits the training data so closely that it is too tailored to the training set and does not generalize. (Like reading a book without being able to summarize it: it works well on known examples but not on new ones.)
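One way to see underfitting and overfitting side by side is to fit polynomials of increasing degree to the same small sample. This is a sketch on synthetic data; the sine-shaped truth, the noise level, and the degrees 1, 3, and 12 are all assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)


def make_data(n):
    """Noisy samples from an assumed true curve, sin(3x)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n)
    return x, y


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


x_train, y_train = make_data(20)    # small training set
x_test, y_test = make_data(200)     # fresh data the model never saw

train_mse, test_mse = {}, {}
for degree in (1, 3, 12):           # too simple, moderate, very flexible
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(y_train, np.polyval(coefs, x_train))
    test_mse[degree] = mse(y_test, np.polyval(coefs, x_test))

for degree in train_mse:
    print(degree, round(train_mse[degree], 3), round(test_mse[degree], 3))
```

Training error can only go down as the degree grows, so it cannot reveal overfitting on its own. The comparison to watch is the test error, which typically stops improving, or gets worse, for the most flexible fit; the exact numbers depend on the random seed.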

21

how do we handle overfitting and underfitting?

train-test technique

22

train-test split

splitting the data into two subsets, sized according to the dataset and requirements: one for training, the other saved for testing

23

example of train-test split

Mean Squared Error in Linear Regression
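A minimal sketch of that example: fit a linear regression on the training subset and report Mean Squared Error on the held-out subset. The synthetic data and the 80/20 split ratio are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Synthetic data (assumption): y = 3.0 + 0.5 * x + noise with variance 1.
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Shuffle the row indices, then cut them into 80% train and 20% test.
idx = rng.permutation(n)
n_train = int(0.8 * n)
train_idx, test_idx = idx[:n_train], idx[n_train:]

# Fit on the training rows only.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)

# Evaluate on rows the model has not seen.
test_mse = float(np.mean((y[test_idx] - X[test_idx] @ coef) ** 2))
print(f"test MSE: {test_mse:.3f}")
```

Because the test rows play no part in fitting, the test MSE is an estimate of how the model would do on new data.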

24

Limitation of train-test split

may not capture full complexity and diversity of data (especially if data set is small) therefore the model may not generalize well

25

how do we handle the limitation of the train-test split?

cross-validation

26

cross-validation

provides a more accurate picture of how the model will perform on unseen data by using multiple train-test splits
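A sketch of k-fold cross-validation, one common form of cross-validation, for a linear regression. The function name `k_fold_mse`, the choice of k = 5, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def k_fold_mse(x, y, k=5, seed=0):
    """Return the test MSE from each of k folds for a simple linear regression."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    X = np.column_stack([np.ones(len(y)), x])
    scores = []
    for i in range(k):
        test_idx = folds[i]  # each fold is the test set exactly once
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        scores.append(float(np.mean((y[test_idx] - X[test_idx] @ coef) ** 2)))
    return scores


# Synthetic data (assumption): y = 3.0 + 0.5 * x + noise with variance 1.
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

scores = k_fold_mse(x, y, k=5)
cv_mse = float(np.mean(scores))
print([round(s, 3) for s in scores], round(cv_mse, 3))
```

Every row is used for testing once and for training k - 1 times, and averaging the fold scores gives a steadier estimate than any single split.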

27

What are the benefits of cross-validation?

reduces variability

maximizes data usage