HD & DS Lecture 3 Regression II

27 Terms

1

What is regression a powerful tool for?

epidemiology

2

What are Directed Acyclic Graphs (DAGs), and what are they made of?

a simple tool for communicating causal relationships; they are made of nodes and edges

3

nodes

variables we are studying

4

edges

represent a causal relationship between two variables

5

What CAN and CANNOT DAGs have

they can be made arbitrarily complex but cannot contain a cycle

6

Why can't DAGs contain a cycle?

DAGs usually model cause and effect: one event can lead to another, but it cannot eventually lead back to the first event. Ruling out cycles keeps the direction in which data and processes flow clear.
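
To make the no-cycle rule concrete, here is a minimal sketch of a DAG stored as a plain dictionary, with a depth-first check for cycles. The variable names and arrows are hypothetical, chosen only for illustration.

```python
# Hypothetical DAG: each key is a node (a variable under study); each value
# lists the nodes that its outgoing edges (presumed causal arrows) point to.
dag = {
    "age": ["smoking", "lung_cancer"],
    "smoking": ["lung_cancer"],
    "lung_cancer": [],
}


def has_cycle(graph):
    """Return True if the directed graph contains a cycle (depth-first search)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:  # we walked back onto the current path
                return True
            if state == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Adding an arrow from the outcome back to a cause would break the DAG.
not_a_dag = {"smoking": ["lung_cancer"], "lung_cancer": ["smoking"]}

print(has_cycle(dag))
print(has_cycle(not_a_dag))
```

The first graph is a valid DAG; the second is rejected because following its arrows leads back to the starting node.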

7

how do we quantify the strength of the causal relationship in DAGs?

regression and the weight coefficient

8

What is the difference between a regression and a DAG?

a DAG helps us visualize a causal relationship in a simple way; a regression model quantifies it

9

what does the weight coefficient tell us?

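the direction and size of the association between the input and the outcome; together with its p-value, whether the input is significantly associated with the outcome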
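
As a rough illustration of what a weight coefficient is, the sketch below fits a one-input linear regression by ordinary least squares in plain Python. The function name `fit_simple_regression` and the toy data are assumptions made for this example, and significance testing is left out.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation of x with y, and variation of x around its mean.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx  # the weight coefficient
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated exactly from y = 1 + 2x, so the fit should recover it.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_regression(x, y)
print(intercept, slope)
```

Here the slope says that each one-unit increase in the input goes with a two-unit increase in the outcome.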

10

main effect

causal relationship that is of primary interest

11

Covariates

variable that is not the main effect but may have an effect on the outcome (does the link between smoking and lung cancer still exist when we take gender into account?)

12

Mediators

intermediate variable that explains the process by which an exposure leads to an outcome (why and how an effect happens) (e.g. the impact of caloric intake (X) on weight (Y) happens through metabolism (M).)

13

confounders

third variable that affects both the main variable and the outcome, which can mess up the real connection between them. (if we’re studying whether owning a lighter is linked to lung cancer, smoking is a confounder.)
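
The lighter example can be sketched with a small table of counts. The numbers below are invented purely for illustration; they are built so that lighter ownership looks linked to lung cancer overall, yet the link disappears once smokers and non-smokers are examined separately.

```python
# Hypothetical counts, invented for illustration only:
# (smoker, owns_lighter) -> (number of people, number with lung cancer)
counts = {
    (True, True): (100, 30),
    (True, False): (20, 6),
    (False, True): (20, 1),
    (False, False): (100, 5),
}


def cancer_rate(rows):
    """Share of people with lung cancer across the given (people, cases) rows."""
    people = sum(n for n, _ in rows)
    cases = sum(c for _, c in rows)
    return cases / people


def rate_by_lighter(owns_lighter, smoker=None):
    """Cancer rate for lighter owners or non-owners, optionally within one smoking group."""
    rows = [
        value
        for (is_smoker, has_lighter), value in counts.items()
        if has_lighter == owns_lighter and (smoker is None or is_smoker == smoker)
    ]
    return cancer_rate(rows)


# Crude comparison that ignores smoking: lighter owners look worse off.
crude_gap = rate_by_lighter(True) - rate_by_lighter(False)

# Comparison within each smoking group: the gap vanishes.
gap_smokers = rate_by_lighter(True, smoker=True) - rate_by_lighter(False, smoker=True)
gap_nonsmokers = rate_by_lighter(True, smoker=False) - rate_by_lighter(False, smoker=False)

print(crude_gap, gap_smokers, gap_nonsmokers)
```

Splitting by the confounder is only one way to account for it; adding it to the regression as a covariate is another.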

14

effect modification

a variable that modifies the causal relationship between the main effect and the outcome; the main effect has a different impact in different circumstances. (we might explore whether education’s impact on income varies by race.)
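
One common way to write effect modification in a regression is with an interaction term. The symbols below are assumptions for illustration: Y is the outcome (income), X the main effect (education), and Z an indicator for the modifying variable.

```latex
Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 (X \cdot Z) + \varepsilon
```

Under this sketch, the effect of X on Y is \beta_1 when Z = 0 and \beta_1 + \beta_3 when Z = 1, so a nonzero \beta_3 means the main effect differs between the two groups.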

15

How does the regression model examine how far off our predictions were?

bias (how far off our predictions are on average)

Variance (how much the model’s predictions change when we use different data sets)

Random error (natural unpredictability)
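
These three pieces correspond to the standard decomposition of expected squared prediction error. The notation is an assumption for this sketch: the outcome is y = f(x) + \varepsilon with noise variance \sigma^2, and \hat{f} is the model fitted on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{random error}}
```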

16

Reducing bias ____ variance

typically increases

17

high bias leads to

underfitting

18

underfitting

when the model looks smooth but is too simple to capture true patterns

19

high variance and low bias leads to

overfitting

20

overfitting

the model is too complex: it fits the training set too closely, noise included, and does not generalize to new data. (like reading a book without being able to summarize it: works well on known examples but not on new ones)

21

how do we handle overfitting and underfitting?

train-test split technique

22

train-test split

splitting the data into two subsets, sized according to the dataset and requirements: one for training, the other held out for testing

23

example of train-test split

Mean Squared Error in Linear Regression
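
A minimal sketch of that idea, using only the Python standard library: shuffle the data, hold out a quarter for testing, fit a line on the rest, and compare mean squared error on the two subsets. The helper names, the 25% test fraction, the seeds, and the simulated data are all assumptions for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(rows, intercept, slope):
    """Mean squared error of the fitted line on the given (x, y) pairs."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in rows) / len(rows)


# Simulated data: y = 1 + 2x plus random noise (hypothetical, not real measurements).
rng = random.Random(42)
data = [(float(x), 1.0 + 2.0 * x + rng.gauss(0, 1)) for x in range(40)]

train, test = train_test_split(data)
intercept, slope = fit_line(train)  # the model only ever sees the training subset

train_mse = mse(train, intercept, slope)
test_mse = mse(test, intercept, slope)  # error on data the model has not seen
print(train_mse, test_mse)
```

A test error far above the training error is a sign of overfitting; both errors being high points toward underfitting.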

24

Limitation of train-test split

may not capture full complexity and diversity of data (especially if data set is small) therefore the model may not generalize well

25

how do we handle the limitation of the train-test split?

cross-validation

26

cross-validation

provides a more accurate estimate of how the model will perform on unseen data than a single train-test split, by averaging over several train-test splits

27

What are the benefits of cross-validation?

reduces variability

maximizes data usage
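
Both benefits show up in a minimal k-fold sketch, again in plain Python: every observation is used for testing exactly once and for training in the other rounds, and the final estimate averages over k splits instead of relying on one. The function names, k = 5, the seeds, and the simulated data are assumptions for illustration.

```python
import random


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(rows, intercept, slope):
    """Mean squared error of the fitted line on the given (x, y) pairs."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in rows) / len(rows)


def k_fold_mse(rows, k=5, seed=0):
    """Return the held-out MSE of each of k folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]  # k roughly equal, disjoint subsets
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        intercept, slope = fit_line(training)
        scores.append(mse(held_out, intercept, slope))
    return scores


# Simulated data: y = 1 + 2x plus random noise (hypothetical, not real measurements).
rng = random.Random(42)
data = [(float(x), 1.0 + 2.0 * x + rng.gauss(0, 1)) for x in range(40)]

scores = k_fold_mse(data, k=5)
cv_estimate = sum(scores) / len(scores)  # averaged over all five splits
print(scores, cv_estimate)
```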
