HD & DS Lecture 3 Regression II

27 Terms

What is regression a powerful tool for?

epidemiology

What are Directed Acyclic Graphs (DAGs), and what are they made of?

a simple tool to communicate causal relationships; they are made of nodes and edges

nodes

variables we are studying

edges

represent a causal relationship between two variables

What CAN and CANNOT DAGs have?

they can be made arbitrarily complex but cannot contain a cycle

why can’t DAGs contain a cycle?

DAGs usually model cause and effect: one event can lead to another, but it cannot eventually lead back to the first event. This keeps the direction in which data and processes flow unambiguous.
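
A minimal sketch of the no-cycle rule, using Python's standard `graphlib` module. The variable names (`age`, `smoking`, `lung_cancer`) and the edges between them are made up for illustration and are not taken from the lecture.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical DAG: each key is a node (a variable), and each value is the
# set of nodes with an edge pointing INTO that key (its direct causes).
causes = {
    "lung_cancer": {"smoking"},
    "smoking": {"age"},
    "age": set(),
}


def is_acyclic(graph):
    """Return True if the graph contains no cycle, i.e. it is a valid DAG."""
    try:
        # static_order() raises CycleError when the graph has a cycle.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


valid = is_acyclic(causes)

# Adding an edge lung_cancer -> age closes the loop
# age -> smoking -> lung_cancer -> age, so this is no longer a DAG.
with_cycle = {**causes, "age": {"lung_cancer"}}
invalid = is_acyclic(with_cycle)

print(valid, invalid)
```

The first graph can be ordered from causes to effects; the second cannot, which is exactly what the cycle rule forbids.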

how do we quantify the strength of the causal relationship in DAGs?

regression and the weight coefficient

What is the difference between a regression and DAG?

DAGs help us visualize a simple relationship and a regression model quantifies it

what does the weight coefficient tell us?

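the direction and size of the association between the input and the outcome; together with its standard error, it tells us whether the input is significantly associated with the outcome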
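
A rough sketch of how a weight coefficient and its significance can be estimated with ordinary least squares. It assumes NumPy and synthetic data with a true weight of 2.0; the data and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this sketch): outcome = 1 + 2 * input + noise.
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=200)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, weight = beta

# Standard error of the weight from the residual variance.
residuals = y - X @ beta
dof = len(y) - X.shape[1]
sigma2 = residuals @ residuals / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
se_weight = np.sqrt(cov[1, 1])

# A large |t| (roughly above 2) suggests a statistically significant association.
t_stat = weight / se_weight

print(f"weight = {weight:.2f}, standard error = {se_weight:.3f}, t = {t_stat:.1f}")
```

The weight gives the expected change in the outcome per one-unit change in the input; the t statistic compares that weight with its own uncertainty.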

main effect

causal relationship that is of primary interest

Covariates

a variable that is not the main effect but may affect the outcome (e.g., does the link between smoking and lung cancer still exist when we take gender into account?)

Mediators

intermediate variables that explain the process by which an exposure leads to an outcome, i.e. why and how an effect happens (e.g., the impact of caloric intake (X) on weight (Y) happens through metabolism (M))

confounders

a third variable that affects both the main variable and the outcome, which can distort the real connection between them (e.g., if we're studying whether owning a lighter is linked to lung cancer, smoking is a confounder)
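
The lighter example can be simulated to see what adjusting for a confounder does. This is a sketch on synthetic data, assuming NumPy; the probabilities and the continuous `outcome` score (a stand-in for lung cancer risk) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Assumed data-generating process: smoking drives BOTH lighter ownership
# and the outcome; owning a lighter has no effect of its own.
smoking = rng.binomial(1, 0.3, size=n).astype(float)
lighter = rng.binomial(1, 0.1 + 0.7 * smoking).astype(float)
outcome = 1.0 * smoking + rng.normal(0, 0.5, size=n)


def fit(columns, y):
    """Least-squares coefficients, with the intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Regress the outcome on lighter ownership alone, then add the confounder.
unadjusted = fit([lighter], outcome)[1]
adjusted = fit([lighter, smoking], outcome)[1]

print(f"lighter coefficient without smoking: {unadjusted:.2f}")
print(f"lighter coefficient with smoking:    {adjusted:.2f}")
```

Without smoking in the model, lighter ownership appears strongly associated with the outcome; once smoking is included as a covariate, the lighter coefficient shrinks toward zero.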

effect modification

occurs when a variable modifies the causal relationship between the main effect and the outcome, so the main effect has a different impact in different circumstances (e.g., we might explore whether education's impact on income varies by race)
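
In a regression model, effect modification is commonly represented with an interaction term. The sketch below assumes NumPy and synthetic data; `group` is a hypothetical binary modifier, and the slopes (2.0 and 3.5) are made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Synthetic data (assumed): the slope of income on education is 2.0 in
# group 0 and 2.0 + 1.5 = 3.5 in group 1.
education = rng.uniform(8, 20, size=n)
group = rng.binomial(1, 0.5, size=n).astype(float)
income = 20 + 2.0 * education + 1.5 * education * group + rng.normal(0, 2, size=n)

# Columns: intercept, main effect, modifier, interaction (main effect x modifier).
X = np.column_stack([np.ones(n), education, group, education * group])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)

slope_group0 = beta[1]
interaction = beta[3]
slope_group1 = slope_group0 + interaction

print(f"slope in group 0: {slope_group0:.2f}")
print(f"slope in group 1: {slope_group1:.2f}")
print(f"interaction coefficient: {interaction:.2f}")
```

An interaction coefficient clearly different from zero indicates that the main effect differs between the two groups.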

How does a regression model break down how far off our predictions were?

Bias (how far off our predictions are on average)

Variance (how much the model's predictions change when we use different data sets)

Random error (natural unpredictability)

Reducing bias ____ variance

increases

high bias leads to

underfitting

underfitting

when the model looks smooth but is too simple to capture true patterns

high variance and low bias leads to

overfitting

overfitting

the model is too complex and fits the training set too closely, so it does not generalize to new data (like reading a book but not being able to summarize it: it works well on known examples but not on new ones)
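
Underfitting and overfitting can be illustrated by fitting polynomials of increasing degree to the same noisy data. This sketch assumes NumPy and a made-up sine-shaped pattern; the degrees 1, 4, and 12 are arbitrary choices for a too-simple, reasonable, and too-complex model.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)


def true_pattern(x):
    # Assumed "true" relationship for this sketch.
    return np.sin(2 * np.pi * x)


# Small noisy training set and a larger test set from the same process.
x_train = np.linspace(0, 1, 20)
y_train = true_pattern(x_train) + rng.normal(0, 0.2, size=x_train.size)
x_test = np.linspace(0.02, 0.98, 200)
y_test = true_pattern(x_test) + rng.normal(0, 0.2, size=x_test.size)


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


results = {}
for degree in (1, 4, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))

for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree:>2}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The straight line (degree 1) has high error on both sets, which is underfitting. Training error keeps falling as the degree grows, but a very flexible model typically stops improving, or gets worse, on the test set, which is the signature of overfitting.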

how do we handle overfitting and underfitting?

the train-test split technique

train-test split

splitting the data into two subsets, sized according to the data set and requirements: one for training and the other held back for testing

example of train-test split

Mean Squared Error in Linear Regression
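
A minimal sketch of a train-test split evaluated with Mean Squared Error in linear regression. It assumes NumPy, synthetic data with noise variance 1, and an 80/20 split; none of these numbers come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# Synthetic data (assumed): y = 3 + 0.5 * x + noise with variance 1.
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.5 * x + rng.normal(0, 1, size=n)

# Shuffle the row indices, then hold back 20% of them for testing.
indices = rng.permutation(n)
n_test = int(0.2 * n)
test_idx, train_idx = indices[:n_test], indices[n_test:]


def design(values):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones_like(values), values])


# Fit on the training subset only.
beta, *_ = np.linalg.lstsq(design(x[train_idx]), y[train_idx], rcond=None)


def mse(idx):
    predictions = design(x[idx]) @ beta
    return float(np.mean((y[idx] - predictions) ** 2))


train_mse = mse(train_idx)
test_mse = mse(test_idx)

print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

A test MSE close to the train MSE suggests the model generalizes; a test MSE far above the train MSE points to overfitting.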

Limitation of train-test split

may not capture full complexity and diversity of data (especially if data set is small) therefore the model may not generalize well

how do we handle the limitation of the train-test split?

cross-validation

cross-validation

provides a more accurate picture of how the model will perform on unseen data by using multiple train-test splits

What are the benefits of cross-validation?

reduces variability in the performance estimate

maximizes data usage
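
Both benefits show up in a small k-fold cross-validation sketch. It assumes NumPy, synthetic data, and k = 5 folds; the fold count and all names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 5

# Synthetic data (assumed): y = 3 + 0.5 * x + noise with variance 1.
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.5 * x + rng.normal(0, 1, size=n)

# Shuffle the row indices and cut them into k roughly equal folds.
folds = np.array_split(rng.permutation(n), k)

times_tested = np.zeros(n, dtype=int)
fold_mses = []
for i, test_idx in enumerate(folds):
    # Train on every fold except fold i, then test on fold i.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    X_train = np.column_stack([np.ones(train_idx.size), x[train_idx]])
    beta, *_ = np.linalg.lstsq(X_train, y[train_idx], rcond=None)

    X_test = np.column_stack([np.ones(test_idx.size), x[test_idx]])
    fold_mses.append(float(np.mean((y[test_idx] - X_test @ beta) ** 2)))
    times_tested[test_idx] += 1

# Averaging over folds gives a steadier estimate than any single split.
cv_mse = float(np.mean(fold_mses))

print("MSE per fold:", [round(m, 3) for m in fold_mses])
print(f"cross-validated MSE: {cv_mse:.3f}")
```

Every observation is used for testing exactly once and for training in the other k - 1 rounds, which is the sense in which data usage is maximized.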