data science final

46 Terms

1

What does b1 represent in simple linear regression?

The expected change in Y for a one-unit increase in X.

2

Why is extrapolation a problem in regression?

Predictions made outside the observed range of X are unreliable.

3

Why does correlation not imply causation?

Regression captures association, not causality, unless a proper study design or controls exist.

4

What does b0 represent in simple regression?

The predicted value of Y when X equals zero.

5

Why is b0 sometimes meaningless?

X may never equal zero in the data, so the intercept may have no real-world interpretation.

6

How do you make a prediction in simple linear regression?

Use Y-hat = b0 + b1*X.
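The prediction formula can be sketched in a few lines of Python. This is a minimal illustration, not a course-provided method: the helper names `fit_simple_regression` and `predict` and the toy data are assumptions made for the example, and the fit uses the standard least-squares formulas for one predictor.

```python
from statistics import mean

def fit_simple_regression(xs, ys):
    """Return (b0, b1) from ordinary least squares on paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    return b0, b1

def predict(b0, b1, x):
    """Y-hat = b0 + b1*X."""
    return b0 + b1 * x

# Toy data (made up) lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_simple_regression(xs, ys)
y_hat = predict(b0, b1, 3.5)   # 3.5 is inside the observed range of X
print(b0, b1, y_hat)
```

The prediction is made at X = 3.5, inside the observed range of 1 to 5; plugging in a value far outside that range would be extrapolation.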

7

How do you test if X and Y have a significant linear relationship?

Check the p-value for b1; if it is below the chosen significance level (commonly 0.05), the relationship is statistically significant.

8

How do you interpret a coefficient in multiple regression?

It is the expected change in Y for a one-unit increase in that predictor, holding all other predictors constant.

9

What are the units of a coefficient?

Units of Y per unit of X (units of Y divided by units of X).

10

How does adding or removing variables change coefficients?

A coefficient can change in magnitude or even sign, because leaving out a relevant variable causes omitted variable bias and adding one controls for a confounder.

11

How do you make predictions from a multiple regression summary table?

Plug values into Y-hat = b0 + b1X1 + b2X2 + …
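As a rough sketch of "plugging in" values from a summary table: the intercept, the coefficient values, and the predictor names (`sqft`, `bedrooms`) below are all made up for illustration and do not come from any real regression output.

```python
def predict_multiple(intercept, coefs, values):
    """Y-hat = b0 + b1*X1 + b2*X2 + ... for named predictors."""
    return intercept + sum(coefs[name] * values[name] for name in coefs)

# Hypothetical numbers standing in for a regression summary table.
intercept = 50.0
coefs = {"sqft": 0.1, "bedrooms": 5.0}

# One new observation to predict for.
house = {"sqft": 1500, "bedrooms": 3}

y_hat = predict_multiple(intercept, coefs, house)
print(y_hat)
```

Each coefficient is multiplied by its own predictor's value, so the `bedrooms` term here contributes 5.0 per bedroom with `sqft` held fixed.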

12

What is a parallel slopes model?

A model where categories have different intercepts but the same slope.

13

What does LOCATIONDowntown L.A. represent?

A dummy variable equal to 1 if the observation is in Downtown L.A., otherwise 0.

14

How do you identify the baseline category?

It is the category that has no coefficient of its own in the regression output.

15

How do you interpret a categorical coefficient?

It is the expected difference in Y between that category and the baseline category, holding other predictors constant.
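The dummy-variable and parallel-slopes ideas above can be sketched together. All coefficient values here are hypothetical, and `predict_parallel_slopes` is an illustrative helper, not part of any library.

```python
def predict_parallel_slopes(b0, b_x, b_dummy, x, in_category):
    """Y-hat = b0 + b_x*X + b_dummy*D, where D is a 0/1 dummy variable."""
    dummy = 1 if in_category else 0
    return b0 + b_x * x + b_dummy * dummy

# Hypothetical coefficients: baseline intercept, shared slope,
# and the shift for the non-baseline category.
b0, b_x, b_dummy = 100.0, 2.0, 30.0

baseline_pred = predict_parallel_slopes(b0, b_x, b_dummy, x=10, in_category=False)
category_pred = predict_parallel_slopes(b0, b_x, b_dummy, x=10, in_category=True)

# The gap equals the dummy coefficient at every value of x,
# which is what "same slope, different intercepts" means.
gap = category_pred - baseline_pred
print(baseline_pred, category_pred, gap)
```

The baseline category is the one with the dummy set to 0, so it never needs a coefficient of its own.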

16

What does an interaction term allow?

It allows different slopes depending on the level of another variable.

17

Why include an interaction between a numerical and categorical variable?

Because the effect of X on Y may differ across groups.

18

What does X:Category mean?

It is an interaction term between X and that category; its coefficient is the difference in the slope of X for that category compared with the baseline.

19

What is a numerical-numerical interaction?

When the effect of one predictor depends on the value of another numerical variable.

20

How do you make predictions from an interaction model?

Include all pieces: intercept, main effects, and interaction term.
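A small sketch of "including all pieces" for a numerical predictor interacting with a 0/1 category. The coefficient values and the helper name `predict_interaction` are assumptions chosen only to make the arithmetic easy to follow.

```python
def predict_interaction(b0, b_x, b_d, b_xd, x, d):
    """Y-hat = b0 + b_x*X + b_d*D + b_xd*(X*D), with D a 0/1 dummy."""
    return b0 + b_x * x + b_d * d + b_xd * x * d

# Hypothetical coefficients: intercept, main effect of X,
# main effect of the category, and the X:Category interaction.
b0, b_x, b_d, b_xd = 10.0, 2.0, 5.0, 0.5

baseline_pred = predict_interaction(b0, b_x, b_d, b_xd, x=4, d=0)
category_pred = predict_interaction(b0, b_x, b_d, b_xd, x=4, d=1)

# Slope of X within each group: b_x for the baseline, b_x + b_xd otherwise.
baseline_slope = b_x
category_slope = b_x + b_xd
print(baseline_pred, category_pred, baseline_slope, category_slope)
```

Leaving out the `b_xd * x * d` piece would silently turn the prediction back into a parallel slopes prediction.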

21

What is the population regression model?

Y = beta0 + beta1*X + error.

22

What is the estimated regression line?

Y-hat = b0 + b1*X.

23

What is the difference between bj and betaj?

bj is the estimate computed from the sample, while betaj is the true, unknown population parameter.

24

What does a p-value test for in regression?

The null hypothesis that the true coefficient betaj equals zero; a small p-value means the estimate is significantly different from zero.

25

What is BIC used for?

Model selection that balances fit against complexity; lower BIC is preferred.

26

What is variable selection?

Choosing which predictors to include to avoid overfitting.

27

What should ideal residuals look like?

Centered at zero, with constant variance and no visible pattern.

28

How do you fix nonconstant variance?

A common fix is a log transform of Y.

29

How do you fix nonlinear patterns in residuals?

Add polynomial terms such as X-squared.

30

How do you check if a quadratic term is needed?

Look for curvature in the residual plot, or check the p-value for the X-squared term.

31

What is leverage?

A measure of how unusual a point's X value is; a high-leverage point has an X value far from the others.

32

What is an outlier?

A point with an unusual Y value given its X, meaning a large residual.

33

What are studentized residuals used for?

Identifying outliers; large absolute values indicate potential issues.

34

What is overfitting?

When a model fits noise in the training data and performs poorly on new data.

35

What is the difference between training and test data?

Training data is used to fit the model; test data is held out to evaluate how well it generalizes.

36

How do training and test error typically compare?

Test error is usually higher than training error.

37

What is cross-validation?

Repeatedly splitting the data into training and test sets to estimate test error.
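One common form of this is k-fold cross-validation, sketched below for a one-predictor regression. The card does not say which variant is meant, so k-fold is an assumption here, as are the helper names and the toy data. For brevity the folds are assigned by position with no shuffling, which real data usually needs.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (b0, b1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def k_fold_test_mse(xs, ys, k):
    """Average the held-out MSE over k folds (no shuffling, for simplicity)."""
    n = len(xs)
    fold_mses = []
    for fold in range(k):
        # Every k-th point, starting at `fold`, is held out as test data.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        b0, b1 = fit_line(train_x, train_y)  # fit on training data only
        squared_errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        fold_mses.append(mean(squared_errors))
    return mean(fold_mses)

# Toy data (made up): roughly y = 1 + 2x with small deviations.
xs = list(range(10))
ys = [1.2, 2.9, 5.1, 7.0, 8.8, 11.1, 13.0, 14.9, 17.2, 19.0]

cv_mse = k_fold_test_mse(xs, ys, k=5)
print(cv_mse)
```

Every point is used for testing exactly once and is never in the training set of the fold that tests it.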

38

What is MSE?

Mean squared error: the average of the squared prediction errors.
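A minimal sketch of the calculation, with made-up actual and predicted values; `mean_squared_error` is a helper written for this example.

```python
def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 over all observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up values: the errors are 1, 0 and -2, so the squares are 1, 0 and 4.
actual = [3, 5, 7]
predicted = [2, 5, 9]

mse = mean_squared_error(actual, predicted)
print(mse)
```

Squaring means the one error of size 2 contributes four times as much as the error of size 1.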

39

Why not use linear regression for classification?

Predicted values can fall outside the 0–1 range, and the linear regression assumptions fail for a binary response.

40

What are odds?

Probability of success divided by probability of failure.
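The definition translates directly into code. The helper names below are illustrative only.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p), for a success probability p with 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)

# A 75% chance of success means success is three times as likely as failure.
odds = odds_from_probability(0.75)
p_back = probability_from_odds(odds)
print(odds, p_back)
```

A probability of 0.5 corresponds to odds of 1, the point where success and failure are equally likely.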

41

How do you interpret a logistic regression coefficient?

A one-unit increase in X multiplies the odds by e^coefficient, holding other predictors constant.

42

How do you predict probability in logistic regression?

Use p = exp(η) / (1 + exp(η)), where η = b0 + b1*X is the linear predictor.
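A short sketch of this conversion, taking η to be the linear predictor b0 + b1*X. The coefficient values are hypothetical and were chosen so that η comes out to exactly zero.

```python
import math

def linear_predictor(b0, b1, x):
    """eta = b0 + b1*X (the log-odds)."""
    return b0 + b1 * x

def predicted_probability(eta):
    """p = exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1 + math.exp(eta))

# Hypothetical coefficients; at x = 6 the linear predictor is 0.
b0, b1 = -3.0, 0.5
eta = linear_predictor(b0, b1, x=6)
p = predicted_probability(eta)
print(eta, p)
```

When η is 0 the probability is 0.5; positive η gives a probability above 0.5 and negative η gives one below it. For very large η, `math.exp` can overflow, and the algebraically equivalent form 1 / (1 + exp(-η)) avoids that.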

43

How do you predict odds in logistic regression?

Odds = exp(η).

44

How do odds change when X increases by 1?

Multiply the odds by e^coefficient.
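This multiplicative effect can be checked numerically. The coefficients below are made up, and `predicted_odds` is a helper written for this sketch.

```python
import math

def predicted_odds(b0, b1, x):
    """Odds = exp(eta), with eta = b0 + b1*X."""
    return math.exp(b0 + b1 * x)

# Hypothetical logistic regression coefficients.
b0, b1 = -1.0, 0.4

odds_at_x = predicted_odds(b0, b1, x=2)
odds_at_x_plus_1 = predicted_odds(b0, b1, x=3)

# The ratio of the two odds equals exp(b1), whatever x is.
odds_ratio = odds_at_x_plus_1 / odds_at_x
print(odds_ratio, math.exp(b1))
```

The ratio does not depend on the starting value of X, which is why e^coefficient is reported as a single odds ratio.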

45

What does

46