1/40
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No analytics yet
Send a link to your students to track their progress
Taking transformations of variables to create new predictor variables in a regression is known as
feature engineering
In the following line of R code, what is/are the predictor variable(s)?
fit <- glm(y ~ .-x1, data = myData)
all columns in myData other than x1 and y
why is -x1 excluded from the predictor variables?
fit <- glm(y ~ .-x1, data = myData)
"-" means to exclude
In the following line of R code, what is/are the response variable(s)?
fit <- glm(y ~ .-x1, data = myData)
y
In the following model, how is the coefficient on x interpreted?
fit <- glm ( log(y) ~ log(x), data = myData )
the percent change in y for each 1% increase in x (because a log-log model)
Logistical regression must...
have "family = binomial" to be logistical regression
Reference level for "make"?
R chooses the first make provided → in this case it was Acura
What does the coefficient mileage:year represent?
the interaction between mileage and year
Using the first model, what is the expected impact on the price for a car with the Wagon body style?
NEGATIVE number so a DECREASE compared to reference/baseline body style
What are the model degrees of freedom for the second model?
Number returned when using the coef function
Which assumption for linear regression is obviously violated for model 1?
constant variance = crazier dot group
Regardless of the model, there are 2 points on each plot that are far away from the other points. What do these indicate?
these cars are selling for a much higher amount than the model predicts
What is the value returned from the following line of code?
> predict(fit3,newdata,type="response")
the probability of defaulting on the loan
What is the value returned from the following line of code?
> predict(fit3,newdata)
1
0.066
the log(odds) or defaulting on the loan
The correlation of data with a lagged version of itself is called
Autocorrelation
Suppose you have a time series model and the lag1 coefficient is = 1.2. What type of series do you have?
Greater than 1 = diverging
In a time series, regular variation that is repeated within a year is called
seasonal variation
By making the response variable cnt>7000 , we are creating a model that will predict:
the probability of more than 7,000 rides in a day.
How does glm() know to run a logistic rather than a linear regression?
by specifying the family="binomial" argument
Which function will list all categories of weathersit?
levels()
Adding the lag variable to the model
accounts for the autocorrelation that is present
how did you determine the type of time series?
by looking at the value of the lag-1 coefficient in the glm() output
choose a log-log model when
variables move multiplicatively with each other
the predict() function in R takes the values to predict from
a data frame
What is the estimated value of sales for Dominick's Orange Juice priced at $2?
5728.8

Interactions allow you to model
whether one variable changes the effect of another variable on the response
Interpret the meaning of R^2
the % of total variation in the response variable that is due to the variation in the predictor variables
a residual observation is defined to be
the observed value of the response--the predicted value for the response
in a linear regression with 5 inputs, how many model degrees of freedom are there?
6
logistic regression is appropriate when
the response is binary
in logsitic regression we predict...
predicted probability
which expression represents the odds in a logistic regression model
B

which expression gives the predicted probability in a logistic regression model?
C

B

What is the value returned from the following line of code?
the probability of being spam for observation #2

log odd of success (y=1)

In time series modeling how can we quantify trend?
include a time index variable as a predictor variable in the model
in time series modeling, how can we quantify seasonal patterns?
add seasonal factor effects as predictor variables to the model
if autocorrelation is present, there is a violation of the basic linear regression assumption of
independence of observations
the autocorrelation function tracks
correlation of values in a series at different lags
if you have a random walk, you should...
perform the "returns" transformation