1/64
51 practice Q&A flashcards based on the lecture notes for Exam 1 preparation.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Is there a page limit for your Exam 1 review materials?
There is no page limit; however, it’s recommended to have one or two sheets and backup materials for quick reference.
What key pieces are omitted from the table of when to use what that you might add?
The percentage of variability explained (R-squared times 100) and the confidence interval for the slope.
How is the confidence interval for the slope calculated?
Margin of error = 1.96 times the standard error for the slope; the interval is slope ± margin of error.
When solving for change in y using the slope formula, what should you multiply?
Change in y = slope × change in x (given the change in x and the slope).
What is the general form of the simple linear regression equation in R terms?
y_hat = b0 + b1 x, estimated using lm(y ~ x, data = dataset).
Which calculator is allowed for Exam 1?
The Desmos calculator.
How do you use Desmos to compute an expression?
Type the expression in the input area, press Enter, and you can copy/paste previous calculations for convenience.
What is an example of a categorical variable from the notes?
Pet owner with possible values Yes and No.
What is an example of a quantitative variable from the notes?
Sales (amount in dollars).
Which is an example of descriptive statistics?
Displaying a graph of total Lego sets sold over the last four holiday seasons.
Which is an example of predictive statistics?
Using past data to predict future demand or stock.
What is Simple Linear Regression?
A regression model with a straight-line trend that uses exactly one X variable.
What is the goal of reducing a model?
Start with a complicated model and create a simpler model with about the same accuracy, often by removing variables one at a time (using p-values or AIC).
In a scatterplot matrix, what is its main purpose?
It is a collection of graphs used to visually explore relationships; it does not provide numerical calculations.
In the call center example, what is the dependent variable?
The customer satisfaction score (the Y variable to be predicted).
In the provided regression example, what is the intercept value?
The intercept is 597.
In the same example, what is the slope value?
The slope is 7.
What does 1stQU stand for in summary output?
First quartile (the 25th percentile).
Which R command creates a model predicting demand based on supply?
lm(demand ~ supply) (i.e., demand as response, supply as predictor).
In the model y_hat = b0 + b1 x, what is the dependent variable?
The dependent variable is y (the thing being predicted).
If a full model with many Xs is reduced by removing a variable, can you conclude the removed variable isn’t correlated with Y?
False; removal can be due to multicollinearity, not lack of correlation with Y.
If all X variables are strongly correlated with Y, does that imply high multicollinearity?
False; multicollinearity refers to correlations among X variables, not their correlation with Y.
In a model where sales depend on temperature and budget, what are the Y and X variables?
Y = sales; X variables include temperature (controlled) and budget (predictor).
When comparing a full model to a reduced model, which p-value informs whether you can drop a variable?
The p-value for the variable being removed; if greater than 0.05, removal does not hurt accuracy.
What is the interpretation of the slope for bedrooms in a rent model with distance controlled?
380 dollars increase in rent per additional bedroom, holding distance constant.
What is the interpretation of the slope for distance for a 0.5 mile increase in the rent model?
Rent decreases by 60 dollars when distance increases by 0.5 miles (slope -120 per mile; -120 × 0.5 = -60).
What does an R-squared of 0.929 mean in percent for the rent model?
92.9% of the variability in rent is explained by the model.
What is true about multicollinearity and model accuracy?
Multicollinearity affects interpretation but not necessarily the accuracy of the model's predictions.
If a regression model has a slope of 4 and SE for the slope is 1.5, what is the 95% confidence interval for the slope?
4 ± (1.96 × 1.5) = 4 ± 2.94 (i.e., (1.06, 6.94)).
In a two-variable regression model (x1 and x2), how do you compare the two models to see if removing x2 hurts accuracy?
Use the p-value for x2; if it is greater than 0.05, removing x2 does not meaningfully reduce accuracy.
In the two-question rent interpretation (rooms and distance), what must be included in the interpretation to be valid?
The interpretation must match the variables in the model exactly (bedrooms and distance with rent). The dependent variable is rent, not distance.
What is the general role of the intercept in a linear regression model?
The intercept (b0) is the predicted value of y when x = 0; it is a coefficient, not a standalone variable.
What does 'the ridge of pr > |t|' represent in the regression coefficients table?
It represents the p-value associated with each coefficient.
Why is it important that model variables exactly match the interpretation variables?
If they don’t, the interpretation can be incorrect or misleading.
What is the suggested preparation strategy for Exam 1 regarding notes and formulas?
Create 1-2 highly organized formula sheets and ensure notes are well structured for quick lookup during the timed exam.
What is the meaning of the term 'change in y' in slope-related calculations?
The amount by which the dependent variable y changes when the predictor x changes by a certain amount.
What does the hat symbol on a variable (e.g., y_hat) signify?
It denotes the predicted value of y.
What is the common significance threshold used to judge p-values in the notes?
0.05; p-values below 0.05 indicate statistical significance.
What is the difference between descriptive and predictive statistics in simple terms?
Descriptive uses full data to describe a system; predictive uses sample data to predict future outcomes.
What is the purpose of the scatterplot matrix in module one?
To visually explore relationships between pairs of variables; it is not used for numerical calculations.
What is the recommended practice when you encounter a model with missing variables in the interpretation?
You either need a model that matches the interpretation variables or you must be given values for the missing variables.
What is the role of R-squared in assessing variable strength in simple linear regression?
Although p-values assess significance, R-squared helps gauge the strength of the linear relationship.
What is the effect of holding distance constant when interpreting the effect of bedrooms on rent?
You isolate the effect of bedrooms on rent by controlling for distance (holding distance fixed).
What does 'not enough information' imply in the context of model interpretation questions?
The given model output does not provide enough information to answer the question about a specific variable’s relationship with Y.
What is the interpretation of a slope of -120 for distance in the rent model?
For each additional mile away, rent changes by -120 dollars, holding bedrooms constant.
What is the practical use of the 'margin of error' for the slope in the notes?
To form a 95% confidence interval for the slope and understand the precision of the estimate.
What does the term 'AIC' refer to in model reduction?
Akaike Information Criterion; used in stepwise reduction to balance simplicity and fit.
Why might you want to have one or two sheets of paper for the exam despite no page limit?
To keep essential formulas and decision rules readily accessible during a timed test.
What is a key difference between a descriptive statistic and a predictive statistic example?
Descriptive statistic: graph of past sales; Predictive statistic: using data to forecast future demand.
What does the term ‘independent variable’ refer to in these notes?
The X variable(s) used to predict the dependent variable.
What does the term ‘dependent variable’ refer to in these notes?
The Y variable that is predicted or explained by the model.
In regression output, where is the intercept value typically found?
In the coefficients table, in the row labeled intercept.
What should you do if the print ad budget is not provided in a given model interpretation?
You do not have enough information to compute the predicted outcome; need a model including that variable or a value for it.
What is the primary effect of multicollinearity on interpretation vs. prediction?
It can distort interpretation of coefficients, but does not necessarily reduce predictive accuracy.
What is the recommended practice for solving for change in y in a regression context?
Use the equation change in y = slope × change in x; or equivalently, change in y = change in x × slope.
What is the recommended use of Desmos in Exam 1?
Desmos is an allowed calculator for calculations during the practice/learning parts of the exam.
What is the meaning of a p-value greater than 0.05 for the slope in question 17?
It indicates there is not a statistically significant correlation between x and y at the 5% level.
How do you determine the strongest single predictor among X1..X4 using the notes?
Construct four simple linear regressions of Y on each Xi and compare their R-squared values; the highest R-squared indicates the strongest correlation.
What is the general form of how to express the predicted value in linear regression?
y_hat = b0 + b1 x (for simple regression); in multiple regression it includes additional terms for more Xs.
What does a 'not enough information' answer imply about model variables?
The given information does not allow determining the effect of a particular variable due to model mismatch or missing data.
What is the interpretation of holding a variable constant in regression analysis?
It means that the variable is controlled for so its effect can be isolated when assessing another predictor.
What is the significance of the low p-value for a slope in determining correlation?
A low p-value (<0.05) suggests a statistically significant relationship between X and Y.
What is the relationship between the table of when to use what and formula sheets for Exam 1?
Enhancing this table with key equations (e.g., margins, confidence intervals) helps you apply formulas quickly.
Why is it important to include both Y and X variables exactly as described in the interpretation?
To ensure the interpretation matches the model and yields correct conclusions.
What is the effect of increasing the online ad budget by $2,000 when the model is 2,000 + 1.6×online + 1.3×print with both provided?
You would compute 2,000 + 1.6×2,000 + 1.3×1,000 to obtain the predicted units sold (6,500 in the example).