Exam 1 Review Flashcards

51 practice Q&A flashcards based on the lecture notes for Exam 1 preparation.

65 Terms

New cards

Is there a page limit for your Exam 1 review materials?

There is no page limit; however, it’s recommended to have one or two sheets and backup materials for quick reference.

Which key items are missing from the "when to use what" table and worth adding yourself?

The percentage of variability explained (R-squared times 100) and the confidence interval for the slope.

How is the confidence interval for the slope calculated?

For a 95% confidence interval, margin of error = 1.96 × the standard error of the slope; the interval is slope ± margin of error.

When solving for change in y using the slope formula, what should you multiply?

Change in y = slope × change in x (given the change in x and the slope).

What is the general form of the simple linear regression equation in R terms?

y_hat = b0 + b1 x, estimated using lm(y ~ x, data = dataset).
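
A minimal sketch of fitting this equation in R. The data frame `dataset` and its numbers are made up for illustration and are not from the lecture notes.

```r
# Hypothetical data, invented for this example only.
dataset <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(3.1, 4.9, 7.2, 8.8, 11.0)
)

# Fit the simple linear regression y_hat = b0 + b1 * x.
model <- lm(y ~ x, data = dataset)

b0 <- coef(model)[["(Intercept)"]]  # estimated intercept
b1 <- coef(model)[["x"]]            # estimated slope

# Predicted value at x = 6 using the fitted equation.
y_hat <- b0 + b1 * 6
```

For this made-up data the fit gives b0 = 1.09 and b1 = 1.97, so the prediction at x = 6 is 12.91.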

Which calculator is allowed for Exam 1?

The Desmos calculator.

How do you use Desmos to compute an expression?

Type the expression in the input area, press Enter, and you can copy/paste previous calculations for convenience.

What is an example of a categorical variable from the notes?

Pet owner with possible values Yes and No.

What is an example of a quantitative variable from the notes?

Sales (amount in dollars).

Which is an example of descriptive statistics?

Displaying a graph of total Lego sets sold over the last four holiday seasons.

Which is an example of predictive statistics?

Using past data to predict future demand or stock.

What is Simple Linear Regression?

A regression model with a straight-line trend that uses exactly one X variable.

What is the goal of reducing a model?

Start with a complicated model and create a simpler model with about the same accuracy, often by removing variables one at a time (using p-values or AIC).
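
One possible way to try model reduction in R is backward elimination by AIC with `step()`. This is only a sketch: the built-in `mtcars` data set and the chosen variables stand in for the course data, and the notes do not say which reduction procedure the class used.

```r
# mtcars ships with base R; it is a stand-in for the course data.
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward elimination: drop one variable at a time while AIC improves.
reduced_model <- step(full_model, direction = "backward", trace = 0)

# Compare how much variability each model explains.
r2_full    <- summary(full_model)$r.squared
r2_reduced <- summary(reduced_model)$r.squared

# Show which variables survived the reduction.
formula(reduced_model)
```

If `r2_reduced` is close to `r2_full`, the simpler model keeps about the same accuracy.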

What is the main purpose of a scatterplot matrix?

It is a collection of graphs used to visually explore relationships; it does not provide numerical calculations.

In the call center example, what is the dependent variable?

The customer satisfaction score (the Y variable to be predicted).

In the provided regression example, what is the intercept value?

The intercept is 597.

In the same example, what is the slope value?

The slope is 7.
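
As a quick check of how an intercept of 597 and a slope of 7 combine into a prediction, here is a short R sketch. The input value `x = 10` is hypothetical; the notes do not give one.

```r
b0 <- 597  # intercept from the example
b1 <- 7    # slope from the example

x <- 10    # hypothetical input value, not from the notes

# Predicted value: y_hat = b0 + b1 * x
y_hat <- b0 + b1 * x
y_hat  # 667
```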

What does 1st Qu. stand for in summary output?

First quartile (the 25th percentile).
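
A small R sketch showing where this label appears. The numbers in `values` are made up for illustration.

```r
values <- c(2, 4, 6, 8, 10, 12, 14)  # hypothetical numbers

# summary() prints Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(values)

# The same first quartile, requested directly as the 25th percentile.
q1 <- quantile(values, 0.25)[["25%"]]
q1  # 5
```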

Which R command creates a model predicting demand based on supply?

lm(demand ~ supply) (i.e., demand as response, supply as predictor).

In the model y_hat = b0 + b1 x, what is the dependent variable?

The dependent variable is y (the thing being predicted).

If a full model with many Xs is reduced by removing a variable, can you conclude the removed variable isn’t correlated with Y?

No; a variable can be removed because of multicollinearity, not because it lacks correlation with Y.

If all X variables are strongly correlated with Y, does that imply high multicollinearity?

No; multicollinearity refers to correlations among the X variables, not their correlation with Y.
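
To make the distinction concrete, the following R sketch computes both kinds of correlation. The built-in `mtcars` data set is an assumption used for illustration, with `mpg` playing the role of Y.

```r
# Three predictors from mtcars stand in for the X variables.
predictors <- mtcars[, c("wt", "hp", "disp")]

# X-to-X correlations: this is what multicollinearity is about.
x_to_x <- cor(predictors)

# X-to-Y correlations: strong values here say nothing about multicollinearity.
x_to_y <- cor(predictors, mtcars$mpg)

x_to_x
x_to_y
```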

In a model where sales depend on temperature and budget, what are the Y and X variables?

Y = sales; X variables include temperature (controlled) and budget (predictor).

When comparing a full model to a reduced model, which p-value informs whether you can drop a variable?

The p-value for the variable being removed; if it is greater than 0.05, removing the variable does not meaningfully hurt accuracy.

What is the interpretation of the slope for bedrooms in a rent model with distance controlled?

Predicted rent increases by 380 dollars for each additional bedroom, holding distance constant.

What is the interpretation of the slope for distance for a 0.5 mile increase in the rent model?

Rent decreases by 60 dollars when distance increases by 0.5 miles (slope -120 per mile; -120 × 0.5 = -60).
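
The same arithmetic as a short R sketch, using the slope of -120 dollars per mile from the rent model. The variable names are chosen here for illustration.

```r
slope_distance <- -120  # dollars of rent per additional mile
change_in_x    <- 0.5   # increase in distance, in miles

# change in y = slope * change in x
change_in_y <- slope_distance * change_in_x
change_in_y  # -60, i.e. rent drops by 60 dollars
```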

What does an R-squared of 0.929 mean in percent for the rent model?

92.9% of the variability in rent is explained by the model.

What is true about multicollinearity and model accuracy?

Multicollinearity affects interpretation but not necessarily the accuracy of the model's predictions.

If a regression model has a slope of 4 and SE for the slope is 1.5, what is the 95% confidence interval for the slope?

4 ± (1.96 × 1.5) = 4 ± 2.94 (i.e., (1.06, 6.94)).
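
The calculation can be checked with a few lines of R. This sketch follows the 1.96 rule from the notes; the variable names are illustrative.

```r
slope    <- 4
se_slope <- 1.5

# Margin of error for an approximate 95% interval.
margin <- 1.96 * se_slope

# Interval = slope plus or minus the margin of error.
ci <- c(lower = slope - margin, upper = slope + margin)
ci  # lower 1.06, upper 6.94
```

Note that R's `confint()` on a fitted `lm` object uses a t critical value rather than 1.96, so its interval can differ slightly from this hand calculation.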

In a two-variable regression model (x1 and x2), how do you compare the two models to see if removing x2 hurts accuracy?

Use the p-value for x2; if it is greater than 0.05, removing x2 does not meaningfully reduce accuracy.

In the two rent interpretation questions (bedrooms and distance), what must the interpretation include to be valid?

The interpretation must match the variables in the model exactly (bedrooms and distance with rent). The dependent variable is rent, not distance.

What is the general role of the intercept in a linear regression model?

The intercept (b0) is the predicted value of y when x = 0; it is a coefficient, not a standalone variable.

What does the 'Pr(>|t|)' column represent in the regression coefficients table?

It represents the p-value associated with each coefficient.
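
A sketch of pulling those p-values out of a fitted model in R. The model below uses the built-in `mtcars` data set as a stand-in; it is not one of the course examples.

```r
# Stand-in model: mpg predicted from weight and horsepower.
model <- lm(mpg ~ wt + hp, data = mtcars)

# The coefficients table has one row per coefficient and the columns
# Estimate, Std. Error, t value, and Pr(>|t|).
coef_table <- summary(model)$coefficients

# The last column holds the p-value for each coefficient.
p_values <- coef_table[, "Pr(>|t|)"]

# TRUE marks coefficients that are not significant at the 0.05 level.
p_values > 0.05
```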

Why is it important that model variables exactly match the interpretation variables?

If they don’t, the interpretation can be incorrect or misleading.

What is the suggested preparation strategy for Exam 1 regarding notes and formulas?

Create 1-2 highly organized formula sheets and ensure notes are well structured for quick lookup during the timed exam.

What is the meaning of the term 'change in y' in slope-related calculations?

The amount by which the dependent variable y changes when the predictor x changes by a certain amount.

What does the hat symbol on a variable (e.g., y_hat) signify?

It denotes the predicted value of y.

What is the common significance threshold used to judge p-values in the notes?

0.05; p-values below 0.05 indicate statistical significance.

What is the difference between descriptive and predictive statistics in simple terms?

Descriptive uses full data to describe a system; predictive uses sample data to predict future outcomes.

What is the purpose of the scatterplot matrix in module one?

To visually explore relationships between pairs of variables; it is not used for numerical calculations.

What should you do when the variables in the model and in the interpretation do not match?

You either need a model that matches the interpretation variables or you must be given values for the missing variables.

What is the role of R-squared in assessing variable strength in simple linear regression?

Although p-values assess significance, R-squared helps gauge the strength of the linear relationship.

What is the effect of holding distance constant when interpreting the effect of bedrooms on rent?

You isolate the effect of bedrooms on rent by controlling for distance (holding distance fixed).

What does 'not enough information' imply in the context of model interpretation questions?

The given model output does not provide enough information to answer the question about a specific variable’s relationship with Y.

What is the interpretation of a slope of -120 for distance in the rent model?

For each additional mile of distance, predicted rent decreases by 120 dollars, holding bedrooms constant.

What is the practical use of the 'margin of error' for the slope in the notes?

To form a 95% confidence interval for the slope and understand the precision of the estimate.

What does the term 'AIC' refer to in model reduction?

Akaike Information Criterion; used in stepwise reduction to balance simplicity and fit.

Why might you want to have one or two sheets of paper for the exam despite no page limit?

To keep essential formulas and decision rules readily accessible during a timed test.

What is a key difference between a descriptive statistic and a predictive statistic example?

Descriptive statistic: graph of past sales; Predictive statistic: using data to forecast future demand.

What does the term ‘independent variable’ refer to in these notes?

The X variable(s) used to predict the dependent variable.

What does the term ‘dependent variable’ refer to in these notes?

The Y variable that is predicted or explained by the model.

In regression output, where is the intercept value typically found?

In the coefficients table, in the row labeled (Intercept).

What should you do if the print ad budget is not provided in a given model interpretation?

You do not have enough information to compute the predicted outcome; need a model including that variable or a value for it.

What is the primary effect of multicollinearity on interpretation vs. prediction?

It can distort interpretation of coefficients, but does not necessarily reduce predictive accuracy.

What is the recommended practice for solving for change in y in a regression context?

Use change in y = slope × change in x.

What is the recommended use of Desmos in Exam 1?

Desmos is an allowed calculator for calculations during the practice/learning parts of the exam.

What is the meaning of a p-value greater than 0.05 for the slope in question 17?

It indicates there is not a statistically significant correlation between x and y at the 5% level.

How do you determine the strongest single predictor among X1..X4 using the notes?

Construct four simple linear regressions of Y on each Xi and compare their R-squared values; the highest R-squared indicates the strongest correlation.
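
A sketch of that comparison in R. The built-in `mtcars` data set stands in for the course data: `mpg` plays the role of Y and four of its columns play the roles of X1 through X4. Those choices are assumptions made for illustration.

```r
# Stand-ins for X1..X4; mpg stands in for Y.
predictors <- c("wt", "hp", "qsec", "drat")

# Fit one simple linear regression per predictor and keep its R-squared.
r_squared <- sapply(predictors, function(x_name) {
  fit <- lm(reformulate(x_name, response = "mpg"), data = mtcars)
  summary(fit)$r.squared
})

# The predictor with the highest R-squared is the strongest single predictor.
strongest <- names(which.max(r_squared))

r_squared
strongest
```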

What is the general form of the predicted value in linear regression?

y_hat = b0 + b1 x (for simple regression); in multiple regression it includes additional terms for more Xs.

What does a 'not enough information' answer imply about model variables?

The given information does not allow determining the effect of a particular variable due to model mismatch or missing data.

What is the interpretation of holding a variable constant in regression analysis?

It means that the variable is controlled for so its effect can be isolated when assessing another predictor.

What does a low p-value for a slope indicate about the relationship between X and Y?

A low p-value (<0.05) suggests a statistically significant relationship between X and Y.