Stats 2 multiple choice questions

55 Terms

New cards

A variable used to represent a categorical variable in regression by coding it numerically (usually 0 or 1)

example:

Gender Male = 1, Female = 0

In a regression predicting salary

Salary = 30,000 + 5,000 (Gender)

Being male adds $5,000 to the salary compared to being female

a. Dummy variable

b. Collinearity Variable

c. Interaction Variable

d. Variance inflation factor

e. Parsimony variable

a. Dummy variable
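A minimal sketch of how the 0/1 coding in the salary example shifts the prediction. The function name `predicted_salary` is hypothetical; the coefficients are the ones on the card.

```python
def predicted_salary(gender: int) -> int:
    """Salary = 30,000 + 5,000 * Gender, with Male = 1 and Female = 0 (card example)."""
    if gender not in (0, 1):
        raise ValueError("a dummy variable must be coded 0 or 1")
    return 30_000 + 5_000 * gender


female = predicted_salary(0)  # baseline group: intercept only
male = predicted_salary(1)    # baseline plus the dummy coefficient
print(female, male, male - female)
```

The difference between the two predictions is the dummy coefficient itself, which is why it is read as "compared to the baseline group".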

New cards

Not an actual type of variable, this usually refers to the problem of multicollinearity when two or more predictors in a regression are highly correlated

Example

X1 = years of experience
X2 = age
high correlation between X1 and X2 = multicollinearity

a. Dummy variable

b. Collinearity Variable

c. Interaction Variable

d. Variance inflation factor

e. Parsimony variable

b. Collinearity Variable

New cards

A variable that captures how the effect of one predictor depends on another. Often created by multiplying two predictors together

Example:

Education x Gender
Regression

Salary = 20,000 + 2,000(education) + 5,000(Gender) + 1,500(education x gender)

The effect of education on salary is different for men and women

a. Dummy variable

b. Collinearity Variable

c. Interaction Variable

d. Variance inflation factor

e. Parsimony variable

c. Interaction Variable
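To see why the interaction term makes the effect of education depend on gender, here is a small sketch using the coefficients from the card. The function names are hypothetical, and the coding Gender = 1 for men, 0 for women is assumed from the earlier dummy variable card.

```python
def salary_with_interaction(education: float, gender: int) -> float:
    """Salary = 20,000 + 2,000*education + 5,000*gender + 1,500*(education*gender)."""
    return 20_000 + 2_000 * education + 5_000 * gender + 1_500 * education * gender


def education_slope(gender: int) -> float:
    """Change in predicted salary for one extra unit of education, holding gender fixed."""
    return salary_with_interaction(1, gender) - salary_with_interaction(0, gender)


print(education_slope(0))  # slope when gender = 0
print(education_slope(1))  # slope when gender = 1 (includes the interaction coefficient)
```

With gender = 0 the slope is 2,000; with gender = 1 it is 2,000 + 1,500 = 3,500, so the two groups have different education slopes.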

New cards

A measure used to detect multicollinearity in regression. It quantifies how much the variance of a regression coefficient is inflated due to correlation with other predictors

example:

If VIF for X1 = 10, then the coefficient estimate for X1 is unstable because X1 is highly correlated with other predictors

a. Dummy variable

b. Collinearity Variable

c. Interaction Variable

d. Variance inflation factor

e. Parsimony variable

d. Variance inflation factor
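As a rough illustration, VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. With only two predictors, R²_j is the squared correlation between them, which is what this sketch uses. The data values are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2); valid only for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical data: years of experience and age move together closely.
experience = [1, 3, 5, 8, 10, 15]
age = [23, 26, 27, 31, 34, 38]
vif = vif_two_predictors(experience, age)
print(round(vif, 1))
```

A VIF near 1 means the predictor is nearly uncorrelated with the others; the card's example value of 10 is a commonly used warning threshold.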

New cards

A variable that keeps the model simple and efficient, avoiding unnecessary complexity. This relates to the principle of parsimony (Occam’s Razor) in model building

Example:

Instead of including dozens of tiny predictors for predicting test scores, you might just include study hours and class attendance because they explain most of the variance

a. Dummy variable

b. Collinearity Variable

c. Interaction Variable

d. Variance inflation factor

e. Parsimony variable

e. Parsimony variable

New cards

What's the error degrees of freedom formula?

n - (k + 1), where n = number of observations and k = number of predictors

New cards

What is the response variable

the outcome measured, observed or recorded in a study or experiment

Example from the golf ball study: travel distance of the golf ball in yards

New cards

What is the factor

The variable that the researcher manipulates, controls, or selects to determine its effects on the response variable.

Often has different levels

example from golf ball study: Type of golf ball (Type A, Type B, Type C)

example: storage temperature (0 degrees, 10 degrees, 20 degrees)

New cards

what is the pooled variance estimate formula

New cards

Whats the SE formula

New cards

Whats the ME formula

New cards

Confidence interval formula with variance

New cards

Decreased sales due to a fire at a meat packing plant is an example of a(n) ______ component

a. trend

b. cyclical

c. seasonal

d. irregular

d. irregular

Why:

irregular = unexpected, unpredicted events that disrupt normal patterns

New cards

A company has recorded its annual online sales for the past 10 years and notices that sales increase every single year as more customers shift to online shopping and the company expands its digital presence. This long-term upward movement represents which component of a time series

a. trend

b. cyclical

c. seasonal

d. irregular

a. trend

why:

Trend = long-term increase or decrease over many years

New cards

An economist notices that employment levels tend to rise and fall approximately every 5-7 years, following expansions and contractions in the national economy. These multi-year fluctuations represent which component of a time series?

a. trend

b. cyclical

c. seasonal

d. irregular

b. cyclical

Why:

cyclical = multi-year up-and-down movements tied to the economic cycle

New cards

A retail store observes that sales spike every November and December due to holiday shopping and drop every January. These predictable, repeating patterns occur every year. What type of time series component does this represent?

a. trend

b. cyclical

c. seasonal

d. irregular

c. seasonal

Why:

Seasonal = regular patterns that repeat every year at the same time (monthly, quarterly, holiday-related)

New cards

In exponentially smoothed time series, the smoothing constant w is chosen on the basis of how much smoothing is required. In general, which of the following statements is true?

a. A small value of w, such as w = 0.1, results in very little smoothing, while a large value, such as w = 0.9, may result in excessive smoothing

b. A small value of w, such as w = 0.1, results in excessive smoothing, while a large value, such as w = 0.9, results in very little smoothing

c. A small value of w, such as w = 0.1, and a large value, such as w = 0.9 may both result in very little smoothing

d. A small value of w, such as w = 0.1, and a large value, such as w = 0.9 may both result in excessive smoothing

b. A small value of w, such as w = 0.1, results in excessive smoothing, while a large value, such as w = 0.9, results in very little smoothing
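A short sketch of why a small w smooths heavily. It assumes the common convention E_1 = Y_1 and E_t = w·Y_t + (1 − w)·E_(t−1), which the card does not state explicitly; the sales figures are made up.

```python
def exponential_smooth(series, w):
    """Return the exponentially smoothed series with smoothing constant w in [0, 1]."""
    if not 0 <= w <= 1:
        raise ValueError("w must be between 0 and 1")
    smoothed = [series[0]]  # assumed convention: first smoothed value equals first observation
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


def spread(values):
    """Range of a series, used here as a crude measure of how jagged it is."""
    return max(values) - min(values)


sales = [10, 20, 8, 22, 9, 21, 10]      # hypothetical, deliberately jumpy
heavy = exponential_smooth(sales, 0.1)  # small w: each new point gets little weight
light = exponential_smooth(sales, 0.9)  # large w: series tracks the raw data closely
print(spread(heavy), spread(light), spread(sales))
```

With w = 0.1 the smoothed series barely moves, while with w = 0.9 it stays close to the original values, matching answer b.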

New cards

Suppose the average loan size for the 229 loans is $25,000 with a standard deviation of $3,000. Calculate the 95% confidence interval estimate of the population mean loan size for all borrowers.

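For 95% confidence, z = 1.96

answer: (24,875.46, 25,124.54)

Note: this interval is 25,000 ± 1.96 × 3,000/√n with n = 2,229. With n = 229 as written in the question, the same formula gives (24,611.44, 25,388.56).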
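The arithmetic behind this card can be checked with a small sketch of mean ± z·s/√n. The function name is hypothetical, and n is left as a parameter so different sample sizes can be compared.

```python
from math import sqrt


def mean_confidence_interval(mean, sd, n, z=1.96):
    """Large-sample confidence interval for a population mean: mean ± z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


low, high = mean_confidence_interval(25_000, 3_000, 2_229)
print(round(low, 2), round(high, 2))

low_229, high_229 = mean_confidence_interval(25_000, 3_000, 229)
print(round(low_229, 2), round(high_229, 2))
```

A sample size of 2,229 reproduces the interval shown on the card; a sample size of 229 gives a wider interval because the standard error is larger.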

New cards

A quadratic equation fitted to annual time series data is, where t = 1 for 1997, t = 2 for 1998, etc. The forecasted value for 2004 is:

a. 3.25

b.10.75

c. 28.0

d. 229.5

New cards


The following is the list of mean absolute deviation (MAD) statistics for each of the four models that have been estimated for a particular time series

Based on the MAD criterion, the most appropriate model is

A. linear trend

B. quadratic trend

C. exponential trend

D. autoregressive

Why: the model with the smallest MAD yields the lowest average absolute forecast error
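Since the MAD table for this card is not available, here is a sketch of the criterion on made-up numbers: compute the mean absolute deviation of each model's forecasts and pick the smallest. The data and model forecasts are hypothetical.

```python
def mean_absolute_deviation(actual, forecast):
    """Average of |actual - forecast| over all periods."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


actual = [12, 15, 19, 24, 30]  # hypothetical observed values
forecasts = {                  # hypothetical fitted values from two models
    "linear trend": [11, 15.5, 20, 24.5, 29],
    "quadratic trend": [12.2, 15.1, 18.9, 23.8, 30.1],
}

mad_by_model = {name: mean_absolute_deviation(actual, f) for name, f in forecasts.items()}
best_model = min(mad_by_model, key=mad_by_model.get)
print(mad_by_model, best_model)
```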

New cards

If X and Y in a regression model are perfectly related,

a. The Pearson product-moment correlation would be 0

b. the SSR would be 0

c. the coefficient of determination would be 0

d. the standard error of the estimate would be 0

d. the standard error of the estimate would be 0

Why: if X and Y are perfectly related, every point lies exactly on the fitted line, so every residual is 0, SSE = 0, and the standard error of the estimate is 0

New cards

The least squares criterion for estimating regression coefficients is to

a. minimize the error sum of squares

b. maximize the p-value

c. maximize the standard error of the estimate

d. minimize the SST

a. minimize the error sum of squares

New cards


The following regression results are for a model to predict monthly household expenditure on groceries in ($’s) using annual household income in $1,000’s and household neighborhood (0 = suburban, 1 = rural) as predictors

For a rural household with $70,000 annual income, the model predicts monthly grocery expenditures of ____

190.28

New cards

In a multiple regression analysis, the correlation among the independent variables is termed

a. heteroscedasticity

b. parsimony

c. linearity

d. collinearity

e. adjusted coefficient of determination

d. collinearity

New cards

What do we mean when we say that a simple linear regression model is “statistically” significant?

a. All of the statistics computed from the sample make sense

b. the model is an excellent predictor of y

c. the model is “practically” useful for predicting y

d. the model is a better predictor of y than the sample average, y-bar

d. the model is a better predictor of y than the sample average, y-bar

New cards

If a simple linear regression line has a y-intercept of 10 and a slope of 4, and the residual is 1 when x = 2, the actual value of y is:

a. 14

b. 15

c. 18

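d. 19

d. 19

Why: the predicted value is y-hat = 10 + 4(2) = 18, and residual = y - y-hat = 1, so y = 18 + 1 = 19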

New cards

In a multiple regression analysis involving 25 data points, the SSE is 36 and the MSE is 1.8. In this case, the number of the independent variables must be:

a. 6

b. 5

c. 4

d. 3

c. 4
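The answer follows from MSE = SSE / (n − k − 1): the error degrees of freedom are SSE / MSE, and k = n − 1 − (SSE / MSE). A minimal sketch, with a hypothetical function name:

```python
def predictors_from_mse(n, sse, mse):
    """Solve MSE = SSE / (n - k - 1) for k, the number of independent variables."""
    error_df = sse / mse             # n - k - 1
    return round(n - 1 - error_df)   # round to absorb floating-point noise


k = predictors_from_mse(n=25, sse=36, mse=1.8)
print(k)
```

Here SSE / MSE = 36 / 1.8 = 20 error degrees of freedom, so k = 25 − 1 − 20 = 4.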

New cards

In an attempt to determine the factors that affect the amount of energy used, 200 households were analyzed. In each, the number of occupants and the amount of electricity used were measured. The response variable in a simple regression analysis of this data would be.

a. the number of occupants per household

b. the number of households

c. the amount of electricity used

d. either the number of occupants per household or the amount of electricity used

e. none of the other choices is correct

c. the amount of electricity used

why: because we want to predict the amount of energy used, therefore response is the amount of electricity used

New cards

F =

MSR / MSE

New cards

MSR formula =

SSR / k

New cards

MSE formula =

SSE / (n - k - 1)

New cards

SSE formula

SST - SSR

New cards

In a regression model involving a total of 30 observations, the following estimated regression equation was obtained: Y-hat = 170 + 34X₁ - 3X₂ + 9X₃ + 58X₄ + 3X₅. For this model, SSR = 1,760 and SST = 2,000. What is the observed F-value? Report 2 decimal places.

35.20
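The steps behind this answer can be sketched with the formulas from the surrounding cards (SSE = SST − SSR, MSR = SSR / k, MSE = SSE / (n − k − 1), F = MSR / MSE). The function name is hypothetical; k = 5 is read off the five predictors in the equation.

```python
def f_statistic(ssr, sst, n, k):
    """Overall F for a regression with k predictors and n observations."""
    sse = sst - ssr            # error sum of squares
    msr = ssr / k              # mean square regression
    mse = sse / (n - k - 1)    # mean square error
    return msr / mse


f_value = f_statistic(ssr=1760, sst=2000, n=30, k=5)
print(f"{f_value:.2f}")
```

Here SSE = 240, MSR = 352, and MSE = 240 / 24 = 10, so F = 35.20.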

New cards

Confidence interval formula with standard deviation

New cards


Simple Linear Regression analyses were run for 3 scenarios. Some one-variable summary information and the standard error of the regression model for each scenario are provided in the table below. Which regression provides the best “goodness-of-fit” (the most improvement in predictability compared to a one-sample analysis)?

a. nondefects v score

b. debt v hours

c. bonus v service years

d. cannot be determined without R²

c. bonus v service years

New cards

Answer A for true and B for false. A regression analysis between sales (in $1,000’s) and advertising (in $100’s) resulted in the following least squares line: y-hat = 77 + 8x. This implies that if the current advertising budget is doubled, sales are predicted to increase by $16,000

B (False)

Why: the slope only says that each additional $100 of advertising raises predicted sales by $8,000. The effect of doubling depends on the current budget, which we don’t know

New cards

If two variables, x and y, have a strong linear relationship, then

a. x causes y to happen

b. y causes x to happen

c. answers a and b are both correct

d. there may or may not be any causal relationship between x and y

d. there may or may not be any causal relationship between x and y

New cards

Simple linear model

it’s positive


New cards

To determine if the regression model is statistically significant, the appropriate null and alternative hypotheses are:

New cards

For the test of the significance of the regression model, the value of the test statistic is _____. Report 4 decimal places

35.7227

It’s the F-ratio

New cards

Write the hypotheses for the test of the significance of the predictor size of the home in the model

H₀: β_size = 0; H_A: β_size ≠ 0

New cards

For the test of significance of the predictor size of home, the best conclusion is

a. reject H₀

b. do not reject H₀

c. There is strong evidence that the size of the home is significant in predicting the selling price in this model

d. There is strong evidence that the size of the home is not significant in predicting the selling price in this model

c. There is strong evidence that the size of the home is significant in predicting the selling price in this model

Significant = reject H₀; not significant = do not reject H₀

The p-value for H size is < 0.0001

New cards

The 95% confidence interval estimate of the population coefficient on house size:

a. will capture 0

b. will not capture 0 and will be positive

c. will not capture 0 and will be negative

d. we cannot tell anything about the interval without knowing SE (b₁)

b. will not capture 0 and will be positive

Why: the coefficient on H size = 63.34…, which is positive, and the p-value < 0.0001 means we reject H₀, so the interval will not capture 0

New cards

The error degrees of freedom in the regression model are

n = 100

k = 4

n - (k+1) = 100 - (4 +1) = 95

New cards

Write the simplified (specific) equation to predict the selling price of a two story home

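General model: y-hat = b₀ + b₁(H size) + b₂(two story) + b₃(side split) + b₄(back split)

b₀ = constant = 27484.35918

For a two-story home (two story = 1, side split = 0, back split = 0):

y-hat = 27484.35918 + 63.34106611(H size) + 20341.3425(1) + b₃(0) + b₄(0)

Simplified: y-hat = 47825.70168 + 63.34106611(H size)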
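As a sketch of the simplification, setting the two-story dummy to 1 and the other style dummies to 0 folds the two-story coefficient into the intercept. The coefficient values are the ones on the card; the function name and the example house size of 2,000 are hypothetical, and the units of H size are not stated on the card.

```python
B0 = 27484.35918           # constant
B_SIZE = 63.34106611       # coefficient on H size
B_TWO_STORY = 20341.3425   # coefficient on the two-story dummy


def predict_two_story_price(h_size: float) -> float:
    """Prediction for a two-story home: side split and back split dummies are 0."""
    intercept = B0 + B_TWO_STORY * 1  # dummy = 1 shifts the intercept
    return intercept + B_SIZE * h_size


print(round(predict_two_story_price(0), 5))     # the combined intercept
print(round(predict_two_story_price(2000), 2))  # hypothetical house size
```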

New cards

What conclusion can you make based on the given information?

a. There are no significant differences in the mean GMAT scores for the three graduate schools

b. The mean GMAT score is highest for school 1, second highest for school 2, and lowest for school 3

c. The only significant difference in mean scores is between schools 1 and 3, with the mean GMAT score higher for school 1

d. The only significant difference in mean scores is between schools 1 and 3, with the mean GMAT score higher for school 3

e. There is no significant difference in the mean GMAT scores between schools 1 and 3, but the mean GMAT score for school 1 is higher than for school 2 and the mean GMAT score is higher for school 2 than for school 3

c. The only significant difference in mean scores is between schools 1 and 3, with the mean GMAT score higher for school 1

Why: the School 1 - School 3 difference is significant because its interval (21.19 to 364.52) does not capture 0. The estimated difference, 171.67, is positive, so the mean GMAT score is higher for school 1 than for school 3

New cards

If a hypothesis test has a significance level of 0.05, this means that

a. when the null hypothesis is true, we don’t reject it 5% of the time

b. When the null hypothesis is false, we don’t reject it 5% of the time

c. When the null hypothesis is false, we reject it 5% of the time

d. When the null hypothesis is true, we reject it 5% of the time

e. none of the above answers

d. When the null hypothesis is true, we reject it 5% of the time

New cards

If an observation of X has the value 55, which of the following is most reasonable?

a. The corresponding value of Y is 90

b. The corresponding value of Y is 110

c. We do not have enough information to know anything about the value of Y

b. The corresponding value of Y is 110

Why: there is a strong correlation (r_xy = 0.95), so if X is 55, above its mean of 50, then Y will most likely also be above its mean of 100. The best answer is 110

New cards

observational study definition

An observational study is a type of research in which the investigator observes individuals or phenomena without manipulating any variables. The researcher simply records data as it naturally occurs.

New cards

experimental

The opposite of an observational study is an experimental study (also called an experiment).
In an experimental study, the researcher actively manipulates one or more variables—typically assigning subjects to different treatment groups—to observe the effect of that manipulation

New cards

The response variable is __

a. type of golf ball

b. golfer

c. golf ball manufacturer

d. travel distance of the golf ball in yards

d. travel distance of the golf ball in yards

Why: the response variable is tied to the goal of the study. The goal is to see which golf ball travels the farthest, so the answer is d

New cards

Based on the ANOVA results, you should

a. reject the null hypothesis, conclude that the three types of golf balls travel the same distance on average, and stop

b. do not reject the null hypothesis, and conclude that the three types of golf

c. reject the null hypothesis, conclude the three types of golf balls do not travel the same distance on average, and conduct a post hoc analysis

d. do not reject the null hypothesis, conclude that the three types of golf balls do not travel the same distance on average, and conduct a post hoc analysis

c. reject the null hypothesis, conclude the three types of golf balls do not travel the same distance on average, and conduct a post hoc analysis

Why: the p-value is < 0.0001

New cards

Suppose the p-value for the test is 0.035 (it is not, but suppose that it is). Which of the following is the best interpretation of that value?

a. The probability that the population mean is 50 is 0.035

b. The probability that the population mean is not 50 is 0.035

c. There is a 3.5% chance that a random sample would result in an average as different from 50 as the observed average is if the population mean is really 50

d. There is a 3.5% chance that the sample average is biased

c. There is a 3.5% chance that a random sample would result in an average as different from 50 as the observed average is if the population mean is really 50

New cards

What is the value of the pooled variance estimate?

3.4375

New cards

Calculate the 99% confidence interval estimate for the difference in population means

(-7.57, -2.43)