Simple Models: Key Vocabulary

0.0(0)
studied byStudied by 0 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/43

flashcard set

Earn XP

Description and Tags

Vocabulary flashcards covering fundamental terms and concepts from the lecture on simple models, parameter estimates, and statistical inference.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

44 Terms

1
New cards

DATA = MODEL + ERROR

Fundamental statistical identity stating that each observation equals the model’s prediction plus residual error.

2
New cards

Observation (Yi)

The actual recorded value for case i that we want to model or predict.

3
New cards

Predicted Value (Ŷi)

The value the model estimates for observation Yi based on fitted parameters.

4
New cards

Error / Residual (ei)

The difference between an observation and its predicted value: ei = Yi – Ŷi.

5
New cards

Sum of Squared Errors (SSE)

The total of squared residuals; measures overall model inaccuracy and is minimized in regression.

6
New cards

Why Square Errors?

Squaring prevents positive and negative errors from cancelling and penalizes large misses more heavily than small ones.

  • Also introduces the useful property of small penalties for very close predictions and large penalties for large misses (outliers).

Example:

  • Missing by 1: 12 = 1

  • Missing by 5: 52 = 25

7
New cards

Mean Minimizes SSE

For a single-parameter model, the arithmetic mean of Y is the value that yields the smallest possible SSE.

  • It’s the best estimate until we find a useful variable.

8
New cards

Degrees of Freedom (df)

The number of independent pieces of information after accounting for estimated parameters; n – parameters.

  • Each data point can be explained by its own unique story.

  • The model tells a simple story of the data.

  • Mean is an estimated parameter form our data, and uses a degree of freedom.

9
New cards

Baseline Model (Model C)

A model with only an intercept (mean of Y); used as a comparison standard when adding predictors.

10
New cards

Augmented Model (Model A)

A model that includes additional explanatory variable(s) beyond the intercept.

11
New cards

Parameter Count (P)

The number of estimated coefficients in a model; e.g., PA for Model A, PC for Model C.

12
New cards

Variance (s² or MSE)

Average squared error: SSE / (n – 1); reflects typical squared miss size.

<p>Average squared error: SSE / (n – 1); reflects typical squared miss size.</p>
13
New cards

Standard Deviation (s)

Square root of variance; expresses a typical miss or data spread in original units.

  • Tells us about the spread around the mean and sizes of a typical miss.

<p>Square root of variance; expresses a typical miss or data spread in original units.</p><ul><li><p>Tells us about the spread around the mean and sizes of a typical miss.</p></li></ul><p></p>
14
New cards

Proportion Reduction in Error (PRE)

(SSEC – SSEA) / SSEC ; proportion of error eliminated by moving from Model C to Model A.

15
New cards

Sum of Squares Regression (SSR)

Total variation explained by the model: Σ(ŶiC – Ȳ)²; equals SSEC – SSEA for single-predictor comparison.

16
New cards

F-Statistic

Test ratio: [PRE / (PA–PC)] / [(1–PRE) / (n–PA)]; evaluates whether added parameter(s) significantly decrease SSE.

  • A value that helps us decide if we reject the null hypothesis in regression analysis.

    • If calculated f is greater than the critical value, we reject the null hypothesis.

17
New cards

t-Statistic (Coefficient)

Coefficient divided by its standard error; for one predictor t² = F for overall model.

18
New cards

p-Value

Probability of observing a test statistic as extreme as the one obtained if the null hypothesis is true; small p suggests significance.

19
New cards

R-squared

Proportion of variance in Y explained by the model relative to the mean-only model (Model C).

20
New cards

Adjusted R-squared

R-squared adjusted for the number of predictors; penalizes unnecessary parameters.

21
New cards

Residual Standard Error

Square root of SSE / (n – PA); average size of prediction errors in original units.

22
New cards

Intercept (b0)

Estimated value of Y when all predictors equal zero; the model’s baseline level.

23
New cards

Slope / Coefficient (b1)

Estimated change in Y for a one-unit change in predictor X, holding other variables constant.

24
New cards

Categorical Predictor Coding

Representing categories with numeric indicators (e.g., 0 = no laundry, 1 = in-unit laundry) so regression can estimate effects.

25
New cards

Standard Error of Coefficient

Typical distance between the estimated coefficient and its true value; smaller SE implies more precise estimate.

26
New cards

Confidence Interval

Range computed from estimate ± (critical value × SE) that likely contains the true parameter with a chosen confidence level.

27
New cards

Simple Regression

Regression with one predictor; fits a line (or two means for binary X) minimizing SSE.

28
New cards

Model Comparison

Evaluating whether adding predictors (Model A) improves fit over a simpler model (Model C) by testing SSE reduction.

29
New cards

Wordle Example

Hypothesis test comparing mean steps for common vs. uncommon words; illustrates difference-in-means via SSE framework.

30
New cards

Lottery Expected Value

Average payoff (probability × prize); used as comparison value in willingness-to-pay modeling example.

31
New cards

Error for Model C

ei =Yi - Ȳi.

Ȳi = Mean

32
New cards

Measures of Location (Measures of Central Tendency)

Mean, median, mode

33
New cards

Measures of Variability (Measures of Spread)

Standard deviation (can be squared to find variance), median absolute deviation

34
New cards

Model

A _______ is a way to explain variance in our DATA.

A _______ makes predictions for every observation.

35
New cards

Variation

_______ we have not accounted for remains in the error term.

This means, some portion of error is unexplained _______.

36
New cards

What is an example of a rate statistic?

Batting average in baseball is an example of a rate statistic, measuring the success of a player in hitting the ball.

37
New cards

What is an example of count statistics?

The number of hits, runs, or errors in a game is an example of count statistics, representing discrete outcomes.

38
New cards

What is Yi?

An observed data point in row i that we want to model.

39
New cards

What is Xij?

An observed data value for row i that we use to model Yi (the value of Yi changes as Xij does)

40
New cards

What is B1?

The “true” value for how Yi changes as with variable(s) Xij (that we estimate from our data).

41
New cards

What is bj?

Partial regression coeffecient (partial slope) representing the weight we should give to Xij in estimating Yi (how much the model estimates Yi changes with a 1 unit change in Xij)

42
New cards

Regression Model

A ________ __________ minimizes SSE by fitting parameter estimates.

43
New cards

What is Model A’s Outcome?

b0 + b1X1 + …bj-1Xj-1 + bjXj + b + ei

44
New cards

What is Model B’s Outcome?

b0 + b1X1 + …bj-1Xj-1 + b + ei