Linear Regression

Python Final

11 Terms

1. Linear regression

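Statistical model used to predict a continuous numeric value (e.g., revenue, price, score) by finding the best-fitting straight line through the data (a plane or hyperplane when there are several features), usually by minimizing the sum of squared errors (ordinary least squares). Its prediction is an intercept plus the weighted sum of the input features, where each feature's coefficient reflects its impact on the outcome.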
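A minimal sketch of ordinary least squares for a single feature, using only the standard library. The data, and the names `fit_simple_ols` and `predict`, are hypothetical and chosen for illustration; in practice a library such as scikit-learn's `LinearRegression` does the fitting. The slope is the covariance of x and y divided by the variance of x, and a prediction is the intercept plus the weighted sum of features.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(features, coefs, intercept):
    """Intercept plus the weighted sum of features (works for any number of features)."""
    return intercept + sum(w * x for w, x in zip(coefs, features))


# Hypothetical data: advertising spend (x) and sales revenue (y)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)                    # 1.0 2.0
print(predict([6], [slope], intercept))    # 13.0
```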

2. Strengths

Simple and interpretable; coefficients show each feature's effect on the outcome (and can be compared as feature importance when features are on the same scale); fast to train and evaluate.

3. Limitations

Assumes a linear relationship; sensitive to outliers and multicollinearity; requires numeric inputs (categorical variables must be encoded first).
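To illustrate the outlier sensitivity, the sketch below fits the same hypothetical data twice, once clean and once with a single extreme value. Because least squares squares each residual, one outlier can pull the line a long way. The data and the helper name `fit_simple_ols` are assumptions for illustration.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


xs = [1, 2, 3, 4, 5]
clean = [3, 5, 7, 9, 11]
with_outlier = [3, 5, 7, 9, 31]  # last value replaced by an extreme point

print(fit_simple_ols(xs, clean))         # (1.0, 2.0)
print(fit_simple_ols(xs, with_outlier))  # (-7.0, 6.0): one point triples the slope
```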

4. Assumptions

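Linear relationship between the inputs and the output; independent variables are not highly correlated with each other (no strong multicollinearity); features are numeric. Standard least-squares inference also assumes the residuals are independent, have constant variance (homoscedasticity), and are roughly normally distributed.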
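One quick way to check the "not highly correlated" assumption is the Pearson correlation between each pair of features. The sketch below computes it by hand for two hypothetical housing features; the 0.8 cutoff is an assumed rule of thumb, not a fixed standard, and other diagnostics (such as variance inflation factors) exist.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical features that move together perfectly
size_sqft = [1000, 1500, 2000, 2500]
bedrooms = [2, 3, 4, 5]

r = pearson_r(size_sqft, bedrooms)
print(r)  # 1.0

THRESHOLD = 0.8  # assumed rule-of-thumb cutoff
if abs(r) > THRESHOLD:
    print("Highly correlated: consider dropping or combining one feature")
```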

5. R-squared

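Proportion of the variance in the dependent variable that the model explains. Usually between 0 and 1; closer to 1 means a better fit. It can be negative on new data when the model does worse than simply predicting the mean.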
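R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below uses hypothetical values; scikit-learn's `sklearn.metrics.r2_score` performs the same calculation.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [3, 5, 7, 9]
print(round(r_squared(actual, [2.5, 5.5, 7, 9]), 3))  # 0.975 (close fit)
print(r_squared(actual, [6, 6, 6, 6]))                # 0.0 (always predicts the mean)
print(r_squared(actual, [9, 7, 5, 3]))                # -3.0 (worse than the mean)
```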

6. MAE (Mean Absolute Error)

Average absolute difference between actual and predicted values, in the same units as the target. Lower is better.
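A minimal MAE sketch on hypothetical values (scikit-learn offers the same metric as `sklearn.metrics.mean_absolute_error`):

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| across all rows."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]  # absolute errors: 1, 0, 1, 2
print(mean_absolute_error(actual, predicted))  # 1.0
```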

7. MSE (Mean Squared Error)

Average of the squared differences between actual and predicted values. Squaring penalizes larger errors more heavily, so MSE is more sensitive to outliers than MAE.
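Using the same hypothetical values as a plain MAE calculation would, the sketch below shows how squaring lets the single largest error dominate (scikit-learn offers `sklearn.metrics.mean_squared_error`):

```python
def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2 across all rows."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]  # errors: 1, 0, 1, 2 -> squared: 1, 0, 1, 4
print(mean_squared_error(actual, predicted))  # 1.5

# The one error of size 2 contributes 4 of the 6 total squared error
squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
print(max(squared) / sum(squared))  # 0.6666666666666666
```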

8. Hyperparameters

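None to tune for plain (ordinary least squares) linear regression, but the choice and scaling of features affect performance. Regularized variants such as Ridge and Lasso add a penalty-strength hyperparameter.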

9. Common Business Applications

Predicting sales revenue from advertising spend; forecasting housing prices from location, size, and other property features; estimating customer lifetime value (CLV).

10. Should You Scale the Data?

Often, especially if you want to compare coefficient magnitudes across features. Scaling does not change the predictions of ordinary least squares, so you may choose not to scale the data if your primary goal is to interpret coefficients in their original units (e.g., "every additional $1,000 spent on marketing increases revenue by $500").
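The sketch below, on hypothetical marketing data in thousands of dollars, fits the same model on raw and on standardized (z-score) spend. The raw slope keeps its "per $1,000" meaning, while the standardized slope means "per standard deviation of spend"; the predictions are the same either way. The helper name `fit_simple_ols` is an assumption for illustration.

```python
import math
import statistics


def fit_simple_ols(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data: marketing spend and revenue, both in $1,000s
spend = [1, 2, 3, 4, 5]
revenue = [1.5, 2.0, 2.5, 3.0, 3.5]

# Raw units: slope 0.5 means +$500 revenue per extra $1,000 of spend
b0_raw, b1_raw = fit_simple_ols(spend, revenue)

# Standardize spend to z-scores, then refit
mu, sigma = statistics.mean(spend), statistics.pstdev(spend)
spend_z = [(x - mu) / sigma for x in spend]
b0_z, b1_z = fit_simple_ols(spend_z, revenue)

print(b1_raw, round(b1_z, 3))  # 0.5 0.707

# Predictions agree (up to floating-point rounding)
preds_raw = [b0_raw + b1_raw * x for x in spend]
preds_z = [b0_z + b1_z * z for z in spend_z]
print(all(math.isclose(a, b) for a, b in zip(preds_raw, preds_z)))  # True
```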

11. Can You Use Categorical Variables?

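Yes, but you must convert them to numeric values first. One-hot encoding is preferred for linear regression because it avoids assigning arbitrary numeric codes (e.g., North = 1, South = 2) that imply an ordering that doesn't exist and distort relationships. When the model includes an intercept, drop one category to serve as the baseline; otherwise the dummy columns are perfectly collinear (the "dummy variable trap").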
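A minimal one-hot encoding sketch in plain Python for a hypothetical `region` feature, dropping the first category (alphabetically) as the baseline. The `one_hot` helper is hypothetical; in pandas, `pd.get_dummies(df, drop_first=True)` does the equivalent for DataFrame columns.

```python
def one_hot(values, drop_first=True):
    """Return (column_names, rows) of 0/1 indicator columns for a categorical list."""
    categories = sorted(set(values))
    if drop_first:
        # The dropped category becomes the reference level (all zeros)
        categories = categories[1:]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


regions = ["North", "South", "West", "South"]  # hypothetical region feature
cols, rows = one_hot(regions)
print(cols)  # ['South', 'West']
print(rows)  # [[0, 0], [1, 0], [0, 1], [1, 0]]  (North is the baseline)
```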