Python Final
Linear regression
Statistical model used to predict a continuous numeric value (e.g., revenue, price, score) by finding the best-fitting straight line through the data. Calculates the weighted sum of input features, where each feature has a coefficient that reflects its impact on the outcome.
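A minimal sketch of the idea above, using scikit-learn (assumed available) on made-up data where the target really is a weighted sum of the features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: target built as an exact weighted sum of two features plus an intercept
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 10.0

model = LinearRegression()
model.fit(X, y)

# The fitted coefficients recover the weights; each one reflects that
# feature's impact on the outcome, as described above
print(model.coef_)       # ~ [3.0, 2.0]
print(model.intercept_)  # ~ 10.0
```

Because the toy data is exactly linear, the fit recovers the weights almost perfectly; real data would leave residual error.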
Strengths
Simple and interpretable, coefficients show feature importance, fast to train and evaluate
Limitations
Assumes linearity, sensitive to outliers/multicollinearity, requires numeric inputs
Assumptions
Linear relationship between inputs and outputs, independent variables aren’t highly correlated, features must be numeric
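One quick way to check the "not highly correlated" assumption is a pairwise correlation matrix over the feature columns. A sketch with a hypothetical feature matrix whose last two columns are nearly identical:

```python
import numpy as np

# Hypothetical feature matrix: each column is one feature
X = np.array([[1.0, 2.0, 2.1],
              [2.0, 1.0, 1.1],
              [3.0, 4.0, 4.2],
              [4.0, 3.0, 3.1],
              [5.0, 5.0, 5.0]])

# Pairwise Pearson correlations between columns; values near +/-1
# flag potential multicollinearity
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

Here `corr[1, 2]` comes out close to 1, signaling that the second and third features carry nearly the same information.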
R-squared
Proportion of variance in the dependent variable explained by the model; closer to 1 indicates a better fit.
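The definition above can be computed directly from its formula, R² = 1 − SS_res / SS_tot, on made-up numbers:

```python
import numpy as np

def r_squared(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.5])
print(r_squared(y_true, y_pred))  # 0.9625
```

A perfect fit gives R² = 1; always predicting the mean gives R² = 0.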
MAE (Mean Absolute Error)
Average absolute difference between actual and predicted values.
MSE (Mean Squared Error)
Average of the squared differences. Penalizes larger errors more heavily.
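The two error metrics above, computed side by side on made-up predictions, show how squaring amplifies the largest miss:

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

errors = y_true - y_pred             # [-2, 2, -3]
mae = np.mean(np.abs(errors))        # (2 + 2 + 3) / 3
mse = np.mean(errors ** 2)           # (4 + 4 + 9) / 3
print(mae, mse)
```

The 3-unit error contributes 3 to MAE but 9 to MSE, which is why MSE penalizes larger errors more heavily.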
Hyperparameters
None directly (ordinary least squares has no tunable knobs; regularized variants like Ridge and Lasso add one), but the choice and scaling of features affect performance.
Common Business Applications
Predicting sales revenue based on advertising spend. Forecasting housing prices using location, size, and features. Estimating customer lifetime value (CLV).
Should You Scale the Data?
Yes, especially if comparing coefficients. However, you may choose not to scale the data if your primary goal is to interpret coefficients in their original units (e.g., "every additional $1,000 spent on marketing increases revenue by $500").
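A sketch of standardization with scikit-learn's `StandardScaler`, on hypothetical housing features (square footage and bedroom count) that sit on very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: sqft and bedrooms
X = np.array([[1500.0, 3.0], [2400.0, 4.0], [900.0, 2.0], [3000.0, 5.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has mean ~0 and unit variance, so the
# fitted coefficients become comparable across features
print(X_scaled.mean(axis=0))  # ~ [0, 0]
print(X_scaled.std(axis=0))   # ~ [1, 1]
```

On scaled data, a larger coefficient magnitude really does mean a stronger per-standard-deviation effect, at the cost of the original-units interpretation described above.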
Can You Use Categorical Variables?
Yes, but you must convert them to numeric. One-hot encoding is preferred for linear regression because it avoids assigning arbitrary numeric values that can distort relationships.
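One-hot encoding can be sketched with pandas' `get_dummies` on a hypothetical region column:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "east"],
                   "sales": [100, 150, 120]})

# Each category becomes its own 0/1 indicator column, so no arbitrary
# ordering is imposed on the regions
encoded = pd.get_dummies(df, columns=["region"])
print(encoded.columns.tolist())
```

For linear regression specifically, passing `drop_first=True` drops one indicator per category, avoiding the perfect multicollinearity (dummy-variable trap) that the assumptions section warns about.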