positive correlation (correlation matrix)
indicates that the two variables move in the same direction.
When one variable increases, the other variable also tends to increase, and similarly, when one decreases, the other tends to decrease.
a coefficient of +1 means that the two variables move in the same direction in a perfectly linear manner.
negative correlation (correlation matrix)
indicates that the two variables move in opposite directions.
When one variable increases, the other variable tends to decrease, and vice versa.
a coefficient of -1 means that the two variables move in opposite directions in a perfectly linear manner.
causation
correlation does not imply _____
linear
the closer the correlation coefficient is to +1 or -1, the stronger the _____ relationship between the variables. (correlation matrix)
no
correlation of 0 indicates ___ linear relationship. (correlation matrix)
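A minimal sketch of how these ideas look in a correlation matrix, using pandas; the column names and data are invented for illustration:

```python
import pandas as pd

# Toy data invented for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score":    [52, 60, 68, 75, 83],  # rises with hours_studied: positive correlation
    "hours_slept":   [9, 8, 7, 6, 5],       # falls as hours_studied rises: negative correlation
})

# Pearson correlation matrix: entries near +1 or -1 indicate a strong
# linear relationship; entries near 0 indicate no linear relationship.
print(df.corr())
```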
benefits of train-test split
Simplicity and Speed: It’s straightforward to understand and implement, and it is computationally less intensive than other methods.
Direct Evaluation: Provides a clear, direct way to assess how the model performs on unseen data.
cons of train-test split
Potential Data Wastage: Splitting the dataset reduces the amount of data available for training the model, which might be a concern for smaller datasets.
Risk of Bias: If the split is not representative, it can introduce bias into the evaluation, making the model appear to perform better or worse than it actually does. It is also less robust than resampling methods such as cross-validation.
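A minimal sketch of a train-test split with scikit-learn, on toy data invented for illustration; the 20% test size and the random seed are arbitrary example choices:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data invented for illustration: 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Hold out 20% as unseen test data; fixing random_state makes the
# split reproducible. For classification targets, stratify= can help
# keep the split representative (mitigating the bias risk noted above).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```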
multicollinearity
indicates that we have redundant information from some highly correlated predictors, making it difficult to distinguish their individual effects on the dependent variable.
a solution for detecting and addressing multicollinearity.
using a correlation matrix to identify and then remove highly correlated predictors, or reducing the number of predictors by performing Principal Components Analysis (PCA).
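A sketch of that workflow, assuming the predictors live in a pandas DataFrame; the toy columns and the 0.9 correlation threshold are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Toy predictors invented for illustration; x2 is nearly a copy of x1,
# so that pair is highly correlated (multicollinear).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X_df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=200),  # redundant predictor
    "x3": rng.normal(size=200),
})

# Flag predictor pairs above an (arbitrary) |r| > 0.9 threshold.
corr = X_df.corr().abs()
high = [(a, b) for a in corr.columns for b in corr.columns
        if a < b and corr.loc[a, b] > 0.9]
print("highly correlated pairs:", high)

# Alternative: compress the predictors into uncorrelated components.
X_pca = PCA(n_components=2).fit_transform(X_df)
print(X_pca.shape)  # (200, 2)
```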
root mean squared error (RMSE)
calculates the square root of the average squared differences between the predicted and actual values.
It represents the standard deviation of the residuals (prediction errors).
RMSE is sensitive to outliers and punishes larger errors; a high RMSE suggests the presence of large errors in the predictions.
! A lower RMSE value indicates better model performance, with 0 being the ideal score.
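A minimal sketch of RMSE computed directly from its definition, on invented values:

```python
import numpy as np

# Invented example values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 12.0])

# Square root of the mean squared residuals; squaring punishes the
# large error on the last sample more heavily than the small ones.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)
```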
mean absolute error (MAE)
quantifies the average magnitude of the errors between the predicted values and the actual values, focusing solely on the size of errors without considering their direction.
It reflects the average distance between predicted and actual values across all predictions.
provides a straightforward and easily interpretable measure of model prediction accuracy.
it’s robust to outliers, making it a reliable metric when dealing with real-world data that may contain anomalies.
! values range from 0 to infinity, with lower values indicating better model performance. An MAE of 0 means the model perfectly predicts the target variable.
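A sketch contrasting MAE with RMSE on invented data containing one large error, to show MAE’s relative robustness to outliers:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Invented example values; the last prediction is an outlier.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([3.5, 4.5, 7.5, 20.0])

mae = mean_absolute_error(y_true, y_pred)        # average |error|
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")         # the outlier inflates RMSE far more
```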
R-Squared (R²)
quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables.
offers insight into the goodness of fit of the model. However, it does not indicate whether the model is the appropriate one for your data, nor does it reflect the accuracy of the predictions.
! values range from 0 to 1 (negative values are possible for a model that fits worse than simply predicting the mean), where higher values indicate better model fit. An R² of 1 suggests the model perfectly predicts the target variable.
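A sketch of R² from its definition (1 - SS_res / SS_tot), checked against scikit-learn’s r2_score on invented values:

```python
import numpy as np
from sklearn.metrics import r2_score

# Invented example values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
print(1 - ss_res / ss_tot, r2_score(y_true, y_pred))  # both ~0.991
```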
training split
the data the model gets to learn from
largest chunk
two methods to preserve data for assessing model performance
train-test split
cross-validation
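A minimal sketch of k-fold cross-validation with scikit-learn; the linear model, toy data, and the 5-fold choice are placeholders for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy regression data invented for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Each of the 5 folds serves once as held-out data, so every sample is
# used for both training and evaluation (unlike a single split).
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())
```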