W10 - correlation

14 Terms

1

positive correlation (correlation matrix)

  • indicates that the two variables move in the same direction.

  • When one variable increases, the other variable also tends to increase, and similarly, when one decreases, the other tends to decrease.

  • A coefficient of +1 means that the two variables move in the same direction in a perfectly linear manner.

2

negative correlation (correlation matrix)

  • indicates that the two variables move in opposite directions.

  • When one variable increases, the other variable tends to decrease, and vice versa.

  • A coefficient of -1 means that the two variables move in opposite directions in a perfectly linear manner.
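
To make the +1 and -1 cases concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The data (`hours`, `score_up`, `score_down`) and the helper name `pearson` are made up for illustration; they are not part of the course material.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: both scores are exact linear functions of hours.
hours = [1, 2, 3, 4, 5]
score_up = [2 * h + 1 for h in hours]     # rises with hours
score_down = [10 - 3 * h for h in hours]  # falls with hours

r_pos = pearson(hours, score_up)    # perfectly linear, same direction
r_neg = pearson(hours, score_down)  # perfectly linear, opposite direction
print(r_pos, r_neg)
```

Because both scores are exact straight-line functions of `hours`, the coefficients land on the two extremes; real data would normally give values strictly between them.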

3

causation

correlation does not imply _____

4

linear

the closer the correlation coefficient is to +1 or -1, the stronger the _____ relationship between the variables. (correlation matrix)

5

no

correlation of 0 indicates ___ linear relationship. (correlation matrix)
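
A correlation of 0 rules out a linear relationship only, not a relationship of any kind. The sketch below uses made-up data where `y` is completely determined by `x` (it is `x` squared), yet the Pearson coefficient is 0. The helper `pearson` is an illustrative name, not a library function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: a symmetric U-shaped (quadratic) relationship.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = pearson(x, y)
print(r)  # the positive and negative halves cancel out
```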

6

benefits of train-test split

  • Simplicity and Speed: It is straightforward to understand and implement, and it is computationally less intensive than other methods.

  • Direct Evaluation: Provides a clear, direct way to assess how the model performs on unseen data.

7

cons of train-test split

  • Potential Data Wastage: Splitting the dataset reduces the amount of data available for training the model, which might be a concern for smaller datasets.

  • Risk of Bias: If the split is not representative, it can bias the evaluation, making the model appear to perform better or worse than it actually does. A single split is therefore less robust than other methods, such as cross-validation.
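
The trade-offs above are easier to see with a concrete split. This is a minimal sketch in plain Python, assuming a shuffled 80/20 split; the function name `simple_train_test_split`, the `test_fraction` default, and the seed are illustrative choices. In practice a library routine such as scikit-learn's `train_test_split` is commonly used instead.

```python
import random

def simple_train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out a fraction as the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))  # hypothetical dataset of 10 row ids
train, test = simple_train_test_split(rows)
print(len(train), len(test))
```

With only 10 rows, the 2 held-out rows can easily be unrepresentative, which is the bias risk the card describes; changing `seed` changes which rows land in the test set.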

8

multicollinearity

indicates that some highly correlated predictors carry redundant information, making it difficult to distinguish their individual effects on the dependent variable.

9

a way to detect and address multicollinearity

Use a correlation matrix to identify highly correlated predictors, then either remove one predictor from each highly correlated pair or reduce the number of predictors by performing Principal Component Analysis (PCA).
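
The correlation-matrix check can be sketched as a scan over every pair of predictors. The predictor names, the data, the 0.9 threshold, and the helper names `pearson` and `highly_correlated_pairs` are all assumptions made for illustration; a suitable threshold depends on the problem.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """Return predictor pairs whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical predictors: height appears twice, in different units.
predictors = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [35, 22, 41, 29, 33],
}

flagged = highly_correlated_pairs(predictors)
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```

Here the two height columns carry the same information, so one of them could be dropped before fitting a model.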

10

root mean squared error (RMSE)

calculates the square root of the average squared differences between the predicted and actual values.

  • It can be read as the standard deviation of the residuals (prediction errors), provided the residuals average out to roughly zero.

  • Because errors are squared before averaging, RMSE is sensitive to outliers and penalises larger errors more heavily. A high RMSE suggests the presence of large errors in the predictions.

! A lower RMSE value indicates better model performance, with 0 being the ideal score.
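
As a worked example of the definition, the sketch below computes RMSE step by step: square each error, average, then take the square root. The `actual` and `predicted` values are invented for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: errors are 1, 0, -1, -2, so squared errors are 1, 0, 1, 4.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

score = rmse(actual, predicted)  # sqrt((1 + 0 + 1 + 4) / 4) = sqrt(1.5)
print(round(score, 4))
```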

11

mean absolute error (MAE)

quantifies the average magnitude of the errors between the predicted values and the actual values, focusing solely on the size of errors without considering their direction.

  • It reflects the average distance between predicted and actual values across all predictions.

  • provides a straightforward and easily interpretable measure of model prediction accuracy.

  • It is more robust to outliers than RMSE because errors are not squared, making it a reliable metric when dealing with real-world data that may contain anomalies.

! Values range from 0 to infinity, with lower values indicating better model performance. An MAE of 0 means the model perfectly predicts the target variable.
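
The difference in outlier sensitivity between MAE and RMSE shows up in a small comparison. In this sketch, with made-up numbers, two sets of predictions have the same total absolute error, but one spreads it evenly and the other concentrates it in a single large miss.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the errors, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, included here only for comparison."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10.0, 10.0, 10.0, 10.0]
spread_out = [11.0, 9.0, 11.0, 9.0]     # every prediction is off by 1
one_outlier = [10.0, 10.0, 10.0, 14.0]  # a single prediction is off by 4

mae_spread, mae_outlier = mae(actual, spread_out), mae(actual, one_outlier)
rmse_spread, rmse_outlier = rmse(actual, spread_out), rmse(actual, one_outlier)
print(mae_spread, mae_outlier)    # MAE is the same in both cases
print(rmse_spread, rmse_outlier)  # RMSE is larger when the error is concentrated
```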

12

R-Squared (R²)

quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables.

  • Offers insight into the goodness of fit of the model. However, it does not indicate whether the model is appropriate for your data, nor does it directly reflect the accuracy of the predictions.

! Values usually range from 0 to 1, where higher values indicate better model fit. An R² of 1 means the model perfectly predicts the target variable. On held-out data, R² can fall below 0 when the model predicts worse than simply using the mean of the target.
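
One common way to compute R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition and uses invented values to show three reference points: a perfect fit, a model that always predicts the mean, and a model that does worse than the mean (where this formula goes negative).

```python
def r_squared(actual, predicted):
    """R² as 1 - SS_res / SS_tot (assumes actual values are not all equal)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = list(actual)                # every prediction is exact
mean_only = [3.0] * 5                 # always predicts the mean of actual
reversed_trend = [5.0, 4.0, 3.0, 2.0, 1.0]  # gets the direction wrong

r2_perfect = r_squared(actual, perfect)
r2_mean = r_squared(actual, mean_only)
r2_bad = r_squared(actual, reversed_trend)
print(r2_perfect, r2_mean, r2_bad)
```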

13

training split

the data the model gets to learn from

  • usually the largest chunk of the data

14

two methods to preserve data for assessing model performance

  • data split

  • cross-validation
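
To show how cross-validation differs from a single data split, here is a minimal sketch of k-fold index generation in plain Python. The function name `k_fold_indices` and the choice of 10 samples and 3 folds are illustrative assumptions; no shuffling is done, which real workflows often add.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample is used for testing exactly once and for training in the remaining folds, so no data is permanently set aside, at the cost of fitting the model k times.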