CPSC 428: Applied Machine Learning

130 Terms

1
New cards

Supervised Learning

  • Uses known features to predict an unknown output

    feature (requires training)

  • Requires labeled data

    • Training: Learn from examples -> generalize model

    • Bad data -> bad models

  • 2 Types: Regression and Classification

2
New cards

Feature transformation

involves manipulating the features or variables in a dataset to improve the performance of a machine learning model or to better understand the relationships between variables. Feature transformation is often combined with other techniques such as data collection, model training, and model evaluation to create effective and accurate models for a wide range of applications.

3
New cards

Single-value imputation

is a univariate imputation technique that involves replacing missing values with a constant such as the mean, median, or mode. Although easy to implement and computationally inexpensive, single-value imputation has the following limitations and potential problems:

  • Distortion of data — The feature's distribution will be weighted heavily toward the single-value estimate after imputation.

  • Underestimation of variance and standard errors — Statistical methods that are sensitive to variance and standard errors such as hypothesis testing and confidence intervals will lead to inaccurate results.
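
The variance problem described above can be seen in a small sketch. The `ages` list and the `impute_mean` helper are hypothetical names made up for this illustration, and only the Python standard library is used:

```python
from statistics import mean, pvariance

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical feature with two missing values
ages = [20, None, 30, None, 40]
imputed = impute_mean(ages)

# Compare the spread of the observed values with the spread after imputation
observed_var = pvariance([v for v in ages if v is not None])
imputed_var = pvariance(imputed)
print(imputed)
print(observed_var, imputed_var)
```

Every filled-in value sits exactly at the mean, so the imputed feature has a smaller variance than the values that were actually observed.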

4
New cards

Univariate imputation

algorithms replace values for a feature using only non-missing values for that same feature.

5
New cards

Multivariate imputation

uses regression models to predict the missing values based on the other features in the data. Multivariate imputation can handle both linear and nonlinear relationships between the features but assumes that the data is normally distributed and the missing values do not affect the regression model. The output feature(s) should not be used for multivariate imputation to avoid bias during model training.

6
New cards

k-nearest neighbors imputation

uses the k most similar instances to a data point to impute the missing values. This technique can handle both numerical and categorical data. kNN imputation preserves the distribution of data and is more robust to outliers compared to single-value imputation techniques that use the mean or median. Since kNN is a distance-based algorithm, normalization or standardization is required, especially when the features have different units.

7
New cards

Creating binary features in pandas

import pandas as pd

# Load nbaallelo_log.csv into a dataframe
NBA = pd.read_csv('nbaallelo_log.csv')

# Create binary feature for game_result with 0 for L and 1 for W
NBA['win'] = NBA['game_result'].apply(lambda x: 1 if x == 'W' else 0)

8
New cards

Steps for Linear Regression

  1. read in data

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    diamonds = pd.read_csv('diamonds.csv')
  2. define features

    X = diamonds[['carat', 'table']]
    y = diamonds['price']
  3. Initialize Model

    multRegModel = LinearRegression()
  4. Fit Model

    multRegModel.fit(X,y)
  5. Get intercept

    intercept = multRegModel.intercept_
  6. Get coefficients

    coefficients = multRegModel.coef_
  7. Predict

    prediction = multRegModel.predict([[5, 2]])

9
New cards

Steps for Elastic Net Regression

  1. read in data

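    import numpy as np
    import pandas as pd
    from sklearn.linear_model import ElasticNet
    from sklearn.preprocessing import StandardScaler

    diamonds = pd.read_csv('diamonds.csv')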
  2. define features

    # Define input and output features
    X = diamonds[['carat', 'table']]
    y = diamonds[['price']]
  3. Scale the features

    # Scale the input features
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
  4. Initialize Model

    
    # Initialize a model using elastic net regression with a regularization strength of 6, and l1_ratio=0.4
    eNet = ElasticNet(alpha=6, l1_ratio=0.4)
  5. Fit Model

    # Fit the elastic net model to the input and output features
    eNet.fit(X,y)
  6. Get intercept

    # Get estimated intercept weight
    intercept = eNet.intercept_
    print('Intercept is', np.round(intercept, 3))
  7. Get coefficients

    # Get estimated weights for carat and table features
    coefficients = eNet.coef_
    print('Weights for carat and table features are', np.round(coefficients, 3))
    
  8. Predict

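    # carat and table hold the new diamond's values (e.g., from user input);
    # scale them with the same fitted scaler before predicting
    prediction = eNet.predict(scaler.transform([[carat, table]]))
    print('Predicted price is', np.round(prediction, 2))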

10
New cards

Steps for KNN Regression

  1. read in data

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import KNeighborsRegressor

    diamonds = pd.read_csv('diamonds.csv')
  2. define features

    # Define input and output features
    X = diamonds[['carat', 'table']]
    y = diamonds['price']
  3. Initialize Model

    # Initialize a k-nearest neighbors regression model using a Euclidean distance and k=12 
    knnr = KNeighborsRegressor(n_neighbors=12, metric="euclidean")
  4. Fit Model

    # Fit the kNN regression model to the input and output features
    knnrFit = knnr.fit(X,y)
  5. Predict

    # Create array with new carat and table values
    Xnew = [[carat, table]]
    
    # Predict the price of a diamond with the user-input carat and table values
    prediction = knnrFit.predict(Xnew)
    print('Predicted price is', np.round(prediction, 2))
  6. get nearest neighbors distance

    # Find the distances and indices of the 12 nearest neighbors for the new instance
    neighbors = knnrFit.kneighbors(Xnew)
    print('Distances and indices of the 12 nearest neighbors are', neighbors)

11
New cards

Steps for KNN Classifier

  1. read in data

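    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Load the dataset
    skySurvey = pd.read_csv('SDSS.csv')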
  2. define features

    # Create a new feature from u - g
    skySurvey['u_g'] = skySurvey['u'] - skySurvey['g']
    
    # Create dataframe X with features redshift and u_g
    X = skySurvey[['u_g','redshift']]
    
    # Create dataframe y with feature class
    y = skySurvey[['class']]
  3. Split data if applicable

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.
  4. Initialize Model

    # Initialize model with k=3
    skySurveyKnn = KNeighborsClassifier(n_neighbors=3)
  5. Fit Model

    # Fit model using X_train and y_train
    skySurveyKnn.fit(X_train, np.ravel(y_train))
  6. Predict

    # Find the predicted classes for X_test
    y_pred = skySurveyKnn.predict(X_test)
  7. get score

    # Calculate accuracy score
    score = skySurveyKnn.score(X_test,np.ravel(y_test))

12
New cards

Classification Metrics in scikit-learn

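  1. read in data, define features, initialize model, fit model, and predict

    import numpy as np
    import pandas as pd
    from sklearn import metrics
    from sklearn.linear_model import LogisticRegression

    # Input the random state
    rand = int(input())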
    
    # Load sample set by a user-defined random state into a dataframe. 
    NBA = pd.read_csv("nbaallelo_log.csv").sample(n=500, random_state=rand)
    
    # Create binary feature for game_result with 0 for L and 1 for W
    NBA['win'] = NBA['game_result'].replace(to_replace = ['L','W'], value = [int(0), int(1)])
    
    # Store relevant columns as variables
    X = NBA[['elo_i']]
    y = NBA[['win']]
    
    # Build logistic model with default parameters, fit to X and y
    lr = LogisticRegression()
    lr.fit(X, np.ravel(y))
    
    # Use the model to predict the classification of instances in X
    logPredY = lr.predict(X)
  2. initialize confusion matrix

    # Calculate the confusion matrix for the model
    confMatrix = metrics.confusion_matrix(y, logPredY)
    print("Confusion matrix:\n", confMatrix)
  3. get metrics

    # Calculate the accuracy for the model
    accuracy = metrics.accuracy_score(y, logPredY)
    print("Accuracy:", round(accuracy,3))
    
    # Calculate the precision for the model
    precision = metrics.precision_score(y, logPredY)
    print("Precision:", round(precision,3))
    
    # Calculate the recall for the model
    recall = metrics.recall_score(y, logPredY)
    print("Recall:", round(recall, 3))
    
    # Calculate kappa for the model
    kappa = metrics.cohen_kappa_score(y, logPredY)
    print("Kappa:", round(kappa, 3))

13
New cards

Regression Metrics in Scikit-Learn

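  1. read in data, define features, initialize model, fit model, and predict

    import pandas as pd
    from sklearn import metrics
    from sklearn.linear_model import LinearRegression

    # Input the random state
    rand = int(input())

    # Load sample set by a user-defined random state into a dataframe
    diamonds = pd.read_csv('diamonds.csv').sample(n=500, random_state=rand)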
    
    # Define input and output features
    X = diamonds[['carat', 'table']]
    y = diamonds['price']
    
    # Initialize and fit a multiple linear regression model
    multRegModel = LinearRegression()
    multRegModel.fit(X,y)
    
    # Use the model to predict the price of instances in X
    mlrPredY = multRegModel.predict(X)
  2. get metrics

    # Calculate mean absolute error for the model
    mae = metrics.mean_absolute_error(y, mlrPredY)
    print("MAE:", round(mae, 3))
    
    # Calculate mean squared error for the model
    mse = metrics.mean_squared_error(y, mlrPredY)
    print("MSE:", round(mse, 3))
    
    # Calculate root mean squared error for the model as the square root of the MSE
    # (the squared=False argument is deprecated in recent scikit-learn releases)
    rmse = mse ** 0.5
    print("RMSE:", round(rmse, 3))
    
    # Calculate R-squared for the model
    r2 = metrics.r2_score(y, mlrPredY)
    print("R-squared:", round(r2, 3))

14
New cards

Hyperparameter

user-defined setting in a machine learning model that is not estimated during model fitting. Changing the values of a hyperparameter affects the model's performance and predictions.

an example would be k in knn

15
New cards

Unsupervised Learning

  • No labeled data, good for data exploration &

    finding hidden patterns, automatic processing

    (no training), no prediction

  • 3 Types: Clustering, Association, and Dimensionality Reduction

16
New cards

Euclidean Distance Formula

\(d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}\)

17
New cards

Manhattan Distance Formula

The Manhattan distance formula, also known as the L1 distance or taxicab distance, calculates the distance between two points in a grid-like space by summing the absolute differences of their coordinates: |x1 - x2| + |y1 - y2|

18
New cards

Minkowski Distance Formula

The Minkowski distance formula, a generalized way to measure distance between two points in a vector space, is \(\left(\sum_i |u_i - v_i|^p\right)^{1/p}\), where \(p\) is a parameter that determines the type of distance (e.g., \(p = 1\) for Manhattan, \(p = 2\) for Euclidean).
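
The three distance cards can be tied together in a short sketch. The function names are chosen for this illustration only. The Manhattan and Euclidean distances fall out of the Minkowski formula by setting p to 1 and 2:

```python
def minkowski(u, v, p):
    """Minkowski distance between two equal-length points u and v."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def manhattan(u, v):
    # p = 1: sum of absolute coordinate differences
    return minkowski(u, v, 1)

def euclidean(u, v):
    # p = 2: straight-line distance
    return minkowski(u, v, 2)

a, b = (0, 0), (3, 4)
print(manhattan(a, b))
print(euclidean(a, b))
```

For the points (0, 0) and (3, 4), the Manhattan distance is 3 + 4 and the Euclidean distance is the hypotenuse of the 3-4-5 triangle.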

19
New cards

Dimensionality Reduction

• Reduce the number of features (=>faster processing)

• Sometimes part of pre-processing data (e.g., for

supervised learning)

20
New cards

Reinforcement Learning

  • Feedback based algorithms where future

    decisions are made on previous outcomes.

  • “Good” decisions get rewards, “bad” ones don’t.

  • Algorithm evolves over time.

  • Examples:

    • Self-driving cars/robotics

    • Stock Market

    • Playing games

21
New cards

Features

  • input into the model

  • X

22
New cards

Labels

  • correct outputs

  • y

23
New cards

Clustering

unsupervised learning task in which instances are grouped based on similarities in the input features. Since clustering is unsupervised, no target output features exist. Instead, clustering results in a new feature containing group assignments. Clustering algorithms use similarity measures to group instances, such as distance or correlation. Applications of clustering include customer segmentation, recommendation systems, and social network analysis.

24
New cards

Classification

  • Based on labeled training data identifying

    membership of individual items in distinct groups,

    the algorithm will predict membership, i.e.,

    classify, unseen data into one of the groups.

  • Requires a training and testing phase

  • Common classification algorithms are

    • KNN

    • Logistic Regression

    • Decision Trees & Forests

25
New cards

Steps of Machine Learning

26
New cards

Testing Set

used to evaluate a model's performance or select between competing models.

27
New cards

Training Set

used to fit the initial model.

28
New cards

Validation Set

used to decide optimal hyperparameter values or assess whether a model is overfitted or underfitted.

29
New cards

Underfitting

the model is too simple to fit the data well. Underfitted models do not explain changes in the output feature sufficiently, and will score poorly during model evaluation.

30
New cards

Overfitting

the model is too complicated and fits the data too closely. Overfitted models do not generalize well to new data. A model that fits the general trend in the data without too much complexity is preferred.

31
New cards

Variance

  • model's sensitivity to fluctuations in the training data, leading to overfitting when high, where the model captures noise and performs poorly on unseen data

  • Variance is a measure of how much a model's predictions vary when trained on different subsets of the training data.

  • Low variance → less sensitive to change

  • high variance → sensitive to change; fits training data too closely.

32
New cards

Bias

  • Adding bias adds more error on your training set, but

    can give better results on unseen data.

  • systematic errors or unfair outcomes introduced by algorithms or training data, leading to disproportionate predictions for specific groups or individuals

  • basing your predictions on assumptions rather than on the data, so think about stereotyping

  • low bias → few assumptions; model matches training data too closely

  • high bias → more assumptions; model doesn’t match training data too closely

33
New cards

Mean Absolute Error

  • mean of the absolute value of the residuals

  • easy to understand

  • doesn’t punish large errors more heavily than small ones

34
New cards

Mean Squared Error

  • mean of squared residuals

  • large errors are punished more than MAE

  • units are reported in y²

35
New cards

Root Mean Squared Error

  • root mean squared residuals

  • same as MSE, but units are just y
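
The three error metrics on the preceding cards can be computed by hand in a few lines. This is a minimal sketch with made-up `y_true` and `y_pred` values, not the scikit-learn implementation:

```python
import math

def mae(y_true, y_pred):
    """Mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared residuals (units are y squared)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the MSE, which brings the units back to y."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical observed and predicted values; the last prediction is off by 4
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 13]
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

The single large residual dominates the MSE and RMSE far more than it does the MAE, which is the sense in which the squared metrics punish large errors.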

36
New cards

sum of squares due to regression (SSR)

the sum of the squared differences between the predicted values and the mean of the dependent variable. In other words, it describes how well our line fits the data.

37
New cards

Residual Sum of Squares (RSS/SSE)

measures the level of variance in the error term, or residuals, of a regression model

38
New cards

Least Squares

selects weights \(w_0\) and \(w_1\) such that the sum of the squared residuals is minimized. Mathematically, least squares selects weights such that \(RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (w_0 + w_1 x_i)\right)^2\) is minimized.
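
For a single input feature, the weights that minimize the RSS have a well-known closed form. The sketch below uses it on made-up data, with helper names chosen for this illustration:

```python
def fit_simple_least_squares(x, y):
    """Return (w0, w1) minimizing the residual sum of squares for y ~ w0 + w1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    w1 = sxy / sxx            # slope
    w0 = y_bar - w1 * x_bar   # intercept
    return w0, w1

def rss(x, y, w0, w1):
    """Residual sum of squares for the line w0 + w1*x."""
    return sum((yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
w0, w1 = fit_simple_least_squares(x, y)
print(w0, w1, rss(x, y, w0, w1))
```

Because the sample points lie exactly on a line, the fitted weights recover that line and the RSS is zero. With noisy data the RSS would be positive but still the smallest achievable by any straight line.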

39
New cards

represents goodness of fit for the regression model

40
New cards

Loss Function

quantifies the difference between a model's predictions and the observed values.

41
New cards

Accuracy

  • What percentage of all observations was correctly

    categorized?

  • Higher is better :)

  • Can be misleading on problems with imbalanced classes

  • measures the overall correctness of a model's predictions, representing the ratio of correct predictions to the total number of instances

  • Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives). 

  • If a model predicts the sentiment of 100 tweets and 85 of those predictions are correct, the accuracy is 85/100 = 85%. 

42
New cards

Recall

  • Recall (Sensitivity or True Positive Rate) - for all

    actual yes, how often does it predict yes?

  • TP / (TP + FN)

43
New cards

Precision

  • when it predicts yes, how often is it

    correct?

  • for all predicted yes, how often is it

    correct?

  • TP / (TP + FP)
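
The accuracy, recall, and precision formulas from these cards can be checked against a set of confusion-matrix counts. The counts below are made up for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # of all predicted yes, how many were correct
    recall = tp / (tp + fn)      # of all actual yes, how many were found
    return accuracy, precision, recall

# Hypothetical counts for 100 instances
accuracy, precision, recall = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(accuracy, precision, recall)
```

Here precision is higher than recall: the model is rarely wrong when it predicts yes, but it misses some of the actual yes instances.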

44
New cards

Regularization

  • Want to find the optimal model without over- or

    underfitting.

  • Regularization adds a “penalty term” to the loss

    function that shrinks the coefficients toward zero,

    simplifying the complexity of the model and

    identifying less important predictors.

  • 3 Types

    • L1 (Lasso)

    • L2 (Ridge)

    • Elastic Net (combo)

45
New cards

LASSO (L1)

  • Lasso - Least absolute shrinkage and selection

    operator

  • Lasso is good for eliminating useless features

    since it can change the weights/coefs to zero.

  • Helps reduce dimensionality

  • Not good with small datasets

46
New cards

Ridge (L2)

  • Penalizes large weights, but doesn’t eliminate

    them, so all features contribute

  • Works well with smaller sets since it keeps all features

47
New cards

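Elastic Net (L1 + L2)

  • Combination of L1 and L2; a hyperparameter

    determines how much each regularization penalty

    contributes. Note that we have two hyperparameters

    (α and λ): one sets the overall regularization strength and the other sets the L1/L2 mix (alpha and l1_ratio in scikit-learn's ElasticNet).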
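
How the two hyperparameters combine can be sketched with the penalty term alone. The formula below follows the form documented for scikit-learn's `ElasticNet` (`alpha` for overall strength, `l1_ratio` for the mix). Treating those as the two hyperparameters on this card is an assumption, and the weights are made up:

```python
def elastic_net_penalty(weights, alpha, l1_ratio):
    """Penalty added to the loss: a mix of the L1 and squared L2 norms of the weights."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w ** 2 for w in weights)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2_squared

# Hypothetical weights; alpha and l1_ratio match the Elastic Net steps card
weights = [2.0, -1.0]
penalty = elastic_net_penalty(weights, alpha=6, l1_ratio=0.4)
lasso_only = elastic_net_penalty(weights, alpha=6, l1_ratio=1.0)
ridge_only = elastic_net_penalty(weights, alpha=6, l1_ratio=0.0)
print(penalty, lasso_only, ridge_only)
```

Setting `l1_ratio` to 1 leaves only the L1 (lasso) term, and setting it to 0 leaves only the L2 (ridge) term.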

48
New cards

KNN Classification

  • Supervised Learning

  • Generally used for classification, but can also do

    regression

  • Example of instance-based learning, doesn’t train

    or build a generalized internal model, but stores

    instances of the training data.

  • Requires a lot of RAM

  • Classify new item based on majority of the K

    neighbors.

  • Hyperparameter K determines the number of

    neighbors, optimal choice of K is very data

    dependent

  • Large K takes a lot of computational power

  • Distance-based algorithms, like KNN, must use

    properly scaled data.

  • Use Standardization for scaling.
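
The majority-vote idea can be sketched without any library. This is a minimal illustration with made-up points and labels, assuming Euclidean distance and features that are already on comparable scales. It is not how scikit-learn's `KNeighborsClassifier` is implemented internally:

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training instances."""
    # The stored training instances are the whole "model": just sort them by distance
    distances = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature training data with two classes
X_train = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
y_train = ['STAR', 'STAR', 'STAR', 'GALAXY', 'GALAXY', 'GALAXY']

print(knn_predict(X_train, y_train, (2, 2), k=3))
print(knn_predict(X_train, y_train, (8.5, 8.5), k=3))
```

Nothing is fitted ahead of time. Every prediction scans the stored training data, which is why memory use and prediction cost grow with the dataset.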

49
New cards

Elbow Method

graphs the total inertia of the clusters against values of \(k\) and chooses the \(k\) for which the curve levels off. Since increasing \(k\) also increases model complexity, the elbow method finds the \(k\) with the best tradeoff between complexity and inertia.

50
New cards

Grid Search

  • Grid search is a method for finding the best hyperparameters for a machine learning model by systematically evaluating all possible combinations of hyperparameters within a predefined grid.

  • How to:

    1. Choose model

      1. model = ElasticNet()

    2. Choose parameters

      1. param_grid = {'alpha':[0.1,2,5,10,50,100], 'l1_ratio':[.1,.5,.7,.95,.99,1]}

    3. Import needed libraries

      1. from sklearn.model_selection import GridSearchCV

    4. initialize GridSearch

      1. gridModel = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=5, verbose=2)

    5. fit model

      1. gridModel.fit(X_train, y_train)

51
New cards

Standardization

  • Center data with mean = 0 and standard deviation = 1

  • Data is normally distributed

  • Some outliers

  • Linear/Logistic regression perform better with this.

52
New cards

Normalization

  • Scale data to a fixed range, usually [0-1]

  • Data is not normally distributed

  • No extreme outliers

  • Distance-based models (KNN, K-Means) work better with

    normalized data
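
The two scaling approaches can be compared side by side on a made-up feature. The helper names are hypothetical, and the population standard deviation is used here as an assumption about the intended definition:

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Rescale to the fixed range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature values
values = [2, 4, 6, 8]
z = standardize(values)
n = normalize(values)
print(z)
print(n)
```

Because min-max scaling depends only on the smallest and largest values, a single extreme outlier would squeeze every other value into a narrow band. That is why normalization suits data without extreme outliers.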

53
New cards

F1

  • This is the harmonic mean of recall and precision.

54
New cards

Cross Validation

uses different subsets of the data for model training and model testing. Reserving a subset of the data for testing purposes allows fitted models to be evaluated without risk of bias. Cross-validation can be used to fine-tune a model's hyperparameters or choose between competing models.

55
New cards

Stratified cross-validation sets

evenly split, or balanced, for all levels of the output feature. In classification, stratification ensures that each cross-validation subset represents the class proportions in the total dataset. In regression, stratified samples are generated so that the descriptive statistics for each subset are about equal. Stratification is important when a given class or value is rare or when the fitted model depends on the class proportions.

56
New cards

k-fold cross-validation

splits a training set into \(k\) non-overlapping subsets, called folds. Each of the \(k\) subsets is used as validation data in one cross-validation run, with the remaining \(k-1\) subsets used for model training. Since cross-validation is performed multiple times, k-fold cross-validation can be used to measure the variability of parameters and performance measures. k-fold cross-validation may be used to select possible hyperparameter values or measure how sensitive a model's performance is to the training/validation split. Using more folds is computationally expensive, so 5-10 folds are recommended.
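
The fold bookkeeping can be sketched as a small generator. This is a simplified illustration (no shuffling or stratification) with a hypothetical helper name. In practice scikit-learn provides utilities for this:

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k non-overlapping folds over n instances."""
    indices = list(range(n))
    # Spread any remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 5))
for train_idx, val_idx in folds:
    print('train:', train_idx, 'validate:', val_idx)
```

Each instance lands in the validation fold exactly once and in the training portion of the other k-1 runs.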

57
New cards

Leave-one-out cross-validation

(LOOCV) holds out one instance at a time for validation, with the remaining \(n-1\) instances used to train the model. Leave-one-out cross-validation is useful for identifying individual instances with a strong influence on a model. Leave-one-out cross-validation can be thought of as k-fold cross-validation with \(k = n\).

58
New cards

Confusion Matrix

is a table that summarizes the combinations of predicted and actual values. For binary classifiers, a confusion matrix is a table with two rows and two columns and gives the number of true positives, true negatives, false positives, and false negatives.

A true positive (TP) is an outcome that is correctly predicted as positive.

A true negative (TN) is an outcome that is correctly predicted as negative.

A false positive (FP) is an outcome that is predicted as positive but is actually negative.

A false negative (FN) is an outcome that is predicted as negative but is actually positive.

59
New cards

Imputation

group of techniques used in machine learning to replace missing values in a dataset with a reasonable estimate.

60
New cards

Feature Engineering

the process of selecting, manipulating, and transforming raw data into features that can be used in machine learning models to improve their accuracy and performance

61
New cards

Linear Regression

  • Supervised Learning

  • Use features (X) and labels (y) to learn to predict a future outcome (y) based on unseen features

  • Think “line of best fit”, assumes a linear relationship between features and labels (outcomes). EDA will confirm this early.

  • Y = mx + b (m = slope, b = intercept)

  • Outcome is a quantity/continuous value

  • PRO: runs fast, no/little tuning required, highly

    interpretable and well understood. It lets us

    understand the relationship between outcome

    and importance of given features.

  • CON: Main drawback, unlikely to produce the best

    predictive accuracy compared to other models

    since it assumes an underlying linear relationship

    between the features and the response value.

62
New cards

Univariate Regression

  • we are trying to predict a single value

63
New cards

Multivariate Regression

  • we are trying to predict multiple values

64
New cards

Linear Regression (Simple)

models or predicts the output feature based on a linear relationship with only one input feature, \(\hat{y} = w_0 + w_1 x\). The weights \(w_0\) and \(w_1\) are the estimated y-intercept and slope.

65
New cards

Residual

the vertical distance between the

observed data value and the predicted value for the

instance by the linear model

66
New cards

Linear Regression (Multiple)

extends the simple linear model to include all \(p\) input features for predicting the output feature and takes the form \(\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p\), where \(w_j\) is the weight corresponding to the \(j\)th input feature for \(j = 1, 2, \dots, p\). The weights \(w_j\) represent the average effect on the output feature for a one-unit increase in \(x_j\), holding all other input feature values fixed.

67
New cards

KNN Regression

predicts the value of a numeric output feature based on the average output for other instances with the most similar, or nearest, input features. The k nearest instances, or neighbors, are identified using a distance measure with the input features. The average value of the output feature for the k nearest instances becomes the prediction. The k-nearest neighbors regression prediction is a numeric value compared to k-nearest neighbors for classification that predicts a class.

68
New cards

Logistic Regression

  • Despite the name it’s not regression, but

    classification

  • This classification yields membership probabilities

  • Binary classification is the default

  • Uses the sigmoid function to fit the data rather

    than a straight line

  • This yields values 0 to 1 indicating membership

    probability for a given class

69
New cards

Sigmoid Function

\(\sigma(z) = \frac{1}{1 + e^{-z}}\), which maps any real number to a value between 0 and 1.
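
The standard sigmoid function, 1 / (1 + e^(-z)), can be sketched in a couple of lines. The weights `w0` and `w1` below are made up to show how logistic regression turns a linear score into a membership probability:

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted weights for a one-feature logistic model
w0, w1 = -4.0, 2.0

def predict_proba(x):
    """Probability of the positive class for input x."""
    return sigmoid(w0 + w1 * x)

for x in (0, 2, 4):
    p = predict_proba(x)
    print(x, round(p, 3), 'positive' if p >= 0.5 else 'negative')
```

A linear score of zero maps to a probability of exactly 0.5, which is the usual default threshold between the two classes.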
70
New cards

Ordinal Data

  • implied order - can use integer encoding

71
New cards

Integer Encoding

  • the process of converting categorical data (like text labels) into numerical values, typically integers, for easier processing by algorithms

  • Mexico = 1, USA = 2, Canada = 3

72
New cards

one-hot encoding

  • each category in a categorical feature is converted into a binary vector that has a length equal to the number of unique categories in the feature.

  • male = 100, female = 010, child = 001

73
New cards

Dummy Encoding

  • each category is assigned a unique binary vector with a length that is one less than the number of categories.

  • male = 10, female = 01, child = 00
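
The difference between one-hot and dummy encoding can be sketched with plain lists, using the same three categories as the cards. The helper names are hypothetical, and dropping the last category as the baseline is an assumption that matches the card's example:

```python
def one_hot(value, categories):
    """Binary vector with one position per category."""
    return [1 if value == c else 0 for c in categories]

def dummy(value, categories):
    """One-hot vector with the last category dropped; that category becomes all zeros."""
    return one_hot(value, categories)[:-1]

categories = ['male', 'female', 'child']
for value in categories:
    print(value, one_hot(value, categories), dummy(value, categories))
```

Dummy encoding carries the same information in one fewer column, because the dropped category is implied when every remaining column is zero.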

74
New cards

Nominal Data

  • no inherent order - OHE or Dummy

75
New cards

Artificial Intelligence

Computers and programs that mimic human problem-solving and decision-making capabilities

76
New cards

Machine Learning

  • Self-learning algorithms to derive knowledge from data in order to predict outcomes or organize data

  • Semi-automated extraction of knowledge from data, i.e., learning from examples and experience

  • Sophisticated pattern matching

77
New cards

Regression

  • Used to predict a quantity/continuous value

    • E.g., predict cost of a house based on location, # of rooms,

    square footage etc.

78
New cards

Binary Encoding in scikit

#List unique values for a feature
brfss['GeneralHealth'].unique()

# Create duplicate dataset to avoid overwriting original encodings
brfss_encoded = brfss.copy()

# Apply binary encoding to HadHeartAttack
from sklearn.preprocessing import LabelBinarizer

labelsHeartAttack = LabelBinarizer()

brfss_encoded[['HadHeartAttack']] = labelsHeartAttack.fit_transform(brfss_encoded[['HadHeartAttack']])

79
New cards

Label Encoding in scikit

# Apply label encoding to GeneralHealth
# Choose default OR specific ordering
from sklearn.preprocessing import OrdinalEncoder

# Default ordering
labelsGeneralHealth = OrdinalEncoder()

# Specific ordering
orderedCategories = [['Poor', 'Fair', 'Good', 'Very good', 'Excellent']]
labelsGeneralHealth = OrdinalEncoder(categories=orderedCategories)

brfss_encoded[['GeneralHealth']] = labelsGeneralHealth.fit_transform(brfss_encoded[['GeneralHealth']])

80
New cards

One-Hot Encoding in scikit

# List unique values for a feature
brfss['RaceEthnicityCategory'].unique()

# Create duplicate dataset to avoid overwriting original encodings
brfss_encoded = brfss.copy()

# Apply one-hot encoding to RaceEthnicityCategory
from sklearn.preprocessing import OneHotEncoder

labelsRaceEthnicity = OneHotEncoder(sparse_output=False).set_output(transform='pandas')
RE_df = labelsRaceEthnicity.fit_transform(brfss_encoded[['RaceEthnicityCategory']])
RE_df

# Add one-hot encoded features to the dataframe and drop original
brfss_encoded = pd.concat([brfss_encoded, RE_df], axis=1).drop(['RaceEthnicityCategory'], axis=1)

81
New cards

Model Evaluation

process of using metrics to assess how well a supervised machine learning model's predictions match observed values.

82
New cards

Classification Metric

  • quantifies the predictive performance of a classifier by comparing the model's predictions to the observed classes.

  • used to evaluate and compare fitted classification models.

  • Common classification metrics include accuracy, precision, recall, confusion matrices, and kappa.

83
New cards

Accuracy

of a classifier is the proportion of correct predictions

84
New cards

Precision

of a classifier is the proportion of positive predictions that are correct

85
New cards

Recall

of a classifier is the proportion of actual positive instances that are correctly predicted

86
New cards

F1-score

The harmonic mean of precision and recall. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. Ex: For two numbers \(A\) and \(B\), the harmonic mean is \(\left(\frac{A^{-1} + B^{-1}}{2}\right)^{-1}\).

87
New cards

Fβ-score

A weighted harmonic mean of precision and recall where β adjusts the tradeoff of importance between precision and recall. The precision has more importance when β<1, Fβ=F1 when β=1, and the recall has more importance when β>1.
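
The F1 and Fβ scores can be sketched with the usual closed form Fβ = (1 + β²)·P·R / (β²·P + R), stated here from general knowledge since the card's formula image is not available. The precision and recall values are made up:

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical classifier with low precision and perfect recall
precision, recall = 0.5, 1.0
f1 = f_beta(precision, recall, beta=1)
f2 = f_beta(precision, recall, beta=2)
f_half = f_beta(precision, recall, beta=0.5)

# With beta = 1 the score equals the plain harmonic mean of the two values
harmonic = ((precision ** -1 + recall ** -1) / 2) ** -1
print(f1, f2, f_half, harmonic)
```

Because recall is the stronger of the two values in this example, weighting recall more heavily (β = 2) raises the score and weighting precision more heavily (β = 0.5) lowers it.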

88
New cards

Kappa

The metric kappa (κ) compares the observed accuracy of a classifier, \(\text{Accuracy}_{obs}\), with the expected accuracy, \(\text{Accuracy}_{exp}\), of a random chance classifier.
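
Kappa can be computed by hand from confusion-matrix counts using the standard Cohen's kappa definition, κ = (observed accuracy − expected accuracy) / (1 − expected accuracy). The expected accuracy comes from the row and column totals. The counts below are made up:

```python
def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa for a binary classifier from confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy_obs = (tp + tn) / n
    # Chance agreement: product of actual and predicted proportions, per class
    actual_pos, actual_neg = tp + fn, tn + fp
    pred_pos, pred_neg = tp + fp, tn + fn
    accuracy_exp = (actual_pos * pred_pos + actual_neg * pred_neg) / n ** 2
    return (accuracy_obs - accuracy_exp) / (1 - accuracy_exp)

# Hypothetical counts for 100 instances
kappa = cohen_kappa(tp=40, tn=45, fp=5, fn=10)
print(round(kappa, 3))
```

A kappa of 0 means the classifier does no better than chance agreement, and a kappa of 1 means perfect agreement with the observed classes.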

89
New cards

Confusion Matrix in Scikit-learn

confusion_matrix(y_true, y_pred)

90
New cards

Accuracy in Scikit-learn

accuracy_score(y_true, y_pred)

91
New cards

Precision in Scikit-learn

precision_score(y_true, y_pred)

92
New cards

Recall in Scikit-learn

recall_score(y_true, y_pred)

93
New cards

Kappa in Scikit-learn

cohen_kappa_score(y1, y2)

94
New cards

Model Selection

the process of identifying the best model from a set of fitted models. Model selection may be based on performance metrics, model interpretability, or model assumptions. Good models have:

  • Strong performance. Ex: High accuracy, low mean squared error.

  • Consistent performance. Ex: Models perform similarly during cross-validation or across multiple training/validation/testing splits.

  • Reasonable assumptions. Ex: No model assumptions are seriously violated.

Models with low bias and low variance are preferred over models with high bias or high variance.

95
New cards
96
New cards
97
New cards
98
New cards
99
New cards
100
New cards