Data Science 3

20 Terms

1

Predictive Model

A mathematical model, learned from data, that predicts a target variable from explanatory variables.

2

Classification

The process of using predictor information to sort data samples into distinct classes.

3

Numeric Prediction

Predicting the numerical value of a dependent variable using independent variables.
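
Example: a minimal sketch of numeric prediction using a least-squares straight line, which is only one of many possible numeric predictors. The data (house size versus price) and the name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: size is the independent variable, price the dependent one.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # predicted price for a size of 100
print(slope, intercept, predicted)
```

The output is a number on a continuous scale, which is what separates numeric prediction from classification.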

4

Rule Induction

The process of inferring if-then rules from a data set.
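
Example: a rough sketch in the spirit of the OneR rule learner, restricted to a single attribute: for each attribute value, the rule predicts the most frequent class. The weather records and the name `one_r` are assumptions for illustration.

```python
from collections import Counter, defaultdict

def one_r(records, attribute, target):
    """Induce one if-then rule per value of `attribute` (majority class wins)."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[attribute]][row[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

# Hypothetical training records.
weather = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "no"},
]

rules = one_r(weather, "outlook", "play")
for value, label in rules.items():
    print(f"IF outlook = {value} THEN play = {label}")
```

A fuller rule learner would compare several attributes and keep the rule set with the lowest error; this sketch skips that step.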

5

Decision Tree

A model that splits the data at each internal node according to an attribute test, following branches until a leaf node assigns the class.
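
Example: a small sketch of how a record travels from the root of a tree to a leaf. The tree is hand-built rather than learned, and its attributes (`outlook`, `humidity`) are hypothetical.

```python
# Hypothetical hand-built tree: internal nodes test one attribute,
# leaves hold a class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": {"label": "no"}, "normal": {"label": "yes"}},
        },
        "overcast": {"label": "yes"},
        "rainy": {"label": "yes"},
    },
}

def classify(node, record):
    """Follow the branch matching the record at each node until a leaf is reached."""
    while "label" not in node:
        node = node["branches"][record[node["attribute"]]]
    return node["label"]

prediction = classify(tree, {"outlook": "sunny", "humidity": "high"})
print(prediction)
```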

6

K-Nearest Neighbors (k-NN)

A method that classifies an unknown record by the classes of its k nearest neighbors in the training data.
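
Example: a minimal k-NN sketch, assuming Euclidean distance and an unweighted vote. The training points and the name `knn_predict` are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training records."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (point, class label).
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 8.0), "B"),
]

label = knn_predict(train, (2.0, 2.0), k=3)
print(label)
```

Nothing is fitted ahead of time; all the work happens when a query arrives.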

7

Eager Learners

Models that learn a mathematical relationship between input and target variables during training, before any new record is seen.

8

Lazy Learners

Models that store the training data and defer the work to prediction time, matching new inputs against stored records much like a lookup table.

9

Overfitting

The tendency to tailor a model too closely to the training data, including its noise, so that it generalizes poorly to unseen data.
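
Example: a toy illustration of overfitting, assuming a one-dimensional k-NN classifier. With k=1 the model memorizes the training set, including one deliberately mislabeled record, and does worse on held-out data than the smoother k=3 model. All data here is invented.

```python
from collections import Counter

def knn_1d(train, x, k):
    """Majority label among the k training records closest to x."""
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of records in `data` that the k-NN model labels correctly."""
    return sum(knn_1d(train, x, k) == y for x, y in data) / len(data)

# Hypothetical data; the record at x=3 carries a noisy label.
train = [(1, "A"), (2, "A"), (3, "B"), (4, "A"),
         (6, "B"), (7, "B"), (8, "B"), (9, "B")]
test = [(1.5, "A"), (2.9, "A"), (3.2, "A"), (6.5, "B"), (8.5, "B")]

train_acc_k1 = accuracy(train, train, k=1)  # perfect: every record is its own neighbor
test_acc_k1 = accuracy(train, test, k=1)    # suffers near the noisy record
test_acc_k3 = accuracy(train, test, k=3)    # the vote outweighs the noisy record
print(train_acc_k1, test_acc_k1, test_acc_k3)
```

A perfect training score paired with a weaker test score is the usual warning sign.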

10

Pruning

The process of reducing the size of a decision tree to prevent overfitting.
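
Example: a deliberately simplified pruning sketch that collapses any subtree built on too few training records into a single majority-class leaf. Practical methods such as reduced-error or cost-complexity pruning decide differently (using validation error or a complexity penalty). The tree layout, `class_counts`, and `min_samples` are assumptions for illustration.

```python
from collections import Counter

def prune(node, min_samples):
    """Collapse any subtree built on fewer than `min_samples` records into a leaf."""
    if "label" in node:
        return node
    if sum(node["class_counts"].values()) < min_samples:
        majority = Counter(node["class_counts"]).most_common(1)[0][0]
        return {"label": majority, "class_counts": node["class_counts"]}
    return {
        "attribute": node["attribute"],
        "class_counts": node["class_counts"],
        "branches": {value: prune(child, min_samples)
                     for value, child in node["branches"].items()},
    }

def count_leaves(node):
    """Number of leaves, used here as a simple measure of tree size."""
    if "label" in node:
        return 1
    return sum(count_leaves(child) for child in node["branches"].values())

# Hypothetical tree; class_counts holds the training labels that reached each node.
tree = {
    "attribute": "outlook",
    "class_counts": {"yes": 9, "no": 5},
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "class_counts": {"yes": 2, "no": 3},
            "branches": {
                "high": {"label": "no", "class_counts": {"no": 3}},
                "normal": {"label": "yes", "class_counts": {"yes": 2}},
            },
        },
        "overcast": {"label": "yes", "class_counts": {"yes": 4}},
        "rainy": {"label": "yes", "class_counts": {"yes": 3, "no": 2}},
    },
}

pruned = prune(tree, min_samples=6)
print(count_leaves(tree), "->", count_leaves(pruned))
```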

11

Cross-Validation

A training and testing procedure, repeated over several splits of the data, that estimates a model's generalization performance.
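
Example: a sketch of how k-fold cross-validation partitions the data, assuming contiguous folds and no shuffling (in practice the records are usually shuffled first). The name `k_fold_indices` is made up for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each record is tested exactly once; averaging the per-fold scores gives the performance estimate.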

12

Confusion Matrix

A table that counts correct and incorrect classifications for each combination of actual and predicted class, used to assess predictive capability.
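
Example: a minimal sketch that tallies a two-class confusion matrix, treating "spam" as the positive class. The labels and predictions are invented.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

# Hypothetical true labels and model predictions.
actual    = ["spam", "spam", "ham", "ham",  "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham", "ham", "spam"]

cm = confusion_matrix(actual, predicted)
tp = cm[("spam", "spam")]  # spam correctly flagged
fn = cm[("spam", "ham")]   # spam missed
fp = cm[("ham", "spam")]   # ham wrongly flagged
tn = cm[("ham", "ham")]    # ham correctly passed
accuracy = (tp + tn) / len(actual)
print(tp, fn, fp, tn, accuracy)
```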

13

Gains & Lift Charts

Charts that measure the effectiveness of a classification model against a baseline.
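
Example: a sketch of the numbers behind one point on a gains chart and a lift chart, assuming the baseline is random targeting (lift of 1). Scores, outcomes, and the function name are hypothetical.

```python
def cumulative_gain_and_lift(scores, actual, fraction):
    """Gain and lift when targeting the top `fraction` of records ranked by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    positives_in_top = sum(label for _, label in ranked[:n_top])
    gain = positives_in_top / sum(actual)  # share of all positives captured
    lift = gain / (n_top / len(ranked))    # improvement over random targeting
    return gain, lift

# Hypothetical model scores and true outcomes (1 = responder).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
actual = [1,    1,    0,    1,    0,    0,    1,    0,    0,    0]

gain, lift = cumulative_gain_and_lift(scores, actual, fraction=0.2)
print(gain, lift)
```

Here the top 20% of records holds half of all responders, so targeting by score does 2.5 times better than picking records at random.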

14

Test Set Validation

Holding out data to assess how well a model generalizes to unseen cases.
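
Example: a minimal holdout split, assuming a 25% test fraction and a fixed random seed for repeatability. Both values and the name `holdout_split` are illustrative choices.

```python
import random

def holdout_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out the first `test_fraction` of them."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))  # stand-in for real data rows
train_set, test_set = holdout_split(records)
print(len(train_set), len(test_set))
```

The model is fitted on `train_set` only; `test_set` stays untouched until evaluation.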

15

Generalization Performance

How well a model performs on unseen data that was not part of the training set.

16

Domain Knowledge Validation

Sanity checking a model by getting assessments from domain experts.

17

Majority Vote

A method in k-NN where the class label is determined by the most common label among the nearest neighbors.
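
Example: a sketch of the vote itself, assuming the neighbor labels arrive ordered from nearest to farthest. The tie-breaking rule shown is one possible choice.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(neighbor_labels).most_common(1)[0][0]

winner = majority_vote(["B", "A", "A", "B", "A"])
print(winner)
```

Because `Counter.most_common` keeps first-seen order among equal counts, a tie goes to the label of the closest neighbor. Choosing an odd k avoids ties in two-class problems.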

18

Distance Measure

A method for computing the distance between different records in k-NN.
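
Example: two common distance measures, Euclidean and Manhattan, sketched for numeric records. The card does not say which measure is intended, so both are shown as options.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each attribute."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (1.0, 2.0), (4.0, 6.0)
d_euclid = euclidean(p, q)
d_manhattan = manhattan(p, q)
print(d_euclid, d_manhattan)
```

Attributes on very different scales are usually normalized first, otherwise the largest-scale attribute dominates the distance.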

19

Model Complexity

The intricacy of a model (for example, the depth of a decision tree or a small k in k-NN), which can lead to overfitting if not managed appropriately.

20

Attribute (in classification context)

A feature or characteristic of a data sample that is used to sort it into a class.