Data Science 3


20 Terms

1

Predictive Model

A mathematical algorithm that predicts a target variable from explanatory variables.

2

Classification

The process of using predictor information to sort data samples into distinct classes.

3

Numeric Prediction

Predicting the numerical value of a dependent variable using independent variables.
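As a minimal sketch of numeric prediction, the snippet below fits a straight line to one independent variable by ordinary least squares and uses it to predict a new value. The function names `fit_line` and `predict` and the sample data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Predict the dependent variable for a new value of the independent variable."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical data in which y is exactly 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = predict(model, 5)
```

Because the sample data lie exactly on a line, the fitted slope and intercept recover that line; real data would leave residual error.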

4

Rule Induction

The process of inferring if-then rules from a data set.

5

Decision Tree

A model that splits the data at each internal node by testing an attribute, with every path ending in a leaf that identifies the class.
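To illustrate how a record travels from the root to a leaf, here is a small sketch with a hand-written tree. The tree, its attribute names (`outlook`, `humidity`, `windy`), and the `classify` helper are hypothetical; the tree is not learned from data.

```python
# Hypothetical tree: each internal node tests one attribute,
# and each leaf is a plain class label.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}


def classify(node, record):
    """Follow the branch matching the record's attribute value until a leaf is reached."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node


label = classify(TREE, {"outlook": "sunny", "humidity": "normal", "windy": False})
```

Each root-to-leaf path also reads as one if-then rule, for example "if outlook is sunny and humidity is normal, then yes".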

6

K-Nearest Neighbors (k-NN)

A method that classifies an unknown record based on its nearest neighbors in the training data.
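A minimal k-NN sketch, assuming Euclidean distance and a simple majority vote among the k closest training records. The function name `knn_classify` and the toy training data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` from `training`, a list of (features, label) pairs."""
    # Rank training records by Euclidean distance to the unknown record.
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
label_near_a = knn_classify(training, (1.1, 1.0))
label_near_b = knn_classify(training, (5.0, 5.1))
```

Nothing is fitted ahead of time: the training data is only consulted when a query arrives, which is what makes k-NN a lazy learner.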

7

Eager Learners

Models that develop a generalized mathematical relationship between input and target variables from the training data, before any new record is seen.

8

Lazy Learners

Models that keep the training data as a lookup table and match new input variables against it to find outcomes, rather than fitting a model in advance.

9

Overfitting

The tendency to tailor a model too closely to the training data, which hurts its ability to generalize to unseen data.

10

Pruning

The process of reducing the size of a decision tree to prevent overfitting.

11

Cross-Validation

A training and testing procedure that provides estimates of model generalization performance.
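One common form is k-fold cross-validation. The sketch below, under that assumption, splits record indices into k folds, uses each fold once as the test set, and averages the per-fold scores. The names `k_fold_indices`, `cross_validate`, and the `evaluate` callback are hypothetical; records are not shuffled here, which a real workflow would usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score from evaluate(train_indices, test_indices) over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 3))
```

Here `evaluate` stands in for "train a model on the training indices and score it on the test indices"; every record is tested exactly once across the folds.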

12

Confusion Matrix

A matrix that counts correct and incorrect classifications for each combination of actual and predicted class, used to assess predictive capability.
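A small sketch of how such counts can be tallied, assuming two parallel lists of actual and predicted labels. The function name `confusion_matrix` and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical labels for six records.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

matrix = confusion_matrix(actual, predicted)
# Correct classifications sit on the diagonal: actual == predicted.
correct = sum(count for (a, p), count in matrix.items() if a == p)
accuracy = correct / len(actual)
```

Off-diagonal entries such as `("yes", "no")` separate the kinds of error, which a single accuracy figure hides.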

13

Gains & Lift Charts

Charts that measure the effectiveness of a classification model against a baseline.
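As a rough sketch of the quantity a lift chart plots, the function below ranks records by model score, takes the top fraction, and compares its positive rate with the overall (baseline) positive rate. The name `lift_at`, the scores, and the outcomes are hypothetical, and the code assumes at least one positive outcome so the baseline is nonzero.

```python
def lift_at(scores, outcomes, fraction):
    """Lift of the top `fraction` of records by score, relative to the baseline rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    baseline_rate = sum(outcomes) / len(outcomes)  # assumes at least one positive
    return top_rate / baseline_rate


# Hypothetical model scores and true outcomes (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

lift_top_20 = lift_at(scores, outcomes, 0.2)
```

A lift above 1 means the model concentrates positives in its top-ranked records better than selecting records at random; a lift of 1 over the whole data set is the baseline.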

14

Test Set Validation

Holding out data to assess how well a model generalizes to unseen cases.
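A minimal hold-out sketch, assuming a random split with a fixed seed for repeatability. The function name `holdout_split` and the 30% test fraction are hypothetical choices, not prescribed values.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and set aside a fraction as the test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The model is trained only on the first part; the held-out part stays unseen.
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(list(range(10)))
```

The model is fitted on `train` only and scored on `test`, so the score reflects performance on cases it has not seen.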

15

Generalization Performance

How well a model performs on unseen data that was not part of the training set.

16

Domain Knowledge Validation

Sanity checking a model by getting assessments from domain experts.

17

Majority Vote

A method in k-NN where the class label is determined by the most common label among the nearest neighbors.

18

Distance Measure

A method for computing the distance between different records in k-NN.
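Two widely used distance measures are Euclidean and Manhattan distance. The sketch below shows both for records given as equal-length numeric tuples; the function names are hypothetical, and the card itself does not say which measure is intended.

```python
def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """City-block distance: sum of the absolute differences per attribute."""
    return sum(abs(x - y) for x, y in zip(a, b))


d_euclidean = euclidean((0, 0), (3, 4))
d_manhattan = manhattan((0, 0), (3, 4))
```

Both measures are sensitive to attribute scale, so attributes are often normalized before distances are computed.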

19

Model Complexity

The intricacy of a model, which can lead to overfitting if not managed appropriately.

20

Attribute (in classification context)

Features or characteristics used to sort data samples into classes.
