Vocabulary flashcards for data mining concepts.
Data Mining
Methods that attempt to discover patterns, trends, and relationships among data, especially non-obvious and unexpected patterns.
Data Lake
A repository that stores raw data, structured or unstructured, in its original format.
Data Warehouse
A database structured to study patterns in data from multiple sources.
Data Mart
A scaled-down data warehouse that serves one part of an organization.
Classification
Predicting the class (or category) to which a record belongs.
Prediction
Predicting the value of a continuous variable.
Cluster Analysis
Separating data into groups such that records are similar within a group and are different across groups.
Market Basket Analysis
Identifying items that are often purchased together.
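A minimal sketch of the idea, assuming a handful of made-up baskets: count how many baskets contain each pair of items, then report the most frequent pair. The basket contents and the `pair_counts` helper are hypothetical. The share of baskets containing a pair is commonly called its support.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "eggs"},
    {"bread", "milk", "eggs"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
support = top_count / len(baskets)  # share of baskets containing the pair
print(top_pair, top_count, support)
```

Real market basket tools also compare a pair's frequency with what chance alone would produce; this sketch only counts co-occurrences.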
Overfitting
Creating a model that fits the training data TOO well, such that it will not match new data well; the model re-creates noise and patterns specific to the training data that are not generalizable to new data.
Training Data
Data used to build the model (typically 70-80% of the original data).
Testing Data
Data used to evaluate the model (typically 20-30% of the original data).
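The training/testing partition can be sketched as a random shuffle followed by a cut. This is only an illustration: the `train_test_split` helper, its `test_fraction` and `seed` parameters, and the ten placeholder records are assumptions, not part of the flashcards.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    # A fixed seed makes the split repeatable.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(10))  # placeholder records
train, test = train_test_split(records, test_fraction=0.3)
print(len(train), len(test))
```

Shuffling before cutting matters when the original records are sorted, since an unshuffled cut could put all records of one kind in the testing data.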
Confusion Matrix
Organizes the counts of records by predicted class and actual class.
Overall Accuracy
The number of correct predictions divided by the total number of records.
Sensitivity
How well a classifier detects members of the important class: the number correctly identified divided by the number that actually belong to that class.
Specificity
How well a classifier rules out members of the less important class: the number correctly ruled out divided by the number that actually belong to that class.
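The four measures above can be worked through on a small example. The sketch below assumes two classes, with `1` standing for the important class and `0` for the less important one. The labels and the `confusion_counts` helper are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1   # important class, correctly detected
        elif a == positive:
            fn += 1   # important class, missed
        elif p == positive:
            fp += 1   # less important class, wrongly flagged
        else:
            tn += 1   # less important class, correctly ruled out
    return tp, fp, tn, fn

# Hypothetical labels: 1 = important class, 0 = less important class.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)   # correct predictions / all records
sensitivity = tp / (tp + fn)         # detected / actual important class
specificity = tn / (tn + fp)         # ruled out / actual less important class
print(tp, fp, tn, fn, accuracy, sensitivity, specificity)
```

The three rates differ on this data, which shows why overall accuracy alone can hide how well the important class is detected.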
Classification Trees
Separate records into subgroups by creating splits on predictor variables, forming logical if/then rules.
Decision (splitting) node
Splits data into subgroups; has successors (nodes below it).
Terminal node
Has no successors; contains the total record count and the count of each class.
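A hand-built tree can illustrate how decision nodes and terminal nodes fit together. This is a sketch under stated assumptions: the `Node` class, the predictors `income` and `age`, the thresholds, and the class counts are all hypothetical, and assigning the majority class at a terminal node is one common convention rather than something the cards specify.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    # Decision-node fields (left as None on terminal nodes).
    predictor: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # records with predictor <= threshold
    right: Optional["Node"] = None   # records with predictor > threshold
    # Terminal-node field: count of training records in each class.
    class_counts: Optional[Dict[str, int]] = None

    def is_terminal(self) -> bool:
        return self.left is None and self.right is None

def classify(node: Node, record: dict) -> str:
    """Follow the if/then splits down to a terminal node."""
    while not node.is_terminal():
        if record[node.predictor] <= node.threshold:
            node = node.left
        else:
            node = node.right
    # Assumed rule: predict the class with the largest count.
    return max(node.class_counts, key=node.class_counts.get)

# Hypothetical tree: split on income, then on age.
tree = Node(
    predictor="income", threshold=50.0,
    left=Node(class_counts={"no": 8, "yes": 2}),
    right=Node(
        predictor="age", threshold=30.0,
        left=Node(class_counts={"no": 3, "yes": 1}),
        right=Node(class_counts={"no": 1, "yes": 5}),
    ),
)

print(classify(tree, {"income": 80, "age": 45}))
```

Reading the path to a terminal node gives the rule directly, for example: if income > 50 and age > 30, then predict "yes".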