Vocabulary flashcards covering key terms and definitions from the lecture notes.
Data Mining
Non-trivial extraction of implicit, previously unknown and potentially useful information from data; exploration and analysis of large quantities of data to discover meaningful patterns.
Knowledge Discovery in Data (KDD)
Process of identifying valid, novel, and potentially useful patterns in data; data mining (DM) is often considered one step of the KDD process.
Predictive
A primary goal of data mining that involves building models to forecast future events or estimate unknown values.
Descriptive
A primary goal of data mining focused on discovering human-interpretable patterns and insights that describe the characteristics and relationships within existing data. Aims to explain 'what happened' or 'what is happening' by summarizing data properties and finding meaningful structures.
Classification
A predictive task that assigns data instances to predefined classes.
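As a rough illustration of assigning an instance to a predefined class, here is a minimal 1-nearest-neighbor sketch. The technique, the function name `nearest_neighbor_classify`, and the toy data are assumptions chosen for illustration; they do not come from the lecture notes.

```python
import math

def nearest_neighbor_classify(train, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    # `train` is a list of (features, label) pairs; features are numeric tuples.
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical toy data: (height_cm, weight_kg) mapped to a predefined class.
train = [((150, 50), "small"), ((160, 55), "small"),
         ((180, 85), "large"), ((190, 95), "large")]
prediction = nearest_neighbor_classify(train, (185, 90))
print(prediction)
```

The classes ("small", "large") are fixed before any prediction is made, which is what separates classification from clustering.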
Clustering
A descriptive task that groups similar data objects into clusters.
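One common way to group similar objects is k-means. The sketch below is a simplified one-dimensional version under assumed starting centers; the function name `kmeans_1d` and the data are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for 1-D data: alternate assignment and mean update."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (unchanged if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

No class labels are supplied; the groups emerge from the distances between the values themselves.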
Association Rule Discovery
A descriptive task that uncovers interesting relationships among items in large databases.
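"Interesting" is usually quantified with measures such as support and confidence. These two measures are standard but are not named in the notes, so treat the sketch below, including its helper names and market-basket data, as an illustrative assumption.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

Here the rule {diapers} -> {beer} holds in 3 of 5 transactions overall and in 3 of the 4 transactions that contain diapers.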
Deviation Detection
A predictive task focused on identifying unusual or anomalous data points.
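A simple way to flag unusual points is a z-score test, sketched below. The threshold of 2.0, the function name `zscore_outliers`, and the transaction amounts are assumptions for illustration; real systems typically use more robust methods.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts with one anomalous value.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```

Note that a single extreme value inflates both the mean and the standard deviation, which is one reason this test can miss anomalies in small samples.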
Regression
A predictive task that estimates continuous numerical values.
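The simplest instance is fitting a straight line by ordinary least squares. The sketch below assumes a single input variable; `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated from y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # estimate y for an unseen x = 5
print(slope, intercept, predicted)
```

Unlike classification, the output is a continuous number rather than one of a fixed set of labels.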
Origins of Data Mining
Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
Analytics
Broad practice of analyzing data to draw conclusions; often used interchangeably with data science in business contexts.
Data Science
Interdisciplinary field combining statistics, ML, and domain knowledge to extract insights from data.
Large-scale data
Massive data volumes that require scalable mining techniques; typical in modern data mining.
High Dimensionality
Large number of features; makes mining harder and can cause the 'curse of dimensionality'.
Heterogeneous Data
Data from diverse sources and formats that are not uniform.
Data Quality
Issues such as noise, missing values, and errors that affect mining results.
Data Ownership and Distribution
Questions about who owns data and how it is stored/distributed across organizations.
Privacy Preservation
Techniques and policies to protect individuals' privacy in data mining (e.g., anonymization).
Streaming Data
Data that arrives continuously in real time, requiring online or incremental mining methods.
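An incremental method updates its result one item at a time without storing the whole stream. The running mean below is a minimal sketch of that idea; the class name `RunningMean` and the readings are assumptions.

```python
class RunningMean:
    """Maintains the mean of all values seen so far in constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental update: shift the mean toward x by 1/count of the gap.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean

# Hypothetical stream of sensor readings, consumed one at a time.
stream = iter([10, 20, 30, 40])
running = RunningMean()
history = [running.update(reading) for reading in stream]
print(history)
```

Each update costs the same regardless of how much data has already arrived, which is the property online mining methods rely on.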
Recommender Engine
System that suggests items users might find interesting based on past behavior and ratings.
Targeted Marketing
Marketing that focuses on specific customer segments using mining insights.
Credit Card Fraud Detection
Application of data mining to identify and prevent fraudulent transactions.
Data Types in DM
Categories and forms of data encountered in data mining, which significantly influence the choice of mining techniques. Key distinctions include structured vs. unstructured data and numerical vs. categorical data.
Structured Data
Highly organized data that fits into a fixed schema, such as relational databases (e.g., customer tables with distinct columns for name, address, ID). It is easily searchable and analyzable by algorithms.
Unstructured Data
Data that does not have a predefined model or organization, making it more challenging to process and analyze directly (e.g., text documents, images, audio, video files, social media posts).
Numerical Data
Quantitative data representing measures or counts (e.g., age, income, temperature, sales figures). It can be discrete (countable integers) or continuous (any value within a range).
Categorical Data
Qualitative data that represents labels or groups, often descriptive (e.g., gender, product type, city, true/false values). It can be nominal (categories without inherent order, like 'color') or ordinal (categories with a meaningful order, like 'education level': high school, bachelor's, master's).
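The nominal/ordinal distinction matters when categories are turned into numbers for an algorithm. The sketch below shows one common convention, one-hot vectors for nominal values and integer ranks for ordinal values; the helper names and category lists are assumptions for illustration.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector, implying no order among categories."""
    return [1 if value == c else 0 for c in categories]

def ordinal_code(value, ordered_levels):
    """Encode an ordinal value as its rank in a list sorted from lowest to highest."""
    return ordered_levels.index(value)

colors = ["red", "green", "blue"]                     # nominal: no inherent order
levels = ["high school", "bachelor's", "master's"]    # ordinal: meaningful order

color_vector = one_hot("green", colors)
level_rank = ordinal_code("master's", levels)
print(color_vector, level_rank)
```

Using integer ranks for a nominal attribute such as color would invent an ordering that the data does not have.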