CAP 4770 - Lecture 0: Vocabulary Cards

Description and Tags

Vocabulary flashcards covering key terms and definitions from the lecture notes.

27 Terms

1

Data Mining

Non-trivial extraction of implicit, previously unknown and potentially useful information from data; exploration and analysis of large quantities of data to discover meaningful patterns.

2

Knowledge Discovery in Databases (KDD)

Process of identifying valid, novel, and potentially useful patterns in data; data mining (DM) is often considered one step of the KDD process.

3

Predictive

A primary goal of data mining that involves building models to forecast future events or estimate unknown values.

4

Descriptive

A primary goal of data mining focused on discovering human-interpretable patterns and insights that describe the characteristics and relationships within existing data. Aims to explain 'what happened' or 'what is happening' by summarizing data properties and finding meaningful structures.

5

Classification

A predictive task that assigns data instances to predefined classes.
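Example (a minimal sketch, not a method from the lecture): a 1-nearest-neighbor classifier, which assigns a new instance the class of its closest labeled example. The function name and the toy height/weight data are hypothetical.

```python
import math


def nearest_neighbor_classify(train, query):
    """Return the label of the training point closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical toy data: (height_cm, weight_kg) -> predefined class label.
train = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

print(nearest_neighbor_classify(train, (152.0, 52.0)))  # small
print(nearest_neighbor_classify(train, (182.0, 88.0)))  # large
```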

6

Clustering

A descriptive task that groups similar data objects into clusters.
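Example (a rough sketch under stated assumptions): k-means, one common clustering algorithm, alternates between assigning points to the nearest centroid and moving each centroid to the mean of its cluster. The points and the fixed starting centroids are made up for illustration; practical implementations usually choose starting centroids more carefully.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print([len(c) for c in clusters])  # [3, 3]
```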

7

Association Rule Discovery

A descriptive task that uncovers interesting relationships among items in large databases.
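Example (an illustrative sketch): association rules are usually scored by support (the fraction of transactions containing an itemset) and confidence (how often the consequent appears when the antecedent does). The market-basket transactions below are hypothetical.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Fraction of transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

print(support(transactions, {"diapers", "beer"}))       # 0.6
print(confidence(transactions, {"diapers"}, {"beer"}))  # 0.75
```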

8

Deviation Detection

A predictive task focused on identifying unusual or anomalous data points.
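Example (one simple approach among many, shown as a sketch): flag values whose z-score, the distance from the mean in standard deviations, exceeds a threshold. The threshold of 2.0 and the transaction amounts are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts with one unusual value.
amounts = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(amounts))  # [50]
```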

9

Regression

A predictive task that estimates continuous numerical values.
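Example (a minimal sketch): simple linear regression fits a line y = slope * x + intercept by ordinary least squares and then uses it to estimate a continuous value. The data points are hypothetical and deliberately noise-free so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # 13.0
```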

10

Origins of Data Mining

Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.

11

Analytics

Broad practice of analyzing data to draw conclusions; often used interchangeably with data science in business contexts.

12

Data Science

Interdisciplinary field combining statistics, ML, and domain knowledge to extract insights from data.

13

Large-scale data

Massive data volumes that require scalable mining techniques; typical in modern data mining.

14

High Dimensionality

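A large number of features (attributes) per record; as dimensions grow, data becomes sparse and distances between points become less informative, a problem known as the 'curse of dimensionality'.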

15

Heterogeneous Data

Data from diverse sources and formats that are not uniform.

16

Data Quality

Issues such as noise, missing values, and errors that affect mining results.
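Example (a sketch of one common remedy, not the only one): missing values can be filled with the mean of the observed values before mining. Here `None` marks a missing entry; the function name and the ages are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical ages with two missing values.
ages = [20, None, 30, 40, None]
print(impute_mean(ages))  # [20, 30, 30, 40, 30]
```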

17

Data Ownership and Distribution

Questions about who owns data and how it is stored/distributed across organizations.

18

Privacy Preservation

Techniques and policies to protect individuals' privacy in data mining (e.g., anonymization).

19

Streaming Data

Data that arrives continuously in real time, requiring online or incremental mining methods.
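Example (a minimal sketch of an incremental method): Welford's algorithm updates a running mean and variance one value at a time, so the stream never has to be stored. The class name and the sample stream are hypothetical.

```python
class RunningStats:
    """Running mean and population variance, updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical sensor readings
    stats.update(reading)  # each value is seen once, then discarded

print(stats.n, stats.mean, stats.variance)
```

For this stream the mean is 5 and the population variance is 4, up to floating-point rounding.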

20

Recommender Engine

System that suggests items users might find interesting based on past behavior and ratings.

21

Targeted Marketing

Marketing that focuses on specific customer segments using mining insights.

22

Credit Card Fraud Detection

Application of data mining to identify and prevent fraudulent transactions.

23

Data Types in DM

Categories and forms of data encountered in data mining, which significantly influence the choice of mining techniques. Key distinctions include structured vs. unstructured data and numerical vs. categorical data.

24

Structured Data

Highly organized data that fits into a fixed schema, such as tables in a relational database (e.g., a customer table with distinct columns for name, address, and ID). It is easily searchable and analyzable by algorithms.

25

Unstructured Data

Data that does not have a predefined model or organization, making it more challenging to process and analyze directly (e.g., text documents, images, audio, video files, social media posts).

26

Numerical Data

Quantitative data representing measures or counts (e.g., age, income, temperature, sales figures). It can be discrete (countable integers) or continuous (any value within a range).

27

Categorical Data

Qualitative data that represents labels or groups, often descriptive (e.g., gender, product type, city, true/false values). It can be nominal (categories without inherent order, like 'color') or ordinal (categories with a meaningful order, like 'education level': high school, bachelor's, master's).
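Example (an illustrative sketch): the nominal/ordinal distinction matters when categories are turned into numbers. Nominal values are often one-hot encoded so that no order is implied, while ordinal values can be mapped to their rank. The helper names and category lists are hypothetical.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector with no implied order."""
    return [1 if value == c else 0 for c in categories]


def ordinal(value, ordered_levels):
    """Encode an ordinal value as its rank in a meaningful ordering."""
    return ordered_levels.index(value)


colors = ["red", "green", "blue"]                       # nominal
education = ["high school", "bachelor's", "master's"]   # ordinal, low to high

print(one_hot("green", colors))        # [0, 1, 0]
print(ordinal("master's", education))  # 2
```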