Data Mining


14 Terms

1
New cards

Why Data Mining?

Data mining plays a critical role in today’s data-driven world, enabling organizations and individuals to make informed decisions, optimize operations, and uncover hidden opportunities.

Compare association rule mining and clustering.

2
New cards

Why Data Mining? (Scientific Viewpoint)

> Data is collected and stored at enormous speeds

  • Remote sensors on a satellite

    - NASA EOSDIS archives petabytes of earth science data per year

  • Telescopes scanning the skies

    - Sky survey data

  • High-throughput biological data

  • Scientific simulations

    - Terabytes of data generated in a few hours

> Data mining helps scientists

  • In automated analysis of massive datasets

  • In hypothesis formation

3
New cards

What is Data Mining?

Data mining is the process of analyzing large sets of data to discover patterns, trends, relationships, or useful insights that might not be immediately obvious. It transforms raw data into meaningful information that can help in decision-making.

4
New cards

What Kinds of Data Can Be Mined?

Knowledge discovery from data, KDD

Database Data

  • stored in a database system, also called a database management system (DBMS)

5
New cards

Relational database

is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows).
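
To make the table, attribute, and tuple vocabulary concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` table, its columns, and the sample rows are made up for illustration and do not come from the card.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# A uniquely named table with three attributes (columns).
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Each inserted row is one tuple (record) of the table.
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Manila"), (2, "Ben", "Cebu"), (3, "Cara", "Manila")],
)

# A relational query selects tuples by attribute value.
manila_customers = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY cust_id", ("Manila",)
).fetchall()
print(manila_customers)
conn.close()
```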

6
New cards

entity-relationship (ER) data model

is often constructed for relational databases. An ER data model represents the database as a set of entities and their relationships.

7
New cards

Data warehouse

> A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.

> Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
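
The construction steps can be sketched in a few lines of Python. This is only a toy illustration: the two sources (`store_a`, `store_b`), their field names, and the unified schema (`item`, `amount`, `source`) are all assumptions, and a real warehouse would use dedicated tooling.

```python
# Two hypothetical sources that record the same kind of sale differently.
store_a = [{"item": "Pen ", "amount_usd": "2.50"}, {"item": "ink", "amount_usd": None}]
store_b = [{"product": "PEN", "cents": 300}]

def clean(rows):
    # Data cleaning: drop records with a missing amount.
    return [r for r in rows if r["amount_usd"] is not None]

def from_a(r):
    # Data transformation: normalize names and convert text to a number.
    return {"item": r["item"].strip().lower(), "amount": float(r["amount_usd"]), "source": "A"}

def from_b(r):
    # Data transformation: rename the field and convert cents to dollars.
    return {"item": r["product"].strip().lower(), "amount": r["cents"] / 100, "source": "B"}

warehouse = []

def refresh():
    # Data integration and loading: rebuild the single unified store.
    # Calling this again later stands in for a periodic refresh.
    warehouse.clear()
    warehouse.extend(from_a(r) for r in clean(store_a))
    warehouse.extend(from_b(r) for r in store_b)

refresh()
print(warehouse)
```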

8
New cards

Data cube

provides a multidimensional view of data and allows the pre-computation and fast access of summarized data.
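
As a rough sketch of what pre-computation means here, the Python below totals a small, made-up sales list for every combination of dimensions, so that any summarized view becomes a single lookup. The dimension names and figures are assumptions for illustration, and `"*"` is used as a marker for a dimension that has been summarized away.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("item", "city", "quarter")

# Hypothetical fact rows: (item, city, quarter, units_sold).
sales = [
    ("pen", "Manila", "Q1", 10),
    ("pen", "Cebu", "Q1", 5),
    ("ink", "Manila", "Q2", 7),
    ("pen", "Manila", "Q2", 3),
]

def build_cube(rows):
    """Precompute the total for every subset of dimensions."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *dims, amount in rows:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                # Keep the chosen dimensions, summarize the rest as "*".
                key = tuple(dims[i] if i in kept else "*" for i in range(n))
                cube[key] += amount
    return cube

cube = build_cube(sales)
print(cube[("pen", "*", "*")])      # all pen sales
print(cube[("*", "Manila", "Q2")])  # all Manila sales in Q2
print(cube[("*", "*", "*")])        # grand total
```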

9
New cards

OLAP operations

include drill-down and roll-up, which allow the user to view the data at differing degrees of summarization.
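
A minimal Python sketch of the two operations, assuming a hypothetical city-to-country hierarchy and made-up sales totals: roll-up moves to the coarser country level, and drill-down returns to the finer city level for one country.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Manila": "Philippines", "Cebu": "Philippines", "Osaka": "Japan"}

# Hypothetical totals at the most detailed (city) level.
sales_by_city = {"Manila": 13, "Cebu": 5, "Osaka": 8}

def roll_up(by_city):
    """Summarize city-level totals up to the country level."""
    totals = defaultdict(int)
    for city, amount in by_city.items():
        totals[CITY_TO_COUNTRY[city]] += amount
    return dict(totals)

def drill_down(country, by_city):
    """Show the city-level detail behind one country's total."""
    return {c: a for c, a in by_city.items() if CITY_TO_COUNTRY[c] == country}

print(roll_up(sales_by_city))
print(drill_down("Philippines", sales_by_city))
```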

10
New cards

Transactional data

refers to information that is generated during business transactions or events. It captures the specific details of an interaction, exchange, or activity between two parties, such as a purchase, order, or payment.
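
One common way to hold such data for mining is a mapping from a transaction ID to the set of items involved. The sketch below assumes that representation, with made-up IDs and items, and computes how often each pair of items appears together (its support).

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions: ID -> set of items bought.
transactions = {
    "T100": {"bread", "milk"},
    "T200": {"bread", "butter", "milk"},
    "T300": {"butter", "jam"},
    "T400": {"bread", "milk", "jam"},
}

# Count every unordered item pair once per transaction.
pair_counts = Counter()
for items in transactions.values():
    pair_counts.update(combinations(sorted(items), 2))

# Support = fraction of transactions that contain the pair.
n = len(transactions)
support = {pair: count / n for pair, count in pair_counts.items()}
print(support[("bread", "milk")])
```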

11
New cards

What Kinds of Patterns Can Be Mined?

Data Mining Functionalities:

  1. characterization and discrimination

  2. the mining of frequent patterns, associations, and correlations

  3. classification and regression

  4. clustering analysis; and

  5. outlier analysis

12
New cards

Data Characterization

13
New cards
14
New cards