Why Data Mining?
Data mining plays a critical role in today’s data-driven world, enabling organizations and individuals to make informed decisions, optimize operations, and uncover hidden opportunities.
Compare association rule mining and clustering.
Why Data Mining? (Scientific Viewpoint)
> Data collected and stored at enormous speeds
remote sensors on a satellite
- NASA EOSDIS archives over petabytes of earth science data/year
telescopes scanning the skies
- Sky survey data
High-throughput biological data
scientific simulations
- terabytes of data generated in a few hours
> Data mining helps scientists
in automated analysis of massive datasets
in hypothesis formation
What is Data Mining?
is the process of analyzing large sets of data to discover patterns, trends, relationships, or useful insights that might not be immediately obvious. It transforms raw data into meaningful information that can help in decision-making.
What Kinds of Data Can Be Mined?
Knowledge discovery from data, KDD
Database Data
data managed by a database system, also called a database management system (DBMS)
Relational database
is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows).
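A minimal sketch of the table, attribute, and tuple vocabulary, using Python's built-in `sqlite3` module. The `customer` table, its columns, and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database; the "customer" table and its contents are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
rows = [(1, "Ana", "Manila"), (2, "Ben", "Cebu"), (3, "Cara", "Manila")]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

# Attributes are the columns of the table (index 1 of each PRAGMA row is the name).
attributes = [col[1] for col in conn.execute("PRAGMA table_info(customer)")]

# Tuples are the rows; a query returns the tuples that satisfy a condition.
manila = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY cust_id", ("Manila",)
).fetchall()

print(attributes)
print(manila)
conn.close()
```

Each inserted row is one tuple, and the query returns only the tuples whose `city` attribute matches.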
entity-relationship (ER) data model
is often constructed for relational databases. An ER data model represents the database as a set of entities and their relationships.
Data warehouse
> is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
> are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
Data cube
provides a multidimensional view of data and allows the pre-computation and fast access of summarized data.
OLAP operations
include drill-down and roll-up, which allow the user to view the data at differing degrees of summarization.
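A small sketch of roll-up and drill-down over summarized data, in plain Python. The sales records, the two dimensions (city and quarter), and the `aggregate` helper are assumptions made up for this example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact records: (city, quarter, sales). Values are made up.
facts = [
    ("Manila", "Q1", 100),
    ("Manila", "Q2", 150),
    ("Cebu", "Q1", 80),
    ("Cebu", "Q2", 70),
]

def aggregate(records, dims):
    """Sum sales grouped by the chosen dimension positions (0=city, 1=quarter)."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        totals[key] += rec[2]
    return dict(totals)

by_city_quarter = aggregate(facts, (0, 1))  # most detailed view
by_city = aggregate(facts, (0,))            # roll-up: drop the quarter dimension
grand_total = aggregate(facts, ())          # roll-up again: one summary value

print(by_city_quarter)
print(by_city)
print(grand_total)
```

Moving from `by_city_quarter` to `by_city` is a roll-up (less detail), and moving back is a drill-down (more detail). A data cube would typically precompute these summaries so that either view can be read quickly.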
Transactional data
refers to information that is generated during business transactions or events. It captures the specific details of an interaction, exchange, or activity between two parties, such as a purchase, order, or payment.
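As a rough illustration, each transaction can be modeled as one record holding an identifier, the parties involved, and the details of the exchange. The `Transaction` class, its field names, and the sample purchases below are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    trans_id: str    # unique identifier of the transaction
    customer: str    # one party of the exchange
    items: tuple     # what was purchased
    amount: float    # payment total

# Made-up purchase records for illustration.
transactions = [
    Transaction("T100", "Ana", ("bread", "milk"), 4.50),
    Transaction("T200", "Ben", ("bread", "eggs"), 5.25),
    Transaction("T300", "Ana", ("milk",), 1.75),
]

# Count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in set(t.items))

# Total amount paid per customer.
spend = Counter()
for t in transactions:
    spend[t.customer] += t.amount

print(item_counts)
print(dict(spend))
```

Per-item counts like these are one simple summary that can be computed from transactional data.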
What Kinds of Patterns Can Be Mined?
Data Mining Functionalities:
1. Characterization and discrimination
2. The mining of frequent patterns, associations, and correlations
3. Classification and regression
4. Clustering analysis
5. Outlier analysis
Data Characterization