29 Terms
1
Why data mining?
- More intense competition at the global scale
- Recognition of the value in data sources
- Availability of quality data on customers, vendors, transactions, Web, etc.
- Consolidation and integration of data repositories into data warehouses
- The exponential increase in data processing and storage capabilities, and the decrease in cost
- Movement toward conversion of information resources into nonphysical form
2
What is data mining?
The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases.
3
What are the four types of patterns?
association, prediction, cluster (segmentation), and sequential (or time series) relationships
4
What are the three most common data mining processes?
1) CRISP-DM
2) SEMMA
3) KDD (Knowledge Discovery in Databases)
5
What are the six steps in the CRISP-DM data mining process?
1) Business understanding
2) Data understanding
3) Data preparation
4) Model building
5) Testing and evaluation
6) Deployment
6
What are the 5 steps in the SEMMA data mining process?
Sample, Explore, Modify, Model, Assess
7
What are the steps involved in KDD?
1) Data selection
2) Data cleaning
3) Data transformation
4) Data mining
5) Internalization
8
Provide examples of commercial data mining software tools
IBM SPSS Modeler, SAS Enterprise Miner, Statistica
9
Provide examples of free and/or open source software tools
KNIME, RapidMiner, R
10
What are the major characteristics and objectives of data mining?
- Data is presented in a variety of formats
- The data mining environment is usually a client/server architecture or a web-based IS architecture
- The miner is often the end user
- DM tools are readily combined with spreadsheets and other software development tools
- Parallel processing is sometimes needed because of the large amounts of data involved
11
What are associations?
commonly co-occurring groupings of things
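
A minimal sketch of what "commonly co-occurring" means in practice: count how often each pair of items appears in the same transaction and express that as a fraction of all transactions (the pair's support). The basket data and the `pair_support` helper are hypothetical, made up for illustration; real association mining tools use more efficient algorithms such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def pair_support(baskets):
    """Return, for each item pair, the fraction of baskets containing both items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(transactions)
top_pair, top_support = max(support.items(), key=lambda kv: kv[1])
print(top_pair, top_support)
```

With this sample data, bread and milk appear together in three of the five baskets, so that pair has the highest support.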
12
What are predictions?
tell the nature of future occurrences of certain events based on what has happened in the past; prediction in the general sense is largely experience- and opinion-based, while the closely associated term forecasting is data- and model-based
13
What are clusters?
Identify natural groupings of things based on known characteristics, for example assigning customers to different segments based on their demographics and past purchase behaviors
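
The customer-segment idea can be sketched with a plain k-means loop. This is one common clustering algorithm, chosen here only for illustration; the customer data (age, purchases per month), the function names, and the choice to seed centroids with the first `k` points are all assumptions, not something the card prescribes.

```python
def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = list(points[:k])
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, purchases per month).
customers = [(22, 2), (25, 3), (24, 2), (58, 9), (61, 10), (60, 8)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

No segment names are supplied anywhere: the algorithm only sees the characteristics and returns group numbers, which an analyst then has to interpret.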
14
What are sequential relationships?
discover time-ordered events
15
What are the three main categories of Data Mining?
prediction, association, and segmentation (clustering)
16
What is classification?
The objective is to analyze the historical data stored in a database & automatically generate a model that can predict future behaviour
17
What are the three main types of prediction?
classification, regression, time-series
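
Of the three, regression is the easiest to show in a few lines: it predicts a number rather than a class label. The sketch below fits a straight line by ordinary least squares; the ad-spend and sales figures and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: ad spend (input) and sales (numeric target).
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 5 + intercept  # prediction for an unseen ad spend of 5
print(slope, intercept, predicted_sales)
```

Classification would instead output a category, and time-series prediction would use the ordering of past values of the same variable.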
18
What are decision trees?
A data mining method that generates rules and classifies data sets; the resulting model is a hierarchy of if/then statements
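
To make the "hierarchy of if/then statements" concrete, here is a minimal sketch of a tree stored as nested dictionaries and a function that walks it. The loan rules, thresholds, and field names are hypothetical and written by hand; an actual decision tree algorithm would learn them from historical data.

```python
# Hypothetical hand-written tree: internal nodes test one attribute against
# a threshold, leaves hold a class label.
tree = {
    "attribute": "income",
    "threshold": 50000,
    "below": {"label": "decline"},
    "at_or_above": {
        "attribute": "debt_ratio",
        "threshold": 0.4,
        "below": {"label": "approve"},
        "at_or_above": {"label": "review"},
    },
}


def classify(record, node):
    """Follow if/then branches from the root until a leaf label is reached."""
    while "label" not in node:
        if record[node["attribute"]] < node["threshold"]:
            node = node["below"]
        else:
            node = node["at_or_above"]
    return node["label"]


print(classify({"income": 60000, "debt_ratio": 0.2}, tree))
```

Each path from the root to a leaf reads as one rule, for example "if income is at least 50000 and debt ratio is below 0.4, then approve".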
19
What is clustering?
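partitioning a collection of things (objects, events, records) into segments whose members share similar characteristics; unlike classification, the group labels are not known in advance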
20
What are two commonly used derivatives in association mining?
1) Link Analysis -> the linkage among many objects of interest is discovered automatically
2) Sequence mining -> relationships are examined in terms of their order of occurrence to identify associations over time
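
A minimal sketch of the order-of-occurrence idea behind sequence mining: for each customer history, record which item came before which, then count those ordered pairs across customers. The banking product histories and the `ordered_pair_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical time-ordered product histories, one list per customer.
histories = [
    ["checking", "savings", "investment"],
    ["checking", "savings"],
    ["savings", "checking"],
    ["checking", "investment"],
]


def ordered_pair_counts(sequences):
    """Count how many sequences contain each (earlier, later) pair of distinct items."""
    counts = Counter()
    for seq in sequences:
        seen_pairs = set()  # count a pair at most once per sequence
        for i, earlier in enumerate(seq):
            for later in seq[i + 1:]:
                if earlier != later:
                    seen_pairs.add((earlier, later))
        counts.update(seen_pairs)
    return counts


counts = ordered_pair_counts(histories)
print(counts[("checking", "savings")], counts[("savings", "checking")])
```

Unlike plain association counting, the pair ("checking", "savings") is kept separate from ("savings", "checking"), so the direction of the relationship over time is preserved.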
21
What is the difference between statistics and data mining?
Statistics - starts from a well-defined hypothesis and collects sample data to test it
DM & Analytics - use all existing data to discover new patterns & relationships
22
What is the difference between CRISP-DM and SEMMA?
CRISP-DM: takes a more comprehensive approach, including an understanding of the business and of the data relevant to the DM project
SEMMA: implicitly assumes that the DM project's goals, objectives, and data sources have already been identified and understood
23
Describe Knowledge Discovery in Databases (KDD)
The overall process of using DM methods to find useful information and patterns in data. In relation to data mining: KDD is the broader process that encompasses DM, and DM is the step that uses algorithms to identify patterns in the data derived through the KDD process
24
What's the difference between classification and clustering?
Classification learns the function between the characteristics of things and their membership through a supervised learning process, where both input and output variables are presented to the algorithm
Clustering learns through an unsupervised learning process, where only the input variables are presented to the algorithm
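
The contrast shows up directly in what each function is given. In the sketch below, the supervised function takes inputs and known labels, while the unsupervised one takes inputs only. Both are deliberately simple stand-ins (a nearest-class-mean classifier and a widest-gap split) rather than production algorithms, and the spending data and segment names are hypothetical.

```python
def train_classifier(inputs, labels):
    """Supervised: inputs AND known output labels are given; learn one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(inputs, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(class_means, x):
    """Assign x to the class whose learned mean is closest."""
    return min(class_means, key=lambda y: abs(x - class_means[y]))


def cluster_two_groups(inputs):
    """Unsupervised: only inputs are given; split the sorted values at the widest gap."""
    ordered = sorted(inputs)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Hypothetical monthly spend per customer, with known segments for the supervised case.
spend = [12.0, 15.0, 14.0, 80.0, 95.0, 90.0]
segment = ["low", "low", "low", "high", "high", "high"]

model = train_classifier(spend, segment)
print(predict(model, 70.0))       # a named class for a new customer
print(cluster_two_groups(spend))  # two unnamed groups
```

The classifier can return a named class because it was shown the names during training; the clustering function can only return unnamed groups.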
25
What are the factors considered in model assessment?