Association Rule

50 Terms

1
Apriori Algorithm
Attempts to find subsets that are common to at least a minimum number c of the itemsets, where c is the cutoff (the minimum support threshold).
2
What does the Apriori algorithm discover?
Association rules (derived from the frequent itemsets it finds)
3
Who proposed the Apriori algorithm?
Agrawal and Srikant in 1994
4
What are the steps of Apriori?
Generate the frequent 1-itemsets based on a defined minimum support value; this set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on. Repeat until no more frequent k-itemsets can be found.
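The level-wise steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name, the sample baskets, and the use of an absolute count for min_support are all assumptions made for the example.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support_count} for every itemset whose count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count_support(candidates):
        # One full pass over the transaction database per level
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # L1: frequent 1-itemsets
    items = {frozenset([i]) for t in transactions for i in t}
    current = count_support(items)
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count_support(candidates)  # Lk
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical sample data
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori_frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets the loop stops at k = 3, because the only surviving 3-item candidate appears in just two transactions.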
5
What is this graphic an example of?
The Apriori Algorithm
6
Market basket analysis
A common analysis run against a transaction database to find sets of items (itemsets) that appear together in many transactions.
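To make "appear together in many transactions" concrete, here is a small sketch that measures how often an itemset occurs and how reliable a rule built from it is. It uses the standard definitions of support and confidence, which this set does not spell out; the function names and sample baskets are assumptions for illustration.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(transactions, itemset):
    """Fraction of transactions that contain the itemset."""
    return support_count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)

# Hypothetical sample data
market_baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
pair_support = support(market_baskets, {"diapers", "beer"})
rule_confidence = confidence(market_baskets, {"diapers"}, {"beer"})
print(pair_support, rule_confidence)
```

Here diapers and beer appear together in 3 of 5 baskets, and 3 of the 4 baskets with diapers also contain beer.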
7
Examples of applications of association rule mining.
Improving the placement of items in a store, promoting items as a package, designing the layout of web pages, and making product recommendations.
8
What are the pros of Apriori?
Easy to implement, easy to understand, and can be used on large itemsets.
9
What are the cons of Apriori?
It may need to generate a large number of candidate itemsets, which is computationally expensive. Calculating support is also expensive because each pass has to scan the entire database.
10
Sequential Pattern Analysis
Similar to association rule mining, except that the relationship exists over a period of time: one event leads to another, later event (time-series data analysis).
11
Examples of sequential pattern analysis.
After the inflation rate increases, the stock market is likely to go down within a week. After a student takes a statistics course, which course is most likely to be next in the following semester?
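The course question above can be approached by counting which event most often comes directly after a given event in ordered histories. The sketch below is a deliberately simple illustration, not a full sequential pattern mining algorithm; the function name and the course histories are hypothetical.

```python
from collections import Counter

def next_event_counts(sequences, event):
    """Count which event immediately follows `event` in each time-ordered sequence."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            if current == event:
                counts[following] += 1
    return counts

# Hypothetical course histories, one list per student, ordered by semester
histories = [
    ["statistics", "data mining", "databases"],
    ["calculus", "statistics", "data mining"],
    ["statistics", "econometrics"],
]
followers = next_event_counts(histories, "statistics")
most_likely_next, times_seen = followers.most_common(1)[0]
print(most_likely_next, times_seen)
```

Unlike plain association rules, the order of events matters here: "statistics then data mining" is counted, while "data mining then statistics" would not be.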
12
Examples of Business Management Issues.
"We have mountains of data in this company, but we can't access it", "We may need to slice and dice the data in every way", "You've got to make it easy for business people to get at the data directly"
13
Data Warehouse
A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.
14
A ________ is a consolidated repository of multiple heterogeneous data sources.
data warehouse
15
Data warehouses can be _____ and maintained ______ from _____ databases. They are organized under a unified schema at a single site in order to facilitate decision making.
different, separately, operational
16
What does the data warehouse facilitate?
Decision making (analysis of data for decision makers)
17
What does a data warehouse not facilitate?
Daily operations or transaction processing.
18
Benefits of a Data Warehouse.
Fast access to data and information at a single site.
19
Data warehouses are usually archived in different databases at _______ locations (distributed).
different
20
Short-term goals of a Data Warehouse.
Improve data quality, minimize inconsistent reports, provide data sharing, integrate data from multiple sources, merge historical and current data appropriately, and improve the speed and performance of reporting.
21
Long-term goals of a Data Warehouse.
Provide a consolidated view of enterprise data and develop an enterprise approach to business intelligence and decision support.
22
What are the characteristics of a data warehouse?
Subject-oriented, integrated, time-variant, nonvolatile.
23
What does it mean for a data warehouse to be subject-oriented?
Data are organized based on major subject areas of the corporation.
24
What does it mean for a data warehouse to be integrated?
Data in a data warehouse (DW) are collected from distributed, heterogeneous data sources.
25
What does it mean for a data warehouse to be time-variant?
Data has a time dimension - each data point is associated with a point in time.
26
What does it mean for a data warehouse to be nonvolatile?
New data is always appended rather than replaced. The database continually absorbs new data, integrating it with the previous data.
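The time-variant and nonvolatile properties can be pictured together as an append-only table in which every row carries a date. The class below is a hypothetical illustration (the name AppendOnlyTable, its methods, and the price data are assumptions), not how any particular warehouse product stores data.

```python
from datetime import date

class AppendOnlyTable:
    """Hypothetical store: rows are only ever added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def append(self, as_of, key, value):
        # Nonvolatile: a new value is appended alongside the old one
        self._rows.append({"as_of": as_of, "key": key, "value": value})

    def value_as_of(self, key, day):
        # Time-variant: each row is tied to a point in time, so history can be queried
        matches = [r for r in self._rows if r["key"] == key and r["as_of"] <= day]
        if not matches:
            return None
        return max(matches, key=lambda r: r["as_of"])["value"]

    def __len__(self):
        return len(self._rows)

prices = AppendOnlyTable()
prices.append(date(2024, 1, 1), "widget", 9.99)
prices.append(date(2024, 6, 1), "widget", 11.49)  # the old price is kept

print(prices.value_as_of("widget", date(2024, 3, 15)))
print(prices.value_as_of("widget", date(2024, 7, 1)))
```

Because nothing is overwritten, a report run for March still sees the price that was valid in March.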
27
What is the data staging area also known as?
Extraction, Transformation, Loading (ETL)
28
Extraction
Reading and understanding the source data and copying the data needed for the data warehouse into the staging area for further manipulation.
29
Transformation
Cleansing, combining data from multiple sources, deduplicating data, and assigning warehouse keys.
30
Loading
Loading the data into the data warehouse presentation area.
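The three ETL stages can be sketched as three small functions. This is a toy illustration under stated assumptions: the source systems ("crm", "billing"), the field names, the dim_customer table, and the choice to deduplicate on email are all hypothetical.

```python
def extract(sources):
    """Copy the needed rows from each source system into a staging list."""
    staging = []
    for source_name, rows in sources.items():
        for row in rows:
            staging.append({"source": source_name, **row})
    return staging

def transform(staging):
    """Cleanse values, combine sources, deduplicate, and assign warehouse keys."""
    customers = {}
    for row in staging:
        email = row["email"].strip().lower()       # cleansing
        if email not in customers:                 # deduplicating across sources
            customers[email] = {
                "customer_key": len(customers) + 1,  # warehouse (surrogate) key
                "email": email,
                "name": row["name"].strip().title(),
            }
    return list(customers.values())

def load(warehouse, rows):
    """Append the transformed rows to the presentation area."""
    warehouse.setdefault("dim_customer", []).extend(rows)
    return warehouse

# Hypothetical source systems with overlapping, inconsistently formatted records
sources = {
    "crm": [{"name": " alice smith ", "email": "Alice@Example.com"}],
    "billing": [
        {"name": "Alice Smith", "email": "alice@example.com "},
        {"name": "bob jones", "email": "bob@example.com"},
    ],
}
warehouse = load({}, transform(extract(sources)))
print(warehouse["dim_customer"])
```

The two Alice records collapse into one row with warehouse key 1, and Bob receives key 2.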
31
What are data access tools?
Tools that query the data in the data warehouse's presentation area.
32
What is the purpose of data access tools?
To provide business users with a variety of capabilities for leveraging the presentation area for analytic decision making.
33
Examples of Data Access Tools.
Prebuilt parameter-driven analytic applications, ad hoc query tools, data mining, modeling, forecasting.
34
Data warehouses are very costly, which limits their use to large companies. (T/F)
True
35
________: data spanning the whole organization.
Data Warehouse
36
Data marts
A lower-cost, scaled-down version of a data warehouse, specialized for a single department.
37
How do you create a data mart?
Replicate functional subsets of the data warehouse.
38
Standalone data marts
A company may have several independent data marts without a data warehouse (DW).
39
What type of data model is used for a data warehouse?
A multidimensional structure
40
How is the data model for a data warehouse structured?
Each dimension corresponds to an attribute or a set of attributes in the schema.
41
What are the characteristics of a Star Schema?
Uses fact and dimension tables: each dimension is represented by a dimension table, and transactions are described through a fact table. It is easy for users to understand and optimized for OLAP.
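A star schema can be sketched with Python's built-in sqlite3 module: one fact table in the center, joined to dimension tables by key. The table names, columns, and sample rows below are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension, holding descriptive attributes
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: one row per transaction, holding keys and numeric measures
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES
    (1, 'Sofa', 'Home Furnishings'),
    (2, 'Lamp', 'Home Furnishings'),
    (3, 'Laptop', 'Electronics');
INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240115, 2, 1200.0),
    (2, 20240115, 5, 150.0),
    (3, 20240210, 1, 900.0),
    (1, 20240210, 1, 600.0);
""")

# A typical analytic query: join the fact table to a dimension and aggregate a measure
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Every analytic question follows the same shape: pick measures from the fact table, then group or filter by attributes from the dimension tables.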
42
What is OLAP (Online Analytical Processing) used for?
To enable the user to gain insight into data through interactive access to a wide variety of possible views of the information.
43
How can you view data in OLAP?
From different perspectives and at different levels of abstraction (grain).
44
What 2 analyses are used for OLAP?
Drill-down and Roll-up
45
Drill-down
Takes you all the way down to the transaction detail level.
46
Roll-up
The reverse of drill-down: detail data is aggregated up to a higher, more summarized level.
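Drill-down and roll-up can be seen as the same aggregation run at a finer or coarser grain. The sketch below assumes a hypothetical list of sales rows with a year/month/day hierarchy; the function name and data are illustrative only.

```python
from collections import defaultdict

# Hypothetical transaction-level rows
sales = [
    {"year": 2024, "month": 1, "day": 15, "revenue": 1200.0},
    {"year": 2024, "month": 1, "day": 20, "revenue": 150.0},
    {"year": 2024, "month": 2, "day": 10, "revenue": 900.0},
]

def aggregate(rows, grain):
    """Total revenue grouped by the hierarchy levels listed in `grain`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[level] for level in grain)
        totals[key] += row["revenue"]
    return dict(totals)

by_year = aggregate(sales, ["year"])                 # most rolled-up view
by_month = aggregate(sales, ["year", "month"])       # drill down one level
by_day = aggregate(sales, ["year", "month", "day"])  # down to transaction detail

print(by_year)
print(by_month)
print(by_day)
```

Reading the three results from top to bottom is a drill-down; reading them from bottom to top is a roll-up.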
47
What does drilling down in OLAP help you discover?
Trends
48
What 2 view perspectives are used to alter dimensions in OLAP?
Slicing and Dicing
49
Slicing
Taking a slice out of a cube, given a certain set of selected dimensions (e.g. customer segment), values (e.g. home furnishings), and measures (e.g. sales revenue, sales units) or KPIs (e.g. sales productivity).
50
Dicing
Viewing the slices from different angles and allowing decision support system users to change their view perspective.
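Slicing and dicing, as defined in the two cards above, can be sketched on a tiny cube stored as a list of cells. The cube contents and the function names slice_cube and view_by are assumptions for illustration; "dicing" here follows this set's definition (looking at a slice from different angles).

```python
from collections import defaultdict

# Hypothetical cube: each cell has three dimensions and one measure
cube = [
    {"segment": "Home Furnishings", "region": "East", "quarter": "Q1", "revenue": 1200.0},
    {"segment": "Home Furnishings", "region": "West", "quarter": "Q1", "revenue": 800.0},
    {"segment": "Home Furnishings", "region": "East", "quarter": "Q2", "revenue": 500.0},
    {"segment": "Electronics", "region": "East", "quarter": "Q1", "revenue": 900.0},
]

def slice_cube(cells, dimension, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    return [c for c in cells if c[dimension] == value]

def view_by(cells, dimension, measure="revenue"):
    """View a set of cells from the angle of one dimension, totaling a measure."""
    totals = defaultdict(float)
    for c in cells:
        totals[c[dimension]] += c[measure]
    return dict(totals)

home = slice_cube(cube, "segment", "Home Furnishings")
by_region = view_by(home, "region")    # the slice seen by region
by_quarter = view_by(home, "quarter")  # the same slice seen by quarter

print(by_region)
print(by_quarter)
```

Both views cover the same slice and the same total revenue; only the perspective changes.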