Association Analysis and Apriori (ML Exam 3)


35 Terms

1
New cards

Association Analysis

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transactions

2
New cards

transaction width

number of items in a transaction

3
New cards

itemset

any collection of 0 or more items

4
New cards

k itemset

an itemset with k items

5
New cards

support count

number of transactions that contain the itemset

6
New cards

frequent itemset

an itemset whose support count is >= some minsup threshold

7
New cards

Association rule

an implication expression of the form X→Y, where X and Y are disjoint itemsets. (X is the antecedent and Y is the consequent). Correlation, not causation. Directionality matters

8
New cards

support

fraction of transactions that contain both X and Y

9
New cards

confidence

measures how often items in Y appear in transactions that contain X
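The two measures above can be computed directly from transaction data. A minimal sketch, assuming toy grocery transactions invented for illustration:

```python
# Hedged sketch of support and confidence; the transactions are toy data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset, transactions):
    # number of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y, transactions):
    # fraction of transactions that contain both X and Y
    return support_count(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    # how often items in Y appear in transactions that contain X
    return support_count(X | Y, transactions) / support_count(X, transactions)

X, Y = {"milk", "diapers"}, {"beer"}
print(support(X, Y, transactions))     # 2 of 5 transactions -> 0.4
print(confidence(X, Y, transactions))  # 2 of the 3 transactions containing X
```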

10
New cards

strong association rule

high s and high c (support >= minsup and confidence >= minconf threshold)

11
New cards

goal of association rule mining

to find all strong rules

12
New cards

Association Rule Mining steps

  1. Frequent itemset generation: find all itemsets that satisfy minsup

  2. strong rule generation: find all rules in the frequent itemsets that satisfy minconf

13
New cards

how many candidate itemsets are there given d items?

2^d itemsets

14
New cards

Apriori principle

if an itemset is frequent, then all of its subsets must also be frequent. if an itemset is infrequent, then all of its supersets must be infrequent
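A quick numerical illustration of the principle on toy transactions (invented for this sketch): a superset's support count never exceeds any of its subsets' counts, which is what justifies pruning supersets of infrequent itemsets.

```python
# Toy data: support counts can only shrink (or stay equal) as an
# itemset grows, so supersets of an infrequent itemset stay infrequent.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

print(support_count({"bread", "milk"}))             # subset: 3
print(support_count({"bread", "milk", "diapers"}))  # superset: 2
```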

15
New cards

Candidate Generation Fk-1 x F1

  • items in each frequent itemset must be sorted

  • Extend each frequent k-1 itemset with a frequent 1 itemset that comes after all of the items in the k-1 itemset

16
New cards

Candidate Generation Fk-1 x Fk-1 (Apriori gen)

  • items in each frequent itemset must be sorted

  • Merge pairs of k-1 itemsets if all but the last item are the same

17
New cards

Method for Apriori Pruning

if any k-1 subset of the candidate is not a frequent k-1 itemset, the candidate is pruned
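The Fk-1 x Fk-1 merge and the pruning step can be sketched together. A minimal Python version, assuming itemsets are kept as sorted tuples; the sample F2 list is illustrative:

```python
from itertools import combinations

def apriori_gen(frequent_k_minus_1):
    """Merge pairs of frequent (k-1)-itemsets that agree on all but
    their last item, then prune any candidate that has an infrequent
    (k-1)-subset."""
    frequent = set(frequent_k_minus_1)
    candidates = []
    for a, b in combinations(sorted(frequent_k_minus_1), 2):
        if a[:-1] == b[:-1]:  # all but the last item match
            candidate = a + (b[-1],)
            # pruning: every (k-1)-subset must itself be frequent
            if all(s in frequent
                   for s in combinations(candidate, len(candidate) - 1)):
                candidates.append(candidate)
    return candidates

F2 = [("beer", "diapers"), ("bread", "diapers"),
      ("bread", "milk"), ("diapers", "milk")]
print(apriori_gen(F2))  # [('bread', 'diapers', 'milk')]
```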

18
New cards

Frequent itemsets

the final list of all frequent itemsets found at every k

19
New cards

Tyranny of Counting Pairs

the most memory is required for counting frequent pairs

20
New cards

rule generation brute force

all non-empty subsets of a frequent itemset are enumerated and tested against minconf
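A brute-force sketch for a single frequent itemset (toy transactions; the minconf value is chosen arbitrarily):

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(itemset, minconf):
    # enumerate every non-empty proper subset as an antecedent,
    # keep the rules whose confidence meets minconf
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            X = frozenset(antecedent)
            Y = frozenset(itemset) - X
            conf = support_count(X | Y) / support_count(X)
            if conf >= minconf:
                rules.append((X, Y, conf))
    return rules

for X, Y, conf in rules_from_itemset({"milk", "diapers", "beer"}, 0.6):
    print(sorted(X), "->", sorted(Y), round(conf, 2))
```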

21
New cards

confidence based pruning

confidence is anti-monotone

as we move items from the antecedent to the consequent, the confidence can only stay the same or decrease, never increase (this holds only among rules generated from the same itemset)

reason: for rules from one itemset, the numerator s(X ∪ Y) is fixed, while the denominator s(X) can only grow as the antecedent shrinks

22
New cards

Rule generation algorithm

  • start with rules that have only one item in the consequent and test against minconf

  • the high confidence rules that are found are then used to generate the next round of candidate rules by merging consequents

  • repeat on all frequent itemsets

  • rules that meet minconf are strong rules

23
New cards

how to set minsup

large set of items means small minsup (<1%)

24
New cards

maximal frequent itemset

no frequent supersets

25
New cards

closed frequent itemset

no superset with the same support count

26
New cards

maximal and closed frequent itemset algorithm

  • keep a list of maximal/closed frequent itemsets

  • each time a frequent itemset is generated, perform the subset check and superset check

27
New cards

subset check for maximal algorithm

is the frequent itemset just found a subset of anything in the maximal list? If so, it is not maximal; stop. Otherwise, add it to the maximal list and do the superset check

28
New cards

superset check for maximal algorithm

Is the frequent itemset just found a superset of anything in the maximal list? If so, remove the itemsets in the maximal list that are subsets of this itemset
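The subset and superset checks above can be sketched as one bookkeeping function (a sketch under the assumption that frequent itemsets arrive as Python sets; the example stream is invented):

```python
def update_maximal(maximal, new_itemset):
    new_itemset = frozenset(new_itemset)
    # subset check: not maximal if contained in something already listed
    if any(new_itemset <= m for m in maximal):
        return maximal
    # superset check: evict listed itemsets this one strictly contains
    maximal = [m for m in maximal if not m < new_itemset]
    maximal.append(new_itemset)
    return maximal

maximal = []
for frequent in [{"a"}, {"b"}, {"a", "b"}, {"c"}, {"a", "b", "c"}]:
    maximal = update_maximal(maximal, frequent)
print([sorted(m) for m in maximal])  # [['a', 'b', 'c']]
```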

29
New cards

subset check for closed algorithm

is the frequent itemset just found a subset of anything in the closed list? If so, is its support higher than that of its superset in the list? If not, it is not closed; stop. If so, add it to the closed list and do the superset check.

30
New cards

superset check for closed algorithm

Is the frequent itemset just found a superset of anything in the closed list? If so, does the subset in the list have the same or higher support? If the subset's support is the same, remove the subset from the closed list. If the subset's support is higher, it remains in the list.

31
New cards

cross support patterns

  • rules that relate low-frequency items to high-frequency items

  • flagged when the support ratio (lowest item support divided by highest item support) is below a user-specified threshold

32
New cards

limitations of support and confidence

there are cases where both support and confidence are high, yet the rule produced is poor (for example, when the consequent is frequent regardless of the antecedent)

33
New cards

Lift

measures the ratio of the observed frequency of co-occurrence to the frequency expected if the items were independent: lift(X → Y) = s(X ∪ Y) / (s(X) · s(Y))

symmetric metric (directionality doesn't matter)

34
New cards

lift value meanings

1: no correlation

<1: negative correlation

>1: positive correlation
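Lift can be computed directly from the supports; a toy sketch (item names invented for illustration):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def s(itemset):
    # support: fraction of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(X, Y):
    # observed co-occurrence over the frequency expected under independence
    return s(X | Y) / (s(X) * s(Y))

print(lift({"diapers"}, {"beer"}))  # > 1: positive correlation
print(lift({"beer"}, {"diapers"}))  # same value: lift is symmetric
```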

35
New cards

contingency tables

a 2×2 table with rows X / not X and columns Y / not Y; fill each cell with the corresponding support count
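The table can be filled programmatically; a sketch on toy transactions (items invented):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def contingency(x, y):
    # 2x2 table of support counts over X / not-X and Y / not-Y
    table = {("X", "Y"): 0, ("X", "~Y"): 0, ("~X", "Y"): 0, ("~X", "~Y"): 0}
    for t in transactions:
        row = "X" if x in t else "~X"
        col = "Y" if y in t else "~Y"
        table[(row, col)] += 1
    return table

print(contingency("bread", "milk"))
```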