Descriptive Data Mining Association Rules


Last updated 6:24 AM on 5/1/26

20 Terms

1

What are Association Rules?

Study of “what goes with what”

– “Customers who bought X also bought Y”

– What symptoms go with what diagnosis

• Transaction-based or event-based

• Also called “market basket analysis” and “affinity analysis”

• Originated with the study of customer transaction databases to determine associations among items purchased

2

Association Rule Discovery: Application 1

Marketing and Sales Promotion:

–Let the rule discovered be {Bagels, … } --> {Potato Chips}

–Potato Chips as consequent => Can be used to determine what should be done to boost its sales.

–Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.

–Bagels in antecedent and Potato chips in consequent => Can be used to see what products should be sold with Bagels to promote sale of Potato chips!

3

Association Rule Discovery: Application 2

Supermarket shelf management.

–Goal: To identify items that are bought together by sufficiently many customers.

–Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.

–A classic rule:

• If a customer buys diapers and milk, then he is very likely to buy beer.

• So, don’t be surprised if you find six-packs stacked next to diapers!

4

Association rules:

If-then statements which convey the likelihood of certain items being purchased together.

• Although association rules are an important tool in market basket analysis, they are also applicable to other disciplines.

5

Antecedent (A)

The collection of items (or item set) corresponding to the if portion of the rule.

6

Consequent (C):

The item set corresponding to the then portion of the rule.

7

Item set

and “Frequent Item Set”

8

Support count of an item set

Number of transactions in the data that include that item set.

9

“IF” part

antecedent

10

“THEN” part

consequent

11

“Item set” again

the items (e.g., products) comprising the antecedent or consequent

12

Antecedent and consequent are…

disjoint (i.e., have no items in common)
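
One way to picture the antecedent/consequent structure is as a small data type that rejects overlapping item sets. The sketch below is only an illustration: the `Rule` class and its `item_set` method are hypothetical names, not part of any library or of the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for an 'if antecedent, then consequent' rule."""
    antecedent: frozenset
    consequent: frozenset

    def __post_init__(self):
        # Both sides of the rule must contain at least one item.
        if not self.antecedent or not self.consequent:
            raise ValueError("antecedent and consequent must be non-empty")
        # The two item sets must have no items in common.
        overlap = self.antecedent & self.consequent
        if overlap:
            raise ValueError(f"item sets must be disjoint; shared: {sorted(overlap)}")

    def item_set(self):
        """Union of antecedent and consequent, the item set used for support."""
        return self.antecedent | self.consequent


rule = Rule(frozenset({"bread", "jelly"}), frozenset({"peanut butter"}))
print(sorted(rule.item_set()))
```

Constructing a rule such as `Rule(frozenset({"bread"}), frozenset({"bread"}))` raises a `ValueError`, which mirrors the disjointness requirement on the card.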

13

Although the number of possible association rules can be overwhelming, we typically

investigate only association rules that involve antecedent and consequent item sets that occur together frequently.

Frequent Item Sets

• What is “frequent?”  Find “Support”

Support of an item set: the percentage (or number) of transactions in the data that include that item set (both the antecedent and the consequent) = Number of transactions containing {antecedent and consequent} / Total number of transactions
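
The support calculation can be sketched in a few lines of Python. The transaction list below is a made-up toy data set (it is not the Table 5.4 data referenced on other cards), and the function names `support_count` and `support` are chosen only for this illustration.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter", "milk"},
    {"fruit", "milk"},
    {"bread", "fruit"},
]


def support_count(item_set, transactions):
    """Number of transactions that contain every item in item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if items <= t)


def support(item_set, transactions):
    """Fraction of all transactions that contain item_set."""
    return support_count(item_set, transactions) / len(transactions)


# Support of a rule uses the union of its antecedent and consequent.
rule_items = {"bread", "jelly"} | {"peanut butter"}
print(support_count(rule_items, transactions))  # 2
print(support(rule_items, transactions))        # 0.4
```

Two of the five toy transactions contain all three items, so the support count is 2 and the support is 2/5.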

14

Some key points

The potential impact of an association rule is often governed by the number of transactions it may affect, which is measured by computing the support of the item set consisting of the union of its antecedent and consequent.

• Investigating the rule “if {bread, jelly}, then {peanut butter}” from Table 5.4, we see the support of {bread, jelly, peanut butter} is 2 (20%).

• For a transaction randomly selected from the data set displayed in Table 5.4, the probability of it containing the item set {bread, jelly, peanut butter} is 0.2.

• By only considering rules involving item sets with a support above a minimum level, inexplicable rules capturing random noise in the data can generally be avoided.

• A rule of thumb is to consider only association rules with a support of at least 20% of the total number of transactions.

• If an item set is particularly valuable and represents a lucrative opportunity, then the minimum support used to filter the rules can be lowered.
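
Filtering by minimum support can be sketched as a brute-force search over candidate item sets. This is only an illustration on made-up data: it enumerates every combination up to a size limit, which is not how efficient algorithms such as Apriori work, and the names `frequent_item_sets`, `min_support`, and `max_size` are assumptions of this sketch.

```python
from itertools import combinations

# Hypothetical toy data, not the Table 5.4 data.
transactions = [
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter", "milk"},
    {"fruit", "milk"},
    {"bread", "fruit"},
]


def frequent_item_sets(transactions, min_support, max_size=3):
    """Return {item set (as a sorted tuple): support} for sets meeting min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[candidate] = count / n
    return result


frequent = frequent_item_sets(transactions, min_support=0.4)
for item_set, s in frequent.items():
    print(item_set, s)
```

Lowering `min_support` lets rarer item sets through, which corresponds to relaxing the filter for a particularly valuable item set.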

15

Confidence

A property of a reliable association rule is that, given a transaction contains the antecedent item set, there is a high probability that it contains the consequent item set. This conditional probability, P(consequent item set | antecedent item set), is called the confidence of the rule.

Helps identify reliable association rules
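
Because confidence is a conditional probability, it can be estimated as the support count of the combined item set divided by the support count of the antecedent. The sketch below uses made-up transactions (not Table 5.4), and the helper names are assumptions for illustration.

```python
# Hypothetical toy data, not the Table 5.4 data.
transactions = [
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter", "milk"},
    {"fruit", "milk"},
    {"bread", "fruit"},
]


def support_count(item_set, transactions):
    """Number of transactions that contain every item in item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from transaction counts."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


print(confidence({"bread", "jelly"}, {"peanut butter"}, transactions))  # 1.0
print(confidence({"bread"}, {"milk"}, transactions))                    # 0.5
```

In the toy data every transaction with bread and jelly also has peanut butter (confidence 1.0), while only two of the four bread transactions include milk (confidence 0.5).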

16

Caution: About Confidence

Although a high value of confidence suggests a rule in which the consequent is frequently true when the antecedent is true, a high value of confidence can be misleading.

– For example, if the support of the consequent is high (that is, the item set corresponding to the then part is very frequent), then the confidence of the association rule could be high even if there is little or no association between the items.

– In Table 5.4, the rule “if {cheese}, then {fruit}” has a confidence of 1.0 (or 100%). This is misleading because {fruit} is a frequent item; almost any rule with {fruit} as the consequent will have high confidence.

– Therefore, to evaluate the efficiency of a rule, we need to compare the P(consequent | antecedent) to the P(consequent) to determine how much more likely the consequent item set is given the antecedent item set versus just the overall (unconditional) likelihood that a transaction contains the consequent. This comparison is the lift ratio.
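
The caution can be illustrated with a small made-up data set in which the consequent appears in almost every transaction. These ten transactions are hypothetical and are not the Table 5.4 data; the helper name `prob` is an assumption of this sketch.

```python
# Hypothetical data: fruit appears in 9 of 10 transactions.
transactions = (
    [{"cheese", "fruit"}] * 2
    + [{"fruit", "bread"}] * 7
    + [{"bread"}]
)


def prob(item_set, transactions):
    """Fraction of transactions that contain every item in item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if items <= t) / len(transactions)


# Confidence of "if {cheese}, then {fruit}".
conf = prob({"cheese", "fruit"}, transactions) / prob({"cheese"}, transactions)
# Unconditional probability of the consequent.
p_fruit = prob({"fruit"}, transactions)

print(conf)     # 1.0
print(p_fruit)  # 0.9
```

The confidence is 100%, yet a randomly chosen transaction already contains fruit 90% of the time, so knowing that the customer bought cheese adds little information.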

17

Lift Ratio:

Measure to evaluate the efficiency of a rule: the ratio of P(consequent | antecedent) to P(consequent) is called the lift ratio of the rule.

18

About Lift Ratio

The lift ratio represents how effective an association rule is at identifying transactions in which the consequent item set occurs versus a randomly selected transaction.

• A lift ratio greater than one suggests that there is some usefulness to the rule and that it is better at identifying cases when the consequent occurs than having no rule at all.

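• From the definition of the lift ratio, P(consequent | antecedent) / P(consequent), and writing P(consequent | antecedent) as P(antecedent and consequent) / P(antecedent), we see that the denominator contains the probability of a transaction containing the consequent set multiplied by the probability of a transaction containing the antecedent set.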

• This product of probabilities is equivalent to the expected likelihood of a transaction containing both the consequent item set and antecedent item set if these item sets were independent.

• In other words, a lift ratio greater than one suggests that the level of association between the antecedent and consequent is higher than would be expected if these item sets were independent.
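
The two ways of writing the lift ratio (confidence divided by P(consequent), and joint probability divided by the product of the individual probabilities) can be checked numerically. The sketch below uses made-up transactions, not Table 5.4, and the function names are assumptions for illustration.

```python
import math

# Hypothetical toy data, not the Table 5.4 data.
transactions = [
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter", "milk"},
    {"fruit", "milk"},
    {"bread", "fruit"},
]


def prob(item_set, transactions):
    """Fraction of transactions that contain every item in item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Lift as confidence divided by P(consequent)."""
    p_both = prob(set(antecedent) | set(consequent), transactions)
    confidence = p_both / prob(antecedent, transactions)
    return confidence / prob(consequent, transactions)


def lift_independence_form(antecedent, consequent, transactions):
    """Lift as P(both) divided by the value expected under independence."""
    p_both = prob(set(antecedent) | set(consequent), transactions)
    return p_both / (prob(antecedent, transactions) * prob(consequent, transactions))


a, c = {"bread", "jelly"}, {"peanut butter"}
print(lift(a, c, transactions))                    # about 2.5, above 1
print(lift_independence_form(a, c, transactions))  # same value up to rounding
print(lift({"bread"}, {"milk"}, transactions))     # below 1 for this toy data
```

In this toy data the first rule has a lift above one, while "if {bread}, then {milk}" has a lift below one, meaning milk is slightly less likely in transactions that contain bread than in a random transaction.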

19

Support

Notion of “Frequent”:

20

Evaluating Association Rules

An association rule is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

– For example, Walmart mined its transactional data to uncover strong evidence of the association rule, “If a customer purchases a Barbie doll, then a customer also purchases a candy bar.”

– Walmart could leverage this relationship in product placement decisions as well as in advertisements and promotions, perhaps by placing a high-margin candy-bar display near the Barbie dolls.

• Association rules often result in “obvious” relationships such as “IF Cereal THEN Milk.” Such a rule may be true, but it offers no new insight.

• An association rule is useful if it is well supported and explains an important previously unknown relationship.

• Adjusting the data by aggregating items into more general categories (or splitting items into more specific categories) so that items occur in roughly the same number of transactions often yields better association rules.

Note: the support of an association rule can generally be improved by basing it on less specific antecedent and consequent item sets, but this also tends to yield less insight.
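
The effect of aggregating items into broader categories can be sketched as a simple relabeling step before computing support. The transactions and the `category_of` mapping below are invented for illustration only.

```python
# Hypothetical item-level transactions.
transactions = [
    {"whole milk", "corn flakes"},
    {"skim milk", "granola"},
    {"whole milk", "granola"},
    {"skim milk", "bread"},
]

# Hypothetical mapping from specific items to more general categories.
category_of = {
    "whole milk": "milk",
    "skim milk": "milk",
    "corn flakes": "cereal",
    "granola": "cereal",
}


def aggregate(transactions, category_of):
    """Relabel each item with its category; unmapped items are kept as-is."""
    return [{category_of.get(item, item) for item in t} for t in transactions]


def support(item_set, transactions):
    """Fraction of transactions that contain every item in item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if items <= t) / len(transactions)


specific = support({"whole milk", "granola"}, transactions)
general = support({"milk", "cereal"}, aggregate(transactions, category_of))
print(specific)  # 0.25
print(general)   # 0.75
```

Support rises from 25% to 75% after aggregation, but the resulting rule about "milk" and "cereal" is also closer to the obvious kind of relationship that gives little new insight.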