BITI2223: Association Rules Mining

Mining Association Rules

  • Frequent Pattern Analysis

  • Association Rules Mining

  • Evaluation Methods

Frequent Pattern Analysis Concepts

Data Mining Task Categories

  • Classification

  • Prediction

  • Association Rules Mining

  • Clustering

  • Deviation Detection

Algorithms Used

  • Apriori Algorithm

  • FP-Growth

  • Sequential Rules

Importance of Frequent Pattern Analysis

  • Foundation for Data Mining

    • Underlies the analysis of association, correlation, and causality.

    • Extends to sequential and structural (e.g., sub-graph) patterns.

  • Applications

    • Basket data analysis.

    • Cross-marketing strategies and catalog design.

    • Analysis of sale campaigns, web logs, and DNA sequences.

Frequent Pattern Association Rule Basics

  • Definition: A frequent pattern is a set of items, a subsequence, or a substructure that occurs in a dataset at least as often as a user-specified minimum support threshold.

  • Patterns in Association Rule Mining

    • Basic patterns include frequent patterns, closed patterns, max-patterns, and generators.

    • Variations: multilevel, multidimensional, continuous data patterns, etc.

Pattern Mining Techniques

Mining Patterns

  • Basic Mining Techniques

    • Pattern growth (e.g., FP-Growth, HMine).

    • Candidate generation (e.g., Apriori, EClat, CHARM).

  • Interestingness Measures

    • Subjective vs. objective measures.

    • Evaluation of correlation rules, distributed and parallel mining.

Extended Patterns

  • Types of Extended Patterns

    • Spatial Patterns: Co-location patterns.

    • Temporal Patterns: Evolutionary and periodic patterns.

    • Network Patterns: Applications include semantic data compression and collaborative filtering.

Association Rule Details

Rule Components

  • Itemset Definition: An itemset X = {x1, ..., xk} is a set of k items drawn from the items that appear in the transaction database.

  • Transaction Identifier: Each transaction T is a set of items and is identified by a unique transaction ID (TID).

Rule Formation:

  • Support

    • Probability that a transaction contains itemset X ∪ Y.

  • Confidence

    • Conditional probability that a transaction containing X also contains Y.
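
Written as formulas, the two definitions above look roughly as follows. The symbol D for the set of all transactions is an assumption introduced here for illustration; the notes do not name it.

```latex
\[
\mathrm{support}(X \Rightarrow Y) = P(X \cup Y)
  = \frac{\left|\{\, T \in D : X \cup Y \subseteq T \,\}\right|}{|D|}
\]
\[
\mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X)
  = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
\]
```

Here P(X ∪ Y) is the probability that a transaction contains every item of both X and Y, not the probability that it contains either one.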

Support and Confidence Example

  • Minimum Support: 50%

  • Results:

    • A → C (Support: 50%, Confidence: 66.7%)

    • C → A (Support: 50%, Confidence: 100%)
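
The notes give only the resulting figures, not the transactions behind them. The sketch below uses a small hypothetical four-transaction dataset, chosen because it is consistent with those figures, to show how support and confidence could be computed. The function names `support` and `confidence` are illustrative, not part of any library.

```python
# Hypothetical transactions (NOT from the notes), chosen to match the figures above.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

for x, y in [("A", "C"), ("C", "A")]:
    s = support({x, y}, transactions)
    c = confidence({x}, {y}, transactions)
    print(f"{x} -> {y}: support={s:.1%}, confidence={c:.1%}")
```

Both rules share the same support because support depends only on the combined itemset {A, C}; the confidences differ because A appears in three transactions while C appears in two.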

Rule Evaluation Criteria

Based on Value Types

  • Boolean Association Rule: Presence/absence tracking.

  • Quantitative Association Rule: Partitioning of quantitative values into intervals.
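
As a rough illustration of the partitioning step for quantitative rules, the sketch below maps numeric values to fixed-width interval labels so that each interval can be treated as an ordinary item. The helper name `to_interval`, the bin width of 10, and the sample ages are all assumptions made for this example; other partitioning schemes (equal-frequency, clustering-based) are equally possible.

```python
def to_interval(value, width=10):
    """Map an integer to a fixed-width interval label such as "30..39"."""
    low = (value // width) * width
    return f"{low}..{low + width - 1}"

# Hypothetical ages, one per customer.
ages = [23, 31, 38, 45]

# Each quantitative value becomes a Boolean item of the form attribute=interval.
items = [f"age={to_interval(a)}" for a in ages]
print(items)
```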

Based on Dimensions

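  • Single vs. Multi-dimensional Rules:

    • Single-dimensional: every predicate in the rule refers to the same dimension, e.g., buys(X, "milk") => buys(X, "bread").

    • Multi-dimensional: the rule spans two or more dimensions, e.g., Age(X, "30..39") ^ Income(X, "42K..48K") => buys(X, "LCD TV").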

Redundancy in Rules

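  • A rule can be redundant when an ancestor rule, which uses a more general item from the concept hierarchy, already predicts its support and confidence, e.g., milk → wheat bread is an ancestor of 2% milk → wheat bread.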

Frequent Itemsets and Pruning

  • Apriori Principle: Any subset of a frequent itemset must also be frequent.

  • Efficiency: Generate candidate (k+1)-itemsets from the frequent k-itemsets, and prune any candidate that has an infrequent k-subset before counting its support.
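
The join, prune, and count steps can be sketched as below. This is a minimal, unoptimized illustration rather than a reference implementation: the function name `apriori`, the pairwise-union join, and the sample transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support_count} for every frequent itemset."""
    min_count = min_support * len(transactions)

    # Frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 1
    while frequent:
        candidates = set()
        # Join: combine pairs of frequent k-itemsets into (k+1)-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune: drop the candidate if any k-subset is not frequent.
            if all(frozenset(sub) in frequent for sub in combinations(union, k)):
                candidates.add(union)
        # Count: one pass over the data for the surviving candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

# Hypothetical transactions, used only to exercise the sketch.
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
frequent_itemsets = apriori(transactions, min_support=0.5)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

With this data and a 50% threshold, the only frequent 2-itemset is {A, C}, so no 3-itemset candidates are generated and the loop stops.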

FP-Growth Algorithm Overview

Key Points

  • Fewer Database Scans: The database is scanned only twice, once to count item frequencies and once to build the FP-tree.

  • No Candidate Generation: Avoids the bottleneck associated with candidate generation and test.

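  • Construction of FP-tree: The frequent items of each transaction are inserted into a compact prefix tree (the FP-tree), which is then mined in place of the original database.

  • Compression of Database: Transactions that share frequent items share tree paths, so the FP-tree is usually much smaller than the database while retaining all the information needed to mine frequent patterns.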
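
A minimal sketch of FP-tree construction follows, covering only the tree-building part. The recursive mining of conditional pattern bases is omitted. The names `FPNode` and `build_fp_tree`, the alphabetical tie-breaking rule, and the sample transactions are assumptions made for illustration.

```python
from collections import Counter

class FPNode:
    """One node of the prefix tree: an item, a count, and child links."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count every item and keep only the frequent ones.
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}

    root = FPNode(None)
    header = {item: [] for item in frequent}  # item -> all nodes holding it

    # Scan 2: insert each transaction's frequent items in descending
    # frequency order (ties broken alphabetically) so prefixes are shared.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

# Hypothetical transactions, used only to exercise the sketch.
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
root, header = build_fp_tree(transactions, min_count=2)
for item, nodes in sorted(header.items()):
    print(item, [n.count for n in nodes])
```

Summing the counts of all nodes for an item in the header table recovers that item's support count, which is one way to see that the compression loses no frequency information.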

Conclusion

  • Applications and Extensions: Frequent pattern analysis supports market basket analysis, classification, clustering, and other tasks, using algorithms such as Apriori and FP-Growth to extract meaningful patterns from large datasets.