BITI2223_Association Rules Mining
Mining Association Rules
Frequent Pattern Analysis
Association Rules Mining
Evaluation Methods
Frequent Pattern Analysis Concepts
Categories in Frequent Pattern Analysis
Classification
Prediction
Association Rules Mining
Clustering
Deviation Detection
Algorithms Used
Apriori Algorithm
FP-Growth
Sequential Rules
Importance of Frequent Pattern Analysis
Foundation for Data Mining
Enables analysis of association, correlation, and causality.
Applies to sequential, structural (e.g., sub-graph) patterns.
Applications
Basket data analysis.
Cross-marketing strategies and catalog design.
Analysis of sales campaigns, web logs, and DNA sequences.
Frequent Pattern Association Rule Basics
Definition: A frequent pattern is a pattern (a set of items, a subsequence, or a substructure) that occurs in a dataset at least as often as a chosen minimum support threshold.
Patterns in Association Rule Mining
Basic patterns include frequent, closed, and maximal (max) patterns, along with generators.
Variations: multilevel, multidimensional, continuous data patterns, etc.
Pattern Mining Techniques
Mining Patterns
Basic Mining Techniques
Pattern growth (e.g., FP-Growth, HMine).
Candidate generation (e.g., Apriori).
Vertical data format (e.g., EClat, CHARM).
Interestingness Measures
Subjective vs. objective measures.
Evaluation of correlation rules, distributed and parallel mining.
Extended Patterns
Types of Extended Patterns
Spatial Patterns: Co-location patterns.
Temporal Patterns: Evolutionary and periodic patterns.
Network Patterns: Applications include semantic data compression and collaborative filtering.
Association Rule Details
Rule Components
Itemset Definition: An itemset X = {x1, ..., xk} is a set of k items drawn from a database of transactions.
Transaction Identifier: Each transaction T is a set of items and is identified by a unique transaction ID (TID).
Rule Formation: A rule has the form X → Y, where X and Y are disjoint itemsets.
Support
Probability that a transaction contains itemset X ∪ Y.
Confidence
Conditional probability that a transaction containing X also contains Y.
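In symbols, for a rule X → Y over a transaction database D, the two measures can be written as below. The symbol D and the set-size notation |·| are assumptions introduced here for illustration; they do not appear in the notes.

```latex
\[
\operatorname{support}(X \rightarrow Y) = P(X \cup Y)
  = \frac{|\{\, T \in D : X \cup Y \subseteq T \,\}|}{|D|}
\]
\[
\operatorname{confidence}(X \rightarrow Y) = P(Y \mid X)
  = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\]
```

Here P(X ∪ Y) means the probability that a transaction contains every item of both X and Y, not that it contains either one.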
Support and Confidence Example
Minimum Support: 50%
Results:
A → C (Support: 50%, Confidence: 66.7%)
C → A (Support: 50%, Confidence: 100%)
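The notes do not list the transactions behind these figures. The sketch below assumes a hypothetical four-transaction database that is consistent with them and shows how the two measures could be computed; the data and the helper names support and confidence are illustrative assumptions.

```python
# Hypothetical transactions (an assumption: the notes do not list the data).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)


def confidence(antecedent, consequent, db):
    """support(antecedent ∪ consequent) divided by support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)


for x, y in [("A", "C"), ("C", "A")]:
    s = support({x, y}, transactions)
    c = confidence({x}, {y}, transactions)
    print(f"{x} -> {y}: support {s:.1%}, confidence {c:.1%}")
```

With this assumed data, A appears in three of four transactions and C in two, both times together with A, which gives the 66.7% and 100% confidence values listed above.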
Rule Evaluation Criteria
Based on Value Types
Boolean Association Rule: Presence/absence tracking.
Quantitative Association Rule: Partitioning of quantitative values into intervals.
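One way to prepare quantitative attributes for rule mining is to map each value to an interval label and then treat the label as a Boolean item. The sketch below assumes fixed-width intervals over integer values; the function name to_interval, the bin width, and the sample records are hypothetical.

```python
def to_interval(value, width):
    """Map an integer to a label for the fixed-width interval containing it."""
    low = (value // width) * width
    return f"{low}..{low + width - 1}"


# Hypothetical records; attribute names and values are illustrative only.
records = [
    {"age": 34, "buys": "LCD TV"},
    {"age": 38, "buys": "LCD TV"},
    {"age": 52, "buys": "radio"},
]

# Each record becomes a set of Boolean items such as "age=30..39".
items = [
    {f"age={to_interval(r['age'], 10)}", f"buys={r['buys']}"}
    for r in records
]
print(sorted(items[0]))
```

Fixed-width binning is only one choice; other partitioning schemes would produce different intervals and therefore different rules.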
Based on Dimensions
Single vs. Multi-dimensional Rules:
Single: the rule uses one predicate only, for example buys(X, "milk") => buys(X, "wheat bread")
Multi: the rule uses two or more predicates, e.g., Age(X, "30..39") ^ Income(X, "42K…48K") => buys(X, "LCD TV")
Redundancy in Rules
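A rule can be redundant when an ancestor rule already covers it. For example, milk → wheat bread is an ancestor of any rule that replaces milk with a more specific kind of milk; the more specific rule is redundant if its support and confidence are close to what the ancestor rule predicts.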
Frequent Itemsets and Pruning
Apriori Principle: Any subset of a frequent itemset must also be frequent.
Efficiency: Generate candidate itemsets of length k+1 from the frequent itemsets of length k, and prune any candidate that has an infrequent subset before counting its support.
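The level-wise search described above can be sketched as follows. This is a minimal illustration and not an optimized implementation; the function names generate_candidates and apriori and the sample database are assumptions made for the example.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune them."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == len(a) + 1:
                # Apriori principle: every k-subset must itself be frequent.
                subsets = combinations(union, len(a))
                if all(frozenset(s) in frequent_k for s in subsets):
                    candidates.add(union)
    return candidates


def apriori(db, min_support):
    """Return {itemset: support} for itemsets with support >= min_support."""
    n = len(db)
    current = {frozenset([item]) for t in db for item in t}
    frequent = {}
    while current:
        # One pass over the database per level to count candidate support.
        counts = {c: sum(c <= t for t in db) for c in current}
        level = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(level)
        current = generate_candidates(level.keys())
    return frequent


# Hypothetical database, for illustration only.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
result = apriori(db, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

Each pass of the while loop scans the database once, which is the cost that the pruning step tries to keep down by discarding candidates early.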
FP-Growth Algorithm Overview
Key Points
Fewer Database Scans: The database is scanned twice, once to count item frequencies and once to build the FP-tree.
No Candidate Generation: Avoids the bottleneck associated with candidate generation and testing.
Construction of FP-tree: The database is compressed into an FP-tree, a prefix tree built from the frequent items of each transaction, which is then mined directly.
Compression of Database: The FP-tree is usually much smaller than the database but keeps the itemset association information, so no frequent pattern is lost.
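A minimal sketch of FP-tree construction is shown below. It covers only the two-pass build, not the recursive mining of conditional pattern bases; the class FPNode, the function build_fp_tree, the alphabetical tie-breaking rule, and the sample database are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, a parent link, and children."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(db, min_count):
    """Build an FP-tree in two passes over db; return (root, header table)."""
    # Pass 1: count each item and keep only the frequent ones.
    counts = Counter(item for t in db for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode()
    header = {i: [] for i in frequent}  # item -> all nodes holding that item

    # Pass 2: insert each transaction's frequent items, most frequent first,
    # so that transactions sharing a prefix share a path in the tree.
    for t in db:
        kept = (i for i in set(t) if i in frequent)
        ordered = sorted(kept, key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


# Hypothetical database, for illustration only.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
root, header = build_fp_tree(db, min_count=2)
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```

Sorting by descending frequency is what lets common items sit near the root and be shared across many transactions, which is where the compression comes from.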
Conclusion
Applications and Extensions: Frequent pattern analysis underpins market basket analysis and also supports other mining tasks such as classification and clustering. Algorithms such as Apriori and FP-Growth extract these patterns from large transaction datasets, and support and confidence are used to evaluate the resulting rules.