OPT READING Sets of Robust Rules and the Grab Algorithm – Vocabulary

0.0(0)

Studied by 0 people

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/39

Earn XP

Description and Tags

Key vocabulary from the lecture on MDL-based mining of robust association rules and the Grab algorithm.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

40 Terms

New cards

Association Rule

An implication of the form X → Y where X (antecedent) and Y (consequent) are disjoint itemsets, expressing that Y tends to occur when X occurs.

New cards

Antecedent (Head)

The itemset X on the left‐hand side of an association rule X → Y.

New cards

Consequent (Tail)

The itemset Y on the right‐hand side of an association rule X → Y.

New cards

Support

The number of transactions in which a given itemset or rule occurs; denoted |T_X| for itemset X.

New cards

Confidence

A traditional quality measure for rules, defined as support(XY)/support(X).

New cards

Pattern

A special rule whose antecedent is empty (∅ → Y), describing unconditional co-occurrence of Y.

New cards

Singleton Rule

A rule whose consequent contains exactly one item, e.g., ∅ → A.

New cards

Noise-Resistant Rule

A rule allowed to hold approximately, even if some items of its consequent are missing in a transaction.

New cards

Minimum Description Length (MDL) Principle

Information-theoretic framework that selects the model giving the shortest total code length for describing model plus data.

New cards

One-Part (Refined) MDL

MDL variant using a single code L(D | M) to describe data with the whole model class, offering strong optimality but often infeasible.

New cards

Two-Part (Crude) MDL

Practical MDL variant using L(M)+L(D | M), i.e., separate costs for model description and for encoding data given the model.

New cards

Codelength

The number of bits required to encode an object (model or data) under a chosen coding scheme.

New cards

Model Cost L(M)

The portion of the MDL score that encodes the rule set itself, including heads, tails and parameters.

New cards

Data Cost L(D | M)

Bits needed to transmit the database when the recipient already knows the model M.

New cards

Parametric Complexity (Lpc)

The regret term of the Normalised Maximum Likelihood (NML) code used to encode model parameters optimally.

New cards

Grab Algorithm

A greedy, any-time heuristic that starts from singleton rules and iteratively refines the rule set to minimise MDL codelength.

New cards

Candidate Generation (Grab)

Phase where Grab creates new rules by merging heads or tails of existing rules, ensuring the dependency graph stays acyclic.

New cards

Gain ΔL

Change in total description length when replacing the current model by a refined model; negative ΔL indicates improvement.

New cards

Gain Estimation (Δ̂₁, Δ̂₂)

Fast first-tier and tighter second-tier approximations used by Grab to avoid expensive exact ΔL evaluations.

New cards

Cover Procedure

Subroutine that determines, for a candidate rule set, which transactions each rule explains and how errors are split.

New cards

Split Point k*

Optimal threshold (number of tail items present) that minimises MDL cost when deciding whether a rule holds in a transaction.

New cards

Tid-List (Transaction ID List)

Set of transaction identifiers containing a given itemset; used for fast support and intersection operations.

New cards

Directed Acyclic Graph (Rule Graph)

Representation of rule dependencies where vertices are rules and edges link overlapping heads/tails; must be acyclic to decode data.

New cards

Submodularity (Not satisfied)

Property of set functions with diminishing returns; the MDL score for rule sets is proven NOT to be submodular.

New cards

Monotonicity (Not satisfied)

Property where adding elements never worsens the score; MDL score for rule sets is NOT monotonically decreasing.

New cards

Greedy Heuristic

Search strategy that builds a solution step by step using locally optimal choices; Grab employs this to approximate the MDL optimum.

New cards

Pattern Set Mining

Framework evaluating the quality of a whole set of patterns rather than individual ones; Grab is an example.

New cards

Hyper+

Noise-resistant pattern miner used as a baseline in the experiments; tends to output large numbers of patterns.

New cards

Kingfisher

State-of-the-art algorithm for statistically significant dependency rules (single-item consequent) based on Fisher’s exact test.

New cards

Pack

MDL-based method that builds decision trees per item; tree paths can be interpreted as rules.

New cards

Synthetic Data

Artificially generated datasets used to test whether algorithms can recover known planted rules.

New cards

Real-World Data

Empirical datasets (e.g., Mushrooms, Mammals) used to assess practical usefulness of the mined rule sets.

New cards

Error Matrix

Binary matrix indicating, for transactions where a rule applies, which consequent items are missing (destructive) or present (additive).

New cards

Transaction

A single row in a binary dataset, typically representing items bought, attributes present, etc.

New cards

Itemset

A subset of items (features) from the alphabet I.

New cards

Dependency Graph

Graph capturing overlaps among rules; its acyclicity guarantees lossless decoding under Grab’s model.

New cards

Evaluation Order

Sequence in which candidate rules are tested; Grab prioritises those with lower estimated ΔL.

New cards

Rule Set Size |R|

Number of non-singleton rules included in a model; MDL penalises large |R| to avoid overfitting.

New cards

Information Theory

Field providing the mathematical basis (entropy, coding) underpinning MDL and compression-based model selection.

New cards

Frequent Itemset Mining

Task of enumerating all itemsets with support above a threshold; classic precursor to rule mining but often produces huge outputs.