These vocabulary flashcards summarize essential concepts, components, algorithms, and theoretical notions introduced in the lecture on Differentiable Pattern Set Mining and the BinaPs model. Reviewing them will reinforce understanding of the paper’s terminology, methodology, and context.
Pattern Mining
The task of discovering interpretable symbolic statements (patterns) that reveal structure in data.
Frequent Pattern Mining
Classical approach that returns all patterns whose frequency exceeds a user-defined threshold, often producing many redundant or spurious results.
Pattern Set Mining
Modern reformulation that seeks a small set of patterns whose combination describes the data well.
Twice-Exponential Search Space
The size of the candidate space when enumerating all possible pattern sets; it grows doubly exponentially with the number of features.
Heuristic Search
A non-exhaustive strategy used by many pattern-set miners to explore an intractably large search space.
Autoencoder
A neural network that encodes inputs into a latent representation and then reconstructs the original data from that representation.
Binary Pattern Networks (BinaPs)
A novel binarized autoencoder that learns sets of conjunctive patterns through binary activations and weights.
Binary Activation Function
An activation that outputs 0 or 1, enabling neuron outputs to be interpreted as presence or absence of a pattern.
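A minimal sketch of such a step activation (PyTorch; the function name is illustrative, not from the paper's code):

```python
import torch

def binary_step(x: torch.Tensor) -> torch.Tensor:
    # 1 where the pre-activation is positive, 0 otherwise, so each
    # neuron's output reads as pattern present / pattern absent.
    return (x > 0).float()

print(binary_step(torch.tensor([-0.3, 0.0, 1.2])))  # tensor([0., 0., 1.])
```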
Binarized Weights
Weights constrained to {0,1} (during forward passes) so each weight can be interpreted as item-in-pattern membership.
Continuous Weights
Real-valued parameters updated during back-propagation; they are stochastically binarized before each forward pass.
Conjunctive Pattern
A set of features that must co-occur; in BinaPs each hidden neuron represents one such conjunction.
Pattern Layer (Hidden Layer)
The single linear hidden layer in BinaPs whose neurons correspond directly to learned patterns.
Mirrored Weight Matrix
Design choice where the decoder weight matrix equals the encoder matrix transposed, ensuring symmetric reconstruction.
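A sketch of the mirrored encode/decode pass under simplifying assumptions (deterministic rounding instead of stochastic binarization, decoder threshold omitted; all identifiers are illustrative):

```python
import torch

def step(x):
    return (x > 0).float()  # binary activation

def forward(x, W_cont, bias):
    W = (W_cont > 0.5).float()        # binarized weights for this pass
    hidden = step(x @ W.t() + bias)   # encoder: which patterns fire
    recon = step(hidden @ W)          # decoder: same matrix, transposed
    return hidden, recon

x = torch.tensor([[1., 1., 0., 1.]])  # one binary transaction, 4 items
W_cont = torch.rand(2, 4)             # capacity of 2 candidate patterns
bias = torch.tensor([-1.5, -0.5])     # negative thresholds per pattern
print(forward(x, W_cont, bias))
```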
Bias Term (BinaPs)
Learnable negative threshold that enforces a minimum number of items before a hidden neuron fires.
Reconstruction Loss
Objective measuring how accurately the autoencoder recreates the input; minimizing it drives pattern discovery.
Data-Sparsity-Aware Loss
A weighted reconstruction loss that compensates for the imbalance between many 0s and few 1s in sparse data.
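One way such a weighting can be realized (a sketch, not necessarily the paper's exact formulation): scale errors on the rare 1-entries by the inverse density, so that reconstructing everything as 0 stops being a cheap optimum.

```python
import torch

def sparsity_aware_loss(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Fraction of 1-entries in the batch; clamped to avoid division by zero.
    density = target.float().mean().clamp(min=1e-6)
    # Up-weight errors on the rare 1s, leave errors on 0s at weight 1.
    weights = torch.where(target > 0, 1.0 / density, torch.ones_like(target))
    return (weights * (recon - target) ** 2).mean()

target = torch.tensor([[1., 0., 0., 0.]])
print(sparsity_aware_loss(torch.zeros_like(target), target))  # heavily penalized
```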
Gradient-Based Optimization
Learning approach that updates parameters via gradients (e.g., Adam) instead of combinatorial search.
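A toy training loop illustrating the idea: Adam updates continuous parameters via back-propagated gradients rather than enumerating candidates. For clarity this sketch uses a smooth surrogate in place of the binary activations; all names are illustrative.

```python
import torch

x = torch.tensor([[1., 1., 0., 1.]])      # one binary transaction
W = torch.rand(2, 4, requires_grad=True)  # continuous weights, 2 patterns
opt = torch.optim.Adam([W], lr=0.05)

for _ in range(100):
    opt.zero_grad()
    hidden = torch.sigmoid(x @ W.t())     # smooth stand-in for the binary step
    recon = torch.sigmoid(hidden @ W)
    loss = ((recon - x) ** 2).mean()      # plain reconstruction loss
    loss.backward()                       # gradients instead of enumeration
    opt.step()
print(loss.item())
```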
Straight-Through Estimator (STE)
Technique that approximates the gradient of a binary activation by passing the upstream gradient unchanged.
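A minimal PyTorch realization of this estimator (a sketch; the class name is illustrative):

```python
import torch

class BinaryStepSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()   # hard step in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the step as identity and pass the
        # upstream gradient back unchanged.
        return grad_output

x = torch.tensor([-0.5, 0.7], requires_grad=True)
BinaryStepSTE.apply(x).sum().backward()
print(x.grad)  # tensor([1., 1.])
```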
Gated Straight-Through Estimator
Modified STE used in BinaPs that blocks negative gradients when the corresponding neuron was inactive.
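A sketch of the gating rule as described above, built on the STE from the previous card (my reading of the rule, not the paper's exact code):

```python
import torch

class GatedStepSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = (x > 0).float()
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        (out,) = ctx.saved_tensors
        # Pass the gradient where the neuron fired or the gradient is
        # non-negative; block negative gradients at inactive neurons.
        gate = (out > 0) | (grad_output >= 0)
        return torch.where(gate, grad_output, torch.zeros_like(grad_output))

x = torch.tensor([-0.5, -0.5, 0.7], requires_grad=True)
y = GatedStepSTE.apply(x)
(y * torch.tensor([-1.0, 1.0, 1.0])).sum().backward()
print(x.grad)  # tensor([0., 1., 1.]) -- negative gradient at the inactive neuron is zeroed
```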
Stochastic Binarization
Process of converting continuous weights to 0/1 by clamping each value into [0,1] and treating it as a Bernoulli success probability.
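A sketch combining the clamp and Bernoulli draw described in the surrounding cards:

```python
import torch

def stochastic_binarize(w_cont: torch.Tensor) -> torch.Tensor:
    # Clamp continuous weights into [0, 1], then treat each value as
    # the success probability of an independent Bernoulli draw.
    p = w_cont.clamp(0.0, 1.0)
    return torch.bernoulli(p)

print(stochastic_binarize(torch.tensor([-0.2, 0.5, 0.9, 1.3])))
# e.g. tensor([0., 1., 1., 1.]) -- the middle two entries are random
```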
Capacity (c)
Upper bound on the number of hidden neurons (patterns) in a BinaPs model.
NP-Hard
Hardness notion indicating a problem is at least as hard as the hardest problems in NP; deciding the optimal pattern-set size is NP-hard.
Bipartite Dimension
Minimum number of bicliques needed to cover all edges of a bipartite graph; its computation is NP-hard and relates to optimal pattern count.
Boolean Matrix Factorization (BMF)
Decomposition of a binary matrix into two Boolean factors; closely related to mining pattern sets.
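The Boolean product replaces sums with logical OR; a tiny NumPy illustration (example matrices chosen for clarity):

```python
import numpy as np

A = np.array([[1, 0],
              [1, 1]])        # transactions x patterns
B = np.array([[1, 1, 0],
              [0, 1, 1]])     # patterns x items

# Boolean product: X[i, j] = OR_k (A[i, k] AND B[k, j]).
X = (A @ B) > 0
print(X.astype(int))
# [[1 1 0]
#  [1 1 1]]
```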
Tiling
Covering the 1-entries of a binary matrix with combinatorial rectangles (tiles), each representing a pattern together with the transactions that support it.
Minimum Description Length (MDL) Principle
Information-theoretic framework that selects models yielding the shortest combined encoded length of model plus data.
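In standard MDL notation (not specific to this lecture), the best model $M$ for data $D$ minimizes the two-part code length:

$$
L(M, D) = L(M) + L(D \mid M)
$$

where $L(M)$ is the length of the encoded model and $L(D \mid M)$ the length of the data encoded with the model's help.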
Maximum Entropy Modeling
Approach that selects the distribution with highest entropy subject to constraints, used to evaluate pattern interestingness.
Sparse Data
Dataset in which 1-entries (true events) are rare compared to 0-entries; common in transaction databases.
Dense Data
Dataset where 1-entries are frequent; e.g., genomic copy-number matrices.
Signal-to-Noise Ratio
Ratio of true pattern occurrences (signal) to random flips or errors (noise) in the data.
Local Minimum
A parameter setting where loss cannot be reduced by small moves but is not the global optimum.
Clamp Function
Utility function that bounds a value within a specified range [a,b].
Bernoulli Distribution
Probability distribution over outcomes {0,1}; used for stochastic weight binarization.
Biclique
Complete bipartite subgraph; corresponds to a perfect pattern covering a subset of items and transactions.
Adam Optimizer
Adaptive stochastic gradient descent method commonly used to train neural networks.
GPU Acceleration
Leveraging graphics processing units to perform parallel computations, drastically speeding up BinaPs training.
Escaping Poor Minima
A benefit of stochastic binarization: occasional random weight flips let training escape poor local minima and prevent dead neurons.
Asso Algorithm
Boolean matrix factorization method that iteratively adds patterns based on association accuracy.
Slim Algorithm
MDL-based pattern miner that greedily builds a code table to compress data; may overfit on noisy data.
Desc Algorithm
Pattern-set miner that uses Maximum Entropy models to select patterns explaining data distributions.
Synthetic Data
Artificially generated dataset used to evaluate algorithms under controlled conditions (e.g., known ground truth).