Multimedia Databases and Data Mining Flashcards

Description and Tags

Concepts covering multimedia indexing, text retrieval, dimensionality reduction techniques like SVD, and core data mining tasks including classification and clustering.

Last updated 1:58 AM on 7/3/26

24 Terms

New cards

Content-based Image Retrieval

A field of research aiming at indexing and retrieving images based on their visual contents rather than manual text annotation.

Color Histogram

A compact representation of the color of an image where colors are partitioned into $k$ groups and the percentage of each group in the image is measured.
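
A minimal sketch of the idea, assuming a grayscale image given as a flat list of intensities in 0..255 (a simplification; real color histograms usually partition a 3-D color space). The function name `color_histogram` and the sample pixels are hypothetical.

```python
from collections import Counter

def color_histogram(pixels, k=4):
    """Return k fractions, one per equal-width group of gray levels 0..255."""
    bin_width = 256 / k
    # Map each pixel to its group index; clamp so that 255 lands in the last group.
    counts = Counter(min(int(p / bin_width), k - 1) for p in pixels)
    total = len(pixels)
    return [counts.get(b, 0) / total for b in range(k)]

# Hypothetical 8-pixel "image".
pixels = [0, 10, 70, 130, 140, 200, 250, 255]
hist = color_histogram(pixels, k=4)
print(hist)  # [0.25, 0.125, 0.25, 0.375]
```

Two images can then be compared by comparing their $k$-element histograms instead of their raw pixels.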

Stemming

A text-processing technique that keeps only the stem of each word, so that variants are indexed under one term (e.g., 'inverted' and 'inversion' both become 'invert').

Inverted Files

A text indexing structure consisting of a dictionary and postings lists, known for high speed despite space overhead.
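
As a rough illustration, the dictionary can be a hash map from each term to its postings list of document ids. This is only a sketch with whitespace tokenization; `build_inverted_file`, `search`, and the sample documents are hypothetical names and data.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each term (dictionary entry) to a sorted postings list of doc ids."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() so a document appears at most once per term.
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    return dict(postings)

def search(index, *terms):
    """Return ids of documents containing all the given terms (AND query)."""
    result = None
    for term in terms:
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])

docs = ["data mining finds patterns", "image retrieval", "mining image data"]
index = build_inverted_file(docs)
print(index["mining"])                   # [0, 2]
print(search(index, "mining", "image"))  # [2]
```

A query only touches the postings lists of its own terms, which is where the speed comes from; the lists themselves account for most of the space.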

Zipf Distribution

A distribution observed in text collections where the frequency of a word is approximately inversely proportional to its rank ($\text{freq} \sim 1/\text{rank}$).
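
One way to check the relationship on a corpus is to sort words by frequency and look at rank times frequency, which should stay roughly constant. The word counts below are made up so that the product is exactly constant; real text only follows the law approximately.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    counts = Counter(words).most_common()
    return [(rank, w, f) for rank, (w, f) in enumerate(counts, start=1)]

# Hypothetical, idealized corpus: frequencies 12, 6, 4, 3.
words = ["the"] * 12 + ["of"] * 6 + ["data"] * 4 + ["mining"] * 3
table = rank_frequency(words)
for rank, word, freq in table:
    print(rank, word, freq, rank * freq)  # last column is 12 on every row
```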

Postings Lists

The components of an inverted file that list occurrences of terms in documents; they are identified as the main source of space overhead.

Vector Space Model

A model where each document is represented as a vector of size $d$, where $d$ is the number of different terms in the database (vocabulary size).

Binary Weights

A term weighting scheme where only the presence (1) or absence (0) of a term is included in the document vector.
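
A small sketch of binary document vectors over a shared vocabulary, assuming whitespace tokenization; `binary_vector` and the two sample documents are hypothetical.

```python
def binary_vector(doc, vocabulary):
    """Return a 0/1 vector with one slot per vocabulary term."""
    terms = set(doc.lower().split())
    return [1 if t in terms else 0 for t in vocabulary]

docs = ["data mining", "image data retrieval"]
# The vocabulary fixes the meaning of each vector position.
vocabulary = sorted({t for d in docs for t in d.lower().split()})
vectors = [binary_vector(d, vocabulary) for d in docs]
print(vocabulary)  # ['data', 'image', 'mining', 'retrieval']
print(vectors)     # [[1, 0, 1, 0], [1, 1, 0, 1]]
```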

tf x idf

A weighting measure defined as $w = tf \times \log(N/n_k)$, where $tf$ is term frequency, $N$ is the total number of documents, and $n_k$ is the number of documents containing the term.
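
The formula can be sketched directly. Two assumptions not fixed by the card: $tf$ is taken as the raw count of the term in the document, and the logarithm is natural. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math

def tf_idf(docs):
    """Return one {term: weight} dict per document, with w = tf * log(N / n_k)."""
    N = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # n_k: number of documents that contain each term.
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    weights = []
    for tokens in tokenized:
        w = {}
        for term in set(tokens):
            tf = tokens.count(term)  # raw term frequency (assumption)
            w[term] = tf * math.log(N / df[term])
        weights.append(w)
    return weights

docs = ["data data mining", "data retrieval", "image retrieval", "data image"]
weights = tf_idf(docs)
# "mining" is rare (1 of 4 documents), so it outweighs the frequent term "data".
print(weights[0]["mining"] > weights[0]["data"])  # True
```

A term that occurs in every document gets weight $\log(N/N) = 0$, which is the intended effect: it does not help tell documents apart.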

Cosine Coefficient

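A similarity measure for document vectors, also known as the normalized inner product: $\text{sim}(D_i, D_j) = \frac{\sum_{k=1}^t w_{ik} \times w_{jk}}{\sqrt{\sum_{k=1}^t w_{ik}^2} \sqrt{\sum_{k=1}^t w_{jk}^2}}$, where $t$ is the number of terms. For vectors already normalized to unit length this reduces to $\sum_{k=1}^t w_{ik} \times w_{jk}$.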
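
A minimal sketch that normalizes inside the function, so it accepts vectors of any length; the name `cosine` and the sample vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Inner product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)

d1 = [3.0, 4.0, 0.0]
d2 = [6.0, 8.0, 0.0]  # same direction as d1, twice as long
d3 = [0.0, 0.0, 5.0]  # shares no terms with d1
print(cosine(d1, d2))  # 1.0
print(cosine(d1, d3))  # 0.0
```

Because of the normalization, a long document and a short one with the same term proportions score as identical.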

Latent Semantic Indexing (LSI)

A method that maps documents and terms into latent (hidden) concepts to improve filtering and retrieval.

Singular Value Decomposition (SVD)

The decomposition of a document-by-term matrix $A$ into $A = U \Lambda V^T$, where $U$ is a document-to-concept matrix, $\Lambda$ is a diagonal matrix of concept strengths (the singular values), and $V$ is a term-to-concept matrix.
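
A small worked example may help. The matrix below is made up for illustration: three documents (rows) by three terms (columns), where the first two documents use only the first two terms and the third uses only the third term. It has two concepts, with strengths 2 and 1.

```latex
\[
A =
\begin{pmatrix}
1 & 1 & 0 \\
1 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
=
\underbrace{\begin{pmatrix}
\tfrac{1}{\sqrt{2}} & 0 \\
\tfrac{1}{\sqrt{2}} & 0 \\
0 & 1
\end{pmatrix}}_{U}
\underbrace{\begin{pmatrix}
2 & 0 \\
0 & 1
\end{pmatrix}}_{\Lambda}
\underbrace{\begin{pmatrix}
\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \\
0 & 0 & 1
\end{pmatrix}}_{V^T}
\]
```

Each row of $U$ says how strongly a document belongs to each concept, and each column of $V^T$ does the same for a term. Dropping the weaker concept (strength 1) leaves a rank-1 approximation that keeps only the first two documents and terms.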

Frobenius Norm

The norm of an $n \times m$ matrix $M$ calculated as the square root of the sum of the squares of its elements: $\sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} M[i, j]^2}$.
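
A direct transcription of the formula, assuming the matrix is given as a list of rows; `frobenius_norm` is a hypothetical name.

```python
import math

def frobenius_norm(M):
    """Square root of the sum of squared entries of a matrix (list of rows)."""
    return math.sqrt(sum(x * x for row in M for x in row))

M = [[1, 2],
     [2, 4]]
print(frobenius_norm(M))  # 5.0, since 1 + 4 + 4 + 16 = 25
```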

Authorities

In Kleinberg's algorithm, these are nodes that receive links from many important hub nodes.

Hubs

In Kleinberg's algorithm, these are nodes that point to many high-quality authority nodes.
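
The mutual reinforcement between hubs and authorities can be sketched as an iteration: a node's authority score is the sum of the hub scores of the nodes linking to it, and its hub score is the sum of the authority scores of the nodes it links to. The function name `hits`, the iteration count, and the toy graph are assumptions for illustration.

```python
import math

def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes that point to it.
        auth = {n: sum(hub[s] for s, t in edges if t == n) for n in nodes}
        # Hub: sum of authority scores of the nodes it points to.
        hub = {n: sum(auth[t] for s, t in edges if s == n) for n in nodes}
        # Normalize to unit length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

# Hypothetical graph: h1 and h2 link to both a1 and a2; h3 links only to a1.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # a1
print(hub["h1"] > hub["h3"])    # True
```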

PageRank

An algorithm that determines the importance of a page by computing its steady-state probability in a Markov Chain model of a random web surfer.
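
The steady state can be approximated by repeatedly applying the random-surfer transition (power iteration). In this sketch the surfer follows a link with probability `damping` and otherwise jumps to a random page; the value 0.85, the function name `pagerank`, and the four-page graph are assumptions, not part of the card.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate steady-state probabilities; links maps page -> list of targets."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Every page gets the random-jump share first.
        new_rank = {p: (1.0 - damping) / n for p in nodes}
        for p in nodes:
            # A page with no out-links is treated as linking to every page.
            targets = links[p] or nodes
            share = damping * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical graph; every target also appears as a key.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
print(max(rank, key=rank.get))  # C
```

Page D has no incoming links, so its score is just the random-jump share $(1 - 0.85)/4$.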

Isometric Mapping

An embedding where the mapping $F$ ensures the exact preservation of distance between objects.

FastMap

A metric analogue to the KL-transform (PCA) that uses pivot points and the law of cosines to compute pseudo-projections.
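
The pseudo-projection step can be sketched as follows: given two pivots $a$ and $b$, the law of cosines gives the coordinate of object $i$ along the line through them as $x_i = (d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2) / (2 d_{a,b})$, using distances only. This shows a single coordinate with hand-picked pivots; the full method also chooses pivots heuristically and repeats on the residual distances. The names and sample points are hypothetical.

```python
import math

def fastmap_coordinate(dist, objects, a, b):
    """Project every object onto the line through pivots a and b."""
    d_ab = dist(a, b)
    return {
        o: (dist(a, o) ** 2 + d_ab ** 2 - dist(b, o) ** 2) / (2 * d_ab)
        for o in objects
    }

# Hypothetical objects; the algorithm only ever sees their pairwise distances.
points = {"p": (0, 0), "q": (4, 0), "r": (1, 3), "s": (3, -2)}

def euclid(x, y):
    return math.dist(points[x], points[y])

coords = fastmap_coordinate(euclid, points, "p", "q")
# With pivots p and q lying on the x-axis, the projections
# recover the x-coordinates (up to floating-point rounding).
print({name: round(value, 6) for name, value in coords.items()})
```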

Johnson-Lindenstrauss Lemma

A mathematical basis for random projections, stating that any set of $n$ points in a high-dimensional space can be mapped to $O(\log n / \epsilon^2)$ dimensions while preserving all pairwise distances to within a factor of $1 \pm \epsilon$.
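
A rough sketch of a random projection, assuming a Gaussian projection matrix scaled by $1/\sqrt{k}$ (one common choice; the lemma itself does not prescribe it). The function name, seeds, and dimensions are arbitrary illustrations.

```python
import math
import random

def random_projection(points, k, seed=0):
    """Map d-dimensional points to k dimensions with a random Gaussian matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    # k x d matrix with N(0, 1/k) entries, so squared lengths are preserved on average.
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, p)) for row in R] for p in points]

# Hypothetical data: 5 random points in 500 dimensions, projected down to 100.
data_rng = random.Random(1)
points = [[data_rng.random() for _ in range(500)] for _ in range(5)]
low = random_projection(points, k=100)

# Ratio of projected to original distance for one pair; expected to be near 1.
ratio = math.dist(low[0], low[1]) / math.dist(points[0], points[1])
print(len(low[0]), ratio)
```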

Classification

A data mining task involving learning a function that maps an item into one of a set of predefined classes using a training set.
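
As one concrete instance, a nearest-neighbor classifier assigns a new item the class of the closest training example. This is just one of many classification methods, chosen here for brevity; the training set and class names are made up.

```python
import math

def nearest_neighbor_classify(training_set, item):
    """Return the label of the training example closest to item."""
    _, label = min(training_set, key=lambda pair: math.dist(pair[0], item))
    return label

# Hypothetical training set: (feature vector, predefined class).
training_set = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
print(nearest_neighbor_classify(training_set, (2.0, 1.5)))  # small
print(nearest_neighbor_classify(training_set, (8.5, 9.0)))  # large
```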

Regression

A data mining task where a function is learned to map an item to a continuous real value.
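
The simplest case is fitting a straight line by least squares. The sketch below assumes one input variable; `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0, the predicted value for x = 5
```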

Clustering

The process of identifying groups of similar items such that intracluster distances are minimized and intercluster distances are maximized.
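
One common algorithm for this is k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The card does not name an algorithm, so treat this as one possible illustration; the starting centroids and data are arbitrary.

```python
import math

def k_means(points, centroids, iterations=10):
    """Return (centroids, clusters) after a fixed number of k-means rounds."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```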

Association Rule Discovery

The production of dependency rules that predict the occurrence of an item based on the occurrences of other items (e.g., $\{Milk\} \rightarrow \{Coke\}$ ).
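
Rules like this are usually scored by support (how often the items occur together) and confidence (how often the right-hand side occurs when the left-hand side does). These two measures are standard but are not mentioned on the card, and the transactions below are made up.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return count(transactions, set(lhs) | set(rhs)) / count(transactions, lhs)

# Hypothetical market-basket data.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]
print(support(transactions, {"Milk", "Coke"}))       # 0.6
print(confidence(transactions, {"Milk"}, {"Coke"}))  # 0.75
```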

Stratified Sampling

A sampling method that approximates the percentage of each subpopulation of interest in the overall database, often used with skewed data.
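
A minimal sketch: split the records into strata and sample the same fraction from each, so a rare subpopulation keeps its share instead of vanishing by chance. Taking at least one record per stratum is a design choice made here, not something the card specifies; the names and data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every stratum defined by key(record)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in strata.values():
        # Keep at least one record so small strata are still represented.
        size = max(1, round(fraction * len(group)))
        sample.extend(rng.sample(group, size))
    return sample

# Hypothetical skewed data: 95 "common" records and 5 "rare" ones.
records = [("common", i) for i in range(95)] + [("rare", i) for i in range(5)]
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.2)
print(len(sample))  # 20: 19 common records and 1 rare record
```

A plain 20% random sample of the same data would miss the rare group entirely about a third of the time.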