Multimedia Databases and Data Mining Flashcards

Description and Tags

Concepts covering multimedia indexing, text retrieval, dimensionality reduction techniques like SVD, and core data mining tasks including classification and clustering.

Last updated 1:58 AM on 7/3/26

24 Terms

1

Content-based Image Retrieval

A field of research that aims to index and retrieve images by their visual content rather than by manual text annotation.

2

Color Histogram

A compact representation of an image's color content: the colors are partitioned into $k$ groups, and the percentage of the image's pixels falling into each group is measured.
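A minimal sketch of one way to form the $k$ groups, assuming RGB pixels with 0-255 channels and an equal-width quantization of each channel into `levels` bins, so $k = \text{levels}^3$. The grouping scheme and the function name `color_histogram` are illustrative choices, not taken from the card.

```python
def color_histogram(pixels, levels=4):
    """Return the fraction of pixels in each of k = levels**3 color groups.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    Each channel is split into `levels` equal-width bins (an assumed grouping).
    """
    k = levels ** 3
    counts = [0] * k
    n = 0
    for r, g, b in pixels:
        # c * levels // 256 maps 0..255 to a bin index in 0..levels-1.
        group = ((r * levels // 256) * levels * levels
                 + (g * levels // 256) * levels
                 + (b * levels // 256))
        counts[group] += 1
        n += 1
    if n == 0:
        return [0.0] * k  # empty image: every group gets 0%
    return [c / n for c in counts]
```

For example, two reddish and two bluish pixels with `levels=2` put half of the mass in the red group and half in the blue group, with the other six groups empty.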

3

Stemming

A text processing technique where only the root of each word is kept (e.g., converting 'inverted' and 'inversion' into 'invert').

4

Inverted Files

A text indexing structure consisting of a dictionary and postings lists; it offers fast retrieval at the cost of extra storage space.

5

Zipf Distribution

A distribution observed in text collections where the frequency of a word is approximately inversely proportional to its rank ($\text{freq} \sim 1/\text{rank}$).
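A worked reading of the relation, writing $f(r)$ for the frequency of the word at rank $r$ and $C$ for a collection-dependent constant (notation introduced here for illustration):

```latex
f(r) \approx \frac{C}{r}
\quad\Longrightarrow\quad
r \cdot f(r) \approx C,
\qquad
\frac{f(1)}{f(2)} \approx 2,
\qquad
\frac{f(1)}{f(10)} \approx 10
```

So the most frequent word appears roughly twice as often as the second most frequent and about ten times as often as the tenth.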

6

Postings Lists

The components of an inverted file that list, for each term, the documents in which it occurs; they are the main source of the inverted file's space overhead.
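A minimal sketch of an inverted file, assuming whitespace tokenization and postings lists that store only sorted document ids (real postings lists often also store term frequencies or positions). The function names are hypothetical. The dictionary is the set of keys; a conjunctive query merges two sorted postings lists.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each term to a sorted postings list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Keys form the dictionary; each value is that term's postings list.
    return {term: sorted(ids) for term, ids in postings.items()}

def intersect(p1, p2):
    """Merge two sorted postings lists to find documents containing both terms."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result
```

For the documents `"data mining tasks"`, `"multimedia data"` and `"text mining"`, the postings for `data` are `[0, 1]`, for `mining` `[0, 2]`, and their intersection is `[0]`.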

7

Vector Space Model

A model where each document is represented as a vector of size $d$, where $d$ is the number of different terms in the database (vocabulary size).

8

Binary Weights

A term weighting scheme where only the presence (1) or absence (0) of a term is included in the document vector.

9

tf × idf

A weighting measure defined as $w = tf \times \log(N/n_k)$, where $tf$ is the term frequency, $N$ is the total number of documents, and $n_k$ is the number of documents containing the term.
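A minimal sketch that builds one vector of size $d$ per document using the tf × idf weight above, assuming whitespace tokenization, raw counts for $tf$, and the natural logarithm (the card does not fix the log base). `tfidf_vectors` is a hypothetical name.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return the sorted vocabulary and one tf x idf vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})  # d = len(vocab)
    N = len(docs)
    # n_k: number of documents containing term k (count each term once per doc).
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term frequency in this document
        vectors.append([tf[t] * math.log(N / df[t]) for t in vocab])
    return vocab, vectors
```

A term that occurs in every document gets weight $\log(N/N) = 0$, so it does not help distinguish documents.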

10

Cosine Coefficient

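A similarity measure for document vectors, also known as the normalized inner product: $\text{sim}(D_i, D_j) = \dfrac{\sum_{k=1}^{t} w_{ik} \times w_{jk}}{\sqrt{\sum_{k=1}^{t} w_{ik}^2}\,\sqrt{\sum_{k=1}^{t} w_{jk}^2}}$, which reduces to $\sum_{k=1}^{t} w_{ik} \times w_{jk}$ for vectors normalized to unit length.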
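A minimal sketch of the cosine coefficient for two weight vectors of equal length. Returning 0.0 when either vector is all zeros is a convention assumed here, not something the card specifies.

```python
import math

def cosine(u, v):
    """Inner product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_u * norm_v)
```

Vectors pointing the same way score 1 regardless of length, and vectors with no terms in common score 0.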

11

Latent Semantic Indexing (LSI)

A method that maps documents and terms into latent (hidden) concepts to improve filtering and retrieval.

12

Singular Value Decomposition (SVD)

The decomposition of a matrix into $A = U \Lambda V^T$, where $U$ is a document-to-concept matrix, $\Lambda$ is a diagonal matrix of concept strengths, and $V$ is a term-to-concept matrix.
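A minimal NumPy sketch on a made-up 4 × 5 document-term matrix. `numpy.linalg.svd` returns $V^T$ (here `Vt`) and the diagonal of $\Lambda$ as a 1-D array, so $\Lambda$ is rebuilt with `np.diag`. Keeping only the $k$ strongest concepts gives the low-rank approximation used by LSI; $k = 2$ is an assumption for this toy matrix, which has rank 2.

```python
import numpy as np

# Toy document-term matrix A (rows = documents, columns = terms); values are made up.
A = np.array([
    [1, 1, 1, 0, 0],   # documents 0-1 use only terms 0-2
    [2, 2, 2, 0, 0],
    [0, 0, 0, 1, 1],   # documents 2-3 use only terms 3-4
    [0, 0, 0, 3, 3],
], dtype=float)

# U: document-to-concept, s: concept strengths, Vt: V^T (concept-to-term rows).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Lam = np.diag(s)

# Full reconstruction: A = U Lam V^T.
assert np.allclose(A, U @ Lam @ Vt)

# Keep the k strongest concepts (singular values come sorted in descending order).
k = 2
A_k = U[:, :k] @ Lam[:k, :k] @ Vt[:k, :]
doc_concepts = U[:, :k] * s[:k]  # each document expressed in concept space
```

Here the two concepts have strengths $\sqrt{20}$ and $\sqrt{15}$, one per group of documents, and the remaining singular values are zero, so `A_k` reproduces `A`.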

13

Frobenius Norm

The norm of an $n \times m$ matrix $M$, calculated as the square root of the sum of the squares of its elements: $\|M\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} M[i, j]^2}$.
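A direct transcription of the definition, for a matrix given as a list of rows. The Frobenius norm also equals the square root of the sum of the squared singular values, which makes it a natural measure of how much a truncated SVD loses.

```python
import math

def frobenius_norm(M):
    """Square root of the sum of the squares of all entries of M (a list of rows)."""
    return math.sqrt(sum(x * x for row in M for x in row))
```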

14

Authorities

In Kleinberg's algorithm, these are nodes that receive links from many important hub nodes.

15

Hubs

In Kleinberg's algorithm, these are nodes that point to many high-quality authority nodes.
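A minimal sketch of the mutually reinforcing hub/authority updates, assuming the graph is given as a list of directed `(source, target)` links, a fixed number of iterations, and Euclidean normalization after each round. The function name `hits` is an illustrative choice.

```python
import math

def hits(edges, nodes, iterations=50):
    """Iterate hub/authority updates; returns (hub, authority) score dicts."""
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of authority scores of the nodes it links to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize so the scores stay bounded (guard against an all-zero vector).
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth
```

With links `h1→a1`, `h1→a2`, `h2→a1`, `h3→a1`, node `a1` becomes the strongest authority (three in-links from hubs) and `h1` the strongest hub (it points to both authorities).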

16

PageRank

An algorithm that determines the importance of a page by computing its steady-state probability in a Markov Chain model of a random web surfer.
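A minimal power-iteration sketch of the random-surfer chain. The damping factor 0.85, the uniform random jump, and sending a dangling page's probability mass to all pages are common modeling choices assumed here; the card does not specify them. Every link target is assumed to also be a key of `links`.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the steady-state probability of each page.

    links: dict mapping each page to the list of pages it links to.
    With probability `damping` the surfer follows a random out-link;
    otherwise it jumps to a page chosen uniformly at random.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links[p]
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: spread its mass evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

The scores remain a probability distribution at every step; a page with no in-links keeps only the random-jump share $(1 - 0.85)/n$.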

17

Isometric Mapping

An embedding where the mapping $F$ ensures the exact preservation of distance between objects.

18

FastMap

A metric analogue to the KL-transform (PCA) that uses pivot points and the law of cosines to compute pseudo-projections.
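A minimal sketch of how FastMap computes one coordinate from distances alone. By the law of cosines, the pseudo-projection of object $i$ onto the line through pivots $a$ and $b$ is $x_i = \dfrac{d(a,i)^2 + d(a,b)^2 - d(b,i)^2}{2\,d(a,b)}$. The pivot heuristic below (jump to the farthest object, then to the object farthest from it) is one common choice; full FastMap repeats the step on residual distances to obtain further coordinates, which is omitted here. Objects must be hashable.

```python
def fastmap_coordinate(objects, dist):
    """First FastMap coordinate of each object, using only the distance function."""
    # Pivot heuristic: from an arbitrary object, take the farthest object b,
    # then the object a farthest from b.
    a = objects[0]
    b = max(objects, key=lambda o: dist(a, o))
    a = max(objects, key=lambda o: dist(b, o))
    d_ab = dist(a, b)
    if d_ab == 0:
        return {o: 0.0 for o in objects}  # all objects coincide
    # Law of cosines: projection of o onto the line through pivots a and b.
    return {o: (dist(a, o) ** 2 + d_ab ** 2 - dist(b, o) ** 2) / (2 * d_ab)
            for o in objects}
```

For points that already lie on a line, the coordinates recover their positions measured from pivot `a`.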

19

Johnson-Lindenstrauss Lemma

A mathematical basis for random projections, stating that a set of points in high-dimensional space can be mapped to much lower dimensions while approximately preserving distances.
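A minimal NumPy sketch of a Gaussian random projection, one standard construction behind the lemma. Entries of the $D \times k$ projection matrix are drawn from $N(0, 1/k)$, so squared distances are preserved in expectation. The data, dimensions, and seeds are arbitrary assumptions for illustration.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Map the rows of X (n x D) to k dimensions with a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    # Standard deviation 1/sqrt(k) keeps expected squared norms unchanged.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

# 10 random points in 1000 dimensions, projected down to 300.
X = np.random.default_rng(1).normal(size=(10, 1000))
Y = random_projection(X, k=300)

# Ratio of projected to original distance for every pair of points.
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i in range(10) for j in range(i + 1, 10)]
```

All ratios come out close to 1, even though each point now has less than a third of its original coordinates.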

20

Classification

A data mining task involving learning a function that maps an item into one of a set of predefined classes using a training set.
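One simple instance of the task, sketched with the 1-nearest-neighbor rule: the learned function assigns a new item the class of its closest training example. The choice of 1-NN with Euclidean distance is an assumption for illustration; the card does not name a classifier.

```python
import math

def nearest_neighbor_classify(training_set, x):
    """Return the label of the training example closest to x (Euclidean distance).

    training_set: list of (feature_vector, label) pairs.
    """
    _, label = min(training_set, key=lambda item: math.dist(item[0], x))
    return label
```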

21

Regression

A data mining task where a function is learned to map an item to a continuous real value.
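A minimal sketch of the simplest case, fitting a straight line $y = a + b x$ by ordinary least squares. This is one specific regression technique chosen for illustration; the card does not name a method. At least two distinct $x$ values are assumed.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x), both up to the same 1/n factor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the means
    return a, b
```

For points lying exactly on $y = 1 + 2x$, the fit recovers $a = 1$ and $b = 2$.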

22

Clustering

The process of identifying groups of similar items such that intracluster distances are minimized and intercluster distances are maximized.
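A minimal sketch of k-means (Lloyd's iterations), one common clustering algorithm that directly targets small intracluster distances. Using the first `k` points as the initial centers and a fixed iteration count are simplifying assumptions; practical implementations usually choose initial centers more carefully.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster tuples of numbers into k groups; returns (centers, labels)."""
    centers = [tuple(p) for p in points[:k]]  # assumed initialization
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels
```

On two well-separated groups of points the algorithm settles after one update, with each center at the mean of its group.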

23

Association Rule Discovery

The production of dependency rules that predict the occurrence of an item based on the occurrences of other items (e.g., $\{Milk\} \rightarrow \{Coke\}$).
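A minimal sketch of support and confidence, the two measures commonly used to evaluate a candidate rule such as $\{Milk\} \rightarrow \{Coke\}$. The transactions in the test are made up; returning 0.0 confidence when no transaction contains the left-hand side is a convention assumed here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs) = support(lhs and rhs together) / support(lhs)."""
    s_lhs = support(transactions, lhs)
    if s_lhs == 0:
        return 0.0  # assumed convention: rule never applies
    return support(transactions, set(lhs) | set(rhs)) / s_lhs
```

If Milk appears in 4 of 5 transactions and Milk with Coke in 3 of them, the rule $\{Milk\} \rightarrow \{Coke\}$ has support 0.6 and confidence 0.75.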

24

Stratified Sampling

A sampling method that approximates the percentage of each subpopulation of interest in the overall database, often used with skewed data.
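A minimal sketch that draws the same fraction from every stratum, so each subpopulation keeps roughly its share of the database. Taking at least one item per stratum, so rare groups in skewed data are not lost, is a choice made here, as are the function name and the fixed seed.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, fraction, seed=0):
    """Sample about `fraction` of each stratum, as defined by stratum_of(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Proportional size, but never zero, so small strata stay represented.
        size = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, size))
    return sample
```

With 90 "common" and 10 "rare" records and `fraction=0.1`, the sample holds 9 common and 1 rare record, matching the 90/10 split of the data.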