IN2110: Språkteknologiske metoder (Methods in Language Technology) - Vector Spaces for Language Technology


Description and Tags

Vocabulary flashcards based on the lecture notes.

Last updated 6:46 AM on 5/5/25
25 Terms

1

Feature

An observable, relevant property of the data, represented by a numerical value.

2

Feature Vector

A tuple of d feature values, x = ⟨x_1, x_2, ..., x_d⟩, representing an object x.

3

Vector Space Model

A model where data is represented as feature vectors, with features as dimensions in a space.

4

Bag-of-Words (BoW)

A document representation where features are frequency counts of words in the text.
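
A minimal sketch of a bag-of-words representation, assuming naive lowercasing and whitespace splitting (the function name `bag_of_words` is hypothetical, not from the lecture notes):

```python
from collections import Counter

def bag_of_words(text):
    """Return a word -> frequency mapping; word order is discarded."""
    # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
    tokens = text.lower().split()
    return Counter(tokens)

bow = bag_of_words("the cat sat on the mat")
print(bow["the"])  # 2
print(bow["dog"])  # 0, a word absent from the text has count zero
```

Because only counts are kept, two texts with the same words in a different order get the same representation.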

5

Token

An instance of a word in a text.

6

Type

A distinct word in a text, counted once no matter how many times it occurs.
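
The token/type distinction can be illustrated with a short sketch, assuming whitespace splitting as the tokenizer:

```python
# Assumption: whitespace splitting stands in for a real tokenizer.
tokens = "to be or not to be".split()
types = set(tokens)

num_tokens = len(tokens)  # 6 word instances
num_types = len(types)    # 4 distinct words: to, be, or, not
print(num_tokens, num_types)  # 6 4
```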

7

Co-occurrence Matrix

A matrix whose rows represent words and whose columns represent documents; each cell records how often the word occurs in that document.
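
As a rough sketch, such a word-by-document matrix could be built as follows. The helper name `term_document_matrix` and the toy documents are illustrative assumptions:

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[i] in docs[j]."""
    # Assumption: lowercase + whitespace split as a stand-in tokenizer.
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, matrix = term_document_matrix(docs)
for word, row in zip(vocab, matrix):
    print(word, row)
# cat [1, 0, 1]
# dog [0, 1, 0]
# ran [0, 0, 1]
# sat [1, 1, 0]
# the [1, 1, 1]
```

Each row is a feature vector for a word, and each column is a bag-of-words vector for a document.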

8

Distributional Hypothesis

The idea that words which occur in similar contexts are semantically related.

9

Euclidean Distance

The straight-line distance between two points (vectors) in a vector space: d(x, y) = sqrt(Σ_i (x_i - y_i)^2).
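
A small sketch of the computation, using plain Python lists as vectors (the function name is an illustrative choice):

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared differences per dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```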

10

Normalization

The process of scaling vectors to have a unit length (∥x∥ = 1).
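
One way to sketch this, assuming the Euclidean (L2) norm is the length being normalized; the zero-vector handling is an added assumption, since a zero vector has no direction to preserve:

```python
import math

def normalize(x):
    """Scale x to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in x))
    if norm == 0:
        return list(x)
    return [a / norm for a in x]

print(normalize([3, 4]))  # [0.6, 0.8]
```

After normalization, differences in vector length (for example, document length) no longer affect distance comparisons.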

11

Cosine Similarity

A measure of similarity between two vectors based on the cosine of the angle between them.
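
A minimal sketch, computing the cosine as the dot product divided by the product of the vector lengths. Returning 0.0 for a zero vector is a convention assumed here, not something stated on the card:

```python
import math

def cosine_similarity(x, y):
    """cos(angle) = (x . y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumed convention: the angle is undefined for a zero vector
    return dot / (norm_x * norm_y)

print(cosine_similarity([1, 0], [0, 1]))  # 0.0, orthogonal vectors
print(cosine_similarity([1, 1], [2, 2]))  # close to 1.0, same direction
```

Only the direction of the vectors matters, so scaling either vector leaves the result unchanged up to floating-point error.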

12

TF-IDF

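A weighting function that combines term frequency (tf) and inverse document frequency (idf), typically as their product, so that terms frequent in a document but rare in the collection receive high weight.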

13

Term Frequency (TF)

The number of times a term occurs in a document.

14

Document Frequency (DF)

The number of documents in a collection that contain a term.

15

Inverse Document Frequency (IDF)

A measure of how rare a term is in a document collection, calculated as idf(t_i) = log(N / df(t_i)), where N is the number of documents in the collection.
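
The three quantities tf, df and idf can be combined as in the sketch below. It assumes raw counts for tf, the natural logarithm, and the plain product tf * idf; other variants (log-scaled tf, smoothing) exist, and the function name `tf_idf` is hypothetical:

```python
import math

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in set(doc):
            tf = doc.count(term)           # raw term frequency
            idf = math.log(n / df[term])   # idf(t) = log(N / df(t))
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
weights = tf_idf(docs)
print(weights[0]["the"])  # 0.0, "the" occurs in every document
print(weights[1]["dog"])  # log(3), "dog" occurs in only one document
```

A term that appears in every document gets idf = log(1) = 0, so it contributes nothing to the weighted vector.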

16

Tokenization

Splitting a text into sentences and words or other units.
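
A very rough word-level tokenizer can be sketched with a regular expression. The pattern is an illustrative assumption; real tokenizers handle abbreviations, clitics and sentence boundaries far more carefully:

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```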

17

Lemmatization

Reducing words to their base or dictionary form (lemma).

18

Stemming

Reducing words to their stem or root form, often by removing suffixes.

19

Stop-list

A list of common words (typically function words) to be filtered out during text pre-processing.
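
Filtering with a stop-list might look like the sketch below. The tiny `STOP_LIST` is a made-up example, not a standard list:

```python
# Hypothetical, deliberately tiny stop-list for illustration only.
STOP_LIST = {"the", "a", "an", "of", "on", "in", "and", "to"}

def remove_stop_words(tokens, stop_list=STOP_LIST):
    """Keep only tokens whose lowercase form is not in the stop-list."""
    return [t for t in tokens if t.lower() not in stop_list]

print(remove_stop_words("The cat sat on the mat".split()))
# ['cat', 'sat', 'mat']
```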

20

Sparsity

The property of high-dimensional vectors in which very few elements are non-zero.
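
Sparsity is often exploited by storing only the non-zero elements. The sketch below uses a dict from index to value; the helper names are illustrative assumptions:

```python
def to_sparse(dense):
    """Keep only non-zero entries as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(x, y):
    """Dot product that touches only indices present in the smaller vector."""
    if len(x) > len(y):
        x, y = y, x
    return sum(v * y[i] for i, v in x.items() if i in y)

x = to_sparse([0, 0, 3, 0, 0, 0, 1, 0])
y = to_sparse([0, 0, 2, 0, 0, 4, 0, 0])
print(x)                 # {2: 3, 6: 1}
print(sparse_dot(x, y))  # 6
```

With vocabulary-sized vectors, this avoids storing and multiplying the many zero entries.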

21

Classification

A supervised learning task that involves assigning new instances to predefined classes.

22

Clustering

An unsupervised learning task that involves grouping similar objects together.

23

Contiguity Hypothesis

Objects in the same class form a contiguous region, and regions of different classes do not overlap.

24

KNN (K-Nearest Neighbor)

A classification method that assigns an object to the majority class among its k nearest neighbors in the training data.
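
A minimal kNN sketch, assuming Euclidean distance and a simple majority vote; the function name, the toy data and the tie-breaking behavior (first class counted wins) are assumptions made for illustration:

```python
import math
from collections import Counter

def knn_classify(x, training, k=3):
    """training: list of (vector, label) pairs. Returns the majority label."""
    # Sort training items by Euclidean distance to x and keep the k closest.
    neighbors = sorted(training, key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [
    ([1, 1], "A"), ([1, 2], "A"), ([2, 1], "A"),
    ([8, 8], "B"), ([9, 8], "B"), ([8, 9], "B"),
]
print(knn_classify([2, 2], training))  # A
print(knn_classify([8, 7], training))  # B
```

There is no training step beyond storing the examples; all the work happens at classification time.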

25

Rocchio Classification

A classification method that represents each class by its centroid (the mean of its members' vectors) and assigns a new object to the class with the nearest centroid.
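
A nearest-centroid sketch in the spirit of Rocchio classification, assuming Euclidean distance; the function names and toy data are hypothetical:

```python
import math
from collections import defaultdict

def train_rocchio(training):
    """training: list of (vector, label) pairs. Returns {label: centroid}."""
    groups = defaultdict(list)
    for vec, label in training:
        groups[label].append(vec)
    # The centroid is the per-dimension mean of the class members.
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

def rocchio_classify(x, centroids):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

training = [([0, 0], "A"), ([2, 2], "A"), ([8, 8], "B"), ([10, 10], "B")]
centroids = train_rocchio(training)
print(centroids)                            # {'A': [1.0, 1.0], 'B': [9.0, 9.0]}
print(rocchio_classify([3, 3], centroids))  # A
print(rocchio_classify([7, 6], centroids))  # B
```

Unlike kNN, only one vector per class has to be stored and compared at classification time.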
