IN2110: Methods in Language Technology - Vector Spaces for Language Technology

Vocabulary flashcards based on the lecture notes.

Last updated 6:46 AM on 5/5/25

25 Terms

1. Feature

An observable and relevant property of the data, represented by a numerical value.

2. Feature Vector

A tuple of $d$ feature values, $\vec{x} = \langle x_1, x_2, \ldots, x_d \rangle$, representing an object $x$.

3. Vector Space Model

A model in which data are represented as feature vectors, with each feature forming one dimension of the vector space.

4. Bag-of-Words (BoW)

A document representation in which the features are the frequency counts of the words in the text; word order is ignored.
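A minimal sketch in Python of turning a document into a bag-of-words vector, assuming the text is already tokenized (here simply by splitting on whitespace) and that the vocabulary, i.e. the set of dimensions, is fixed in advance. The helper `bag_of_words` and the toy vocabulary are hypothetical.

```python
from collections import Counter

def bag_of_words(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Return the count of each vocabulary word in `tokens`, in vocabulary order."""
    counts = Counter(tokens)
    # Counter returns 0 for missing keys, so absent vocabulary words get 0.
    # Tokens outside the vocabulary ("saw" below) are simply ignored.
    return [counts[word] for word in vocabulary]

vocabulary = ["cat", "dog", "the"]
doc = "the cat saw the dog".split()
print(bag_of_words(doc, vocabulary))  # [1, 1, 2]
```

The order of the words in `doc` has no effect on the result, which is exactly what "bag" refers to.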

5. Token

An instance of a word in a text.

6. Type

A distinct word form in a text; repeated occurrences (tokens) of the same word count as one type.
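A small Python illustration of the token/type distinction, assuming naive whitespace tokenization: every occurrence is a token, while a set keeps each distinct word (type) once.

```python
text = "the cat saw the other cat"
tokens = text.split()   # every occurrence counts
types = set(tokens)     # each distinct word counts once
print(len(tokens), len(types))  # 6 4
```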

7. Co-occurrence Matrix

A matrix in which each row represents a word and each column a context (here, a document); each cell counts how often the word occurs in that context.
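A small Python sketch of the word-by-document matrix described above, built from three toy documents invented for illustration. Each row is the vector for one word and each column corresponds to one document.

```python
from collections import Counter

docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat saw the dog".split(),
]
vocabulary = sorted({word for doc in docs for word in doc})
# One Counter per document (column); each word (row) is looked up in every column.
doc_counts = [Counter(doc) for doc in docs]
matrix = [[counts[word] for counts in doc_counts] for word in vocabulary]

for word, row in zip(vocabulary, matrix):
    print(f"{word:>4}", row)
```

Words with similar rows (here "cat" and "dog" both appear in the third document) are close in this space, which is how the matrix operationalizes the distributional idea.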

8. Distributional Hypothesis

The idea that words occurring in similar contexts tend to be semantically related.

9. Euclidean Distance

The straight-line distance between two points (vectors) in a vector space: $d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$.
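A minimal Python sketch of Euclidean distance, computed as the square root of the summed squared differences between corresponding components. The function name is hypothetical; Python 3.8+ also ships `math.dist`, which computes the same quantity.

```python
import math

def euclidean_distance(x: list[float], y: list[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```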

10. Normalization

The process of scaling a vector to unit length, $\|\vec{x}\| = 1$, by dividing it by its Euclidean norm.
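A minimal Python sketch of length normalization, assuming the Euclidean (L2) norm: each component is divided by the vector's length, so the direction is kept and the length becomes 1. The helper name is hypothetical.

```python
import math

def normalize(x: list[float]) -> list[float]:
    """Divide every component by the vector's Euclidean length."""
    length = math.sqrt(sum(xi * xi for xi in x))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [xi / length for xi in x]

unit = normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```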

11. Cosine Similarity

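A measure of similarity between two vectors based on the cosine of the angle between them, $\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$. It ignores vector length, and for unit-length vectors it equals the dot product.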
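A minimal Python sketch of cosine similarity: the dot product divided by the product of the two vector lengths. The second call shows that two vectors pointing in the same direction score 1.0 regardless of their lengths. The function name is hypothetical.

```python
import math

def cosine_similarity(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between x and y: dot product divided by both lengths."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot / (norm_x * norm_y)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_similarity([2.0, 0.0], [5.0, 0.0]))  # 1.0 (same direction, different length)
```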

12. TF-IDF

A weighting function that multiplies term frequency (tf) by inverse document frequency (idf): $\text{tf-idf}(t_i, d) = \mathrm{tf}(t_i, d) \times \mathrm{idf}(t_i)$.

13. Term Frequency (TF)

The number of times a term occurs in a document.

14. Document Frequency (DF)

The number of documents in a collection that contain a term.

15. Inverse Document Frequency (IDF)

A measure of how rare a term is in a document collection: $\mathrm{idf}(t_i) = \log \frac{N}{\mathrm{df}(t_i)}$, where $N$ is the number of documents in the collection.
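A minimal Python sketch tying together tf, df, idf and tf-idf on three toy documents invented for illustration. It assumes raw counts for tf and the natural logarithm for idf; the log base is not specified above and only rescales all weights by a constant. `idf` fails for a term that occurs in no document (df = 0), so the sketch assumes every queried term is in the collection.

```python
import math
from collections import Counter

docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat saw the dog".split(),
]
N = len(docs)

# df(t): number of documents containing t; set() counts each document at most once.
df = Counter(term for doc in docs for term in set(doc))

def idf(term: str) -> float:
    return math.log(N / df[term])

def tf_idf(term: str, doc: list[str]) -> float:
    tf = doc.count(term)  # raw term frequency in this document
    return tf * idf(term)

print(tf_idf("the", docs[2]))            # 0.0: "the" occurs in every document
print(round(tf_idf("saw", docs[2]), 3))  # 1.099: "saw" occurs in one document only
```

Although "the" is the most frequent word in the third document, its weight is zero, while the rare word "saw" gets the highest weight.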

16. Tokenization

Splitting a text into units such as sentences and words (tokens).
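A deliberately naive Python sketch of tokenization using regular expressions, assuming that sentences end with ".", "!" or "?" followed by whitespace and that words are runs of word characters, with punctuation split off as separate tokens. Real tokenizers also handle abbreviations, numbers, clitics and similar cases, which this hypothetical `tokenize` does not.

```python
import re

def tokenize(text: str) -> list[list[str]]:
    # Naive sentence split: break at whitespace that follows ., ! or ?.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Naive word split: runs of word characters, or single punctuation marks.
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences if s]

print(tokenize("The cat sat. Did the dog bark?"))
# [['The', 'cat', 'sat', '.'], ['Did', 'the', 'dog', 'bark', '?']]
```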

17. Lemmatization

Reducing words to their base or dictionary form (lemma).

18. Stemming

Reducing words to their stem or root form, often by stripping suffixes with simple rules; unlike a lemma, the stem need not be a real word.
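A toy Python suffix-stripping stemmer, shown only to illustrate the idea; it is not the Porter stemmer or any other standard algorithm. The suffix list and the minimum stem length of 3 are hypothetical choices. Note that "studies" becomes "studi", a stem that is not a dictionary word, which is the key contrast with lemmatization.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, far from complete

def naive_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least 3 characters of stem so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["jumping", "jumped", "jumps", "is", "studies"]])
# ['jump', 'jump', 'jump', 'is', 'studi']
```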

19. Stop-list

A list of common words (typically function words) to be filtered out during text preprocessing.
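A minimal Python sketch of stop-list filtering. The stop list here is a small illustrative set, not a standard list, and matching is case-insensitive by assumption.

```python
STOP_LIST = {"the", "a", "an", "of", "and", "to", "in"}  # illustrative only

def remove_stop_words(tokens: list[str]) -> list[str]:
    # Drop any token whose lowercased form is on the stop list.
    return [t for t in tokens if t.lower() not in STOP_LIST]

print(remove_stop_words("The cat sat in the garden".split()))
# ['cat', 'sat', 'garden']
```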

20. Sparsity

The property of (typically high-dimensional) vectors in which very few elements are non-zero.
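Because a bag-of-words vector over a large vocabulary is mostly zeros, a common approach is to store only the non-zero entries. A minimal Python sketch, assuming a dictionary keyed by word serves as the sparse vector (the helper names are hypothetical): dimensions that are absent are implicitly 0, and a dot product only needs to visit the stored entries.

```python
from collections import Counter

def sparse_bow(tokens: list[str]) -> dict[str, int]:
    # Only words that actually occur are stored; all other dimensions are implicitly 0.
    return dict(Counter(tokens))

def sparse_dot(x: dict[str, int], y: dict[str, int]) -> int:
    # Iterate over the smaller vector; keys missing from the other contribute 0.
    if len(x) > len(y):
        x, y = y, x
    return sum(value * y.get(key, 0) for key, value in x.items())

a = sparse_bow("the cat sat on the mat".split())
b = sparse_bow("the dog sat".split())
print(a)                 # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
print(sparse_dot(a, b))  # 3
```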

21. Classification

A supervised learning task that involves assigning new instances to predefined classes.

22. Clustering

An unsupervised learning task that involves grouping similar objects together.

23. Contiguity Hypothesis

Objects in the same class form a contiguous region, and regions of different classes do not overlap.

24. KNN (k-Nearest Neighbors)

A classification method that assigns a new instance the majority class among its k nearest training instances, as measured by a distance (or similarity) function.
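A minimal Python sketch of kNN classification, assuming Euclidean distance (via `math.dist`, Python 3.8+) and a simple majority vote; for text, cosine similarity is also a common choice. The training data and labels are invented for illustration, and `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter

def knn_classify(query: list[float], training: list[tuple[list[float], str]], k: int = 3) -> str:
    """Return the majority label among the k training vectors nearest to `query`."""
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, most_common keeps first-seen order, i.e. the label of the nearer neighbour wins.
    return votes.most_common(1)[0][0]

training = [
    ([1.0, 1.0], "sport"),
    ([1.0, 2.0], "sport"),
    ([2.0, 1.0], "sport"),
    ([5.0, 5.0], "politics"),
    ([6.0, 5.0], "politics"),
]
print(knn_classify([1.5, 1.5], training, k=3))  # sport
print(knn_classify([5.5, 5.0], training, k=3))  # politics
```

There is no training step: all work happens at classification time, when the query is compared against every stored instance.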

25. Rocchio Classification

A classification method that assigns a new instance to the class whose centroid (the mean vector of that class's training instances) is nearest.
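A minimal Python sketch of Rocchio classification, assuming Euclidean distance to the centroids: training computes one component-wise mean vector per class, and classification picks the class with the nearest centroid. The data and function names are hypothetical.

```python
import math
from collections import defaultdict

def train_rocchio(training: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Compute one centroid (component-wise mean vector) per class."""
    by_class = defaultdict(list)
    for vector, label in training:
        by_class[label].append(vector)
    return {
        label: [sum(components) / len(vectors) for components in zip(*vectors)]
        for label, vectors in by_class.items()
    }

def rocchio_classify(query: list[float], centroids: dict[str, list[float]]) -> str:
    # The predicted class is the one whose centroid is closest to the query.
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))

training = [
    ([1.0, 1.0], "sport"),
    ([1.0, 3.0], "sport"),
    ([5.0, 5.0], "politics"),
    ([7.0, 5.0], "politics"),
]
centroids = train_rocchio(training)
print(centroids)                              # {'sport': [1.0, 2.0], 'politics': [6.0, 5.0]}
print(rocchio_classify([2.0, 2.0], centroids))  # sport
```

Unlike kNN, only one vector per class has to be compared at classification time, but a single centroid represents a class well only if the class forms a compact region.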
