Lexical Semantics
The study of...
▶ what individual lexical items mean,
▶ how we can represent their meaning,
▶ and how to combine the meanings of individual items to obtain an interpretation for a phrase/utterance
Lexical Semantics in Computational Linguistics
▶ Recognize word senses in text (manually and automatically)
▶ Define similarities between words
▶ Determine how strongly a verb “goes with” its subject (selectional preferences)
▶ Recognize and interpret figurative uses of words
▶ Describe relations between words (or better, between word senses)
Semantic Ontologies
structured dictionaries that define word senses and their relations to other word senses
WordNet
a large lexical resource that organizes words into synsets (sets of synonyms) and links them according to their semantic relations
Limitations of Relational Models
▶ Relational models such as WordNet are glorified thesauri
▶ Require many years of development and depend on skilled lexicographers
▶ Inconsistencies throughout the resource
▶ An ontology is only as good as its ontologist(s) – it is not only data
Distributional Semantic Model (DSM)
A model that encodes meaning from word co-occurrence patterns.
Effect of preprocessing
Linguistic annotation changes the nearest neighbors in a distributional model.
Semantic Similarity
two words sharing a high number of salient features (attributes) → paradigmatic relatedness
Semantic Relatedness
two words semantically associated without being necessarily similar → syntagmatic relatedness
Feature scaling
Adjusting feature values (e.g., Logarithmic scaling, Relevance weighting, Statistical association measures) before similarity computation.
Simple association measures
Pointwise Mutual Information, t-score, Log-Likelihood, Odds Ratio
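As a minimal sketch, Pointwise Mutual Information (and its positive-PMI variant) can be computed from raw co-occurrence counts as below; the words and counts are made up for illustration:

```python
import numpy as np

# Toy word-by-context co-occurrence counts (rows = target words, columns = context words).
# The vocabulary and counts are illustrative, not from a real corpus.
words = ["dog", "cat", "car"]
contexts = ["bark", "purr", "drive"]
counts = np.array([
    [10,  1,  0],   # dog
    [ 1, 12,  0],   # cat
    [ 0,  0,  8],   # car
], dtype=float)

total = counts.sum()
p_wc = counts / total                              # joint probability P(w, c)
p_w = counts.sum(axis=1, keepdims=True) / total    # marginal P(w)
p_c = counts.sum(axis=0, keepdims=True) / total    # marginal P(c)

with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))              # PMI(w, c) = log2 P(w,c) / (P(w) P(c))
    # Positive PMI: keep only finite, positive values; everything else becomes 0.
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

print(ppmi.round(2))
```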
Dimensionality reduction
Identify the latent dimensions and project the data onto these new dimensions
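One common way to identify latent dimensions is a truncated SVD of the (weighted) co-occurrence matrix; a minimal sketch with an illustrative matrix:

```python
import numpy as np

# Toy word-by-context count matrix; in practice this would be a large (weighted) matrix,
# e.g. the PPMI matrix from the previous sketch.
X = np.array([
    [10,  1,  0,  2],
    [ 1, 12,  0,  3],
    [ 0,  0,  8,  7],
    [ 1,  0,  9,  6],
], dtype=float)

k = 2                                    # number of latent dimensions to keep
U, S, Vt = np.linalg.svd(X, full_matrices=False)
word_vectors = U[:, :k] * S[:k]          # project rows (words) onto the top-k latent dimensions

print(word_vectors.shape)                # (4, 2): each word is now a dense 2-dimensional vector
```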
How are the word embeddings created?
▶ give words from a vocabulary as input to a (feed-forward) neural network
▶ embed them as vectors into a lower dimension space of a fixed size
▶ fine-tune through back-propagation
What is the objective of creating the word embeddings?
create word representations that are good at predicting the surrounding context
Distributional Representation
▶ captures the linguistic distribution of each word in the form of a high-dimensional numeric vector
▶ typically based on co-occurrence counts (aka “count” models)
▶ based on distributional hypothesis: similar distribution ≃ similar meaning (similar distribution = similar representation)
Distributed Representation
▶ sub-symbolic, compact representation of words as dense numeric vector
▶ meaning is captured across different dimensions and used to predict words (aka “predict” models)
▶ similarity of vectors corresponds to similarity of the words
▶ aka word embeddings
Methods to train word embeddings
word2vec, FastText, GloVe, ELMo, BERT, Flair
FastText
a method similar to word2vec, but each word is represented by its character n-gram (subword) vectors, which helps with rare and out-of-vocabulary words
GloVe
first builds a global word–word co-occurrence matrix and models ratios of co-occurrence probabilities; trained with a log-bilinear regression model
ELMo, BERT, Flair
Contextualized word embeddings
word2vec
▶ takes words from a very large corpus of text as input (unsupervised)
▶ learns a vector representation for each word to predict between every word and its context words
▶ fully connected feed-forward neural network with one hidden layer
Two main algorithms:
▶ Continuous Bag of Words (CBOW)
▶ Skip-gram
Continuous Bag of Words (CBOW)
predicts the center word from the given context (the sum of the surrounding word vectors); uses continuous representations whose order is of no importance; can be seen as a precognitive language model; the objective function is similar to that of a language model.
Skip-gram
predicts the context words taking the center word as input; the objective function sums the log probabilities of the n surrounding words to the left and to the right of the target word w_t
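A hedged sketch of training both variants with the gensim library (assumes gensim 4.x; the tiny corpus is illustrative only, real models need far larger text collections):

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus of pre-tokenized sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "barked", "at", "the", "cat"],
    ["the", "dog", "chased", "the", "ball"],
]

# sg=0 -> CBOW (predict the center word from its context),
# sg=1 -> Skip-gram (predict the context from the center word).
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(skipgram.wv["dog"].shape)                     # (50,): dense vector for "dog"
print(skipgram.wv.most_similar("dog", topn=2))      # nearest neighbors in the embedding space
```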
Embedding models consider…
the history (previous words) and the future (following words) of a center word. The number of words considered is called “the window size”
Word embeddings have … structure
Word embeddings have a linear structure that enables analogies via vector arithmetic
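For illustration, with a set of pretrained vectors loaded into gensim (the file name below is a placeholder), the classic king/queen analogy can be queried via vector arithmetic:

```python
from gensim.models import KeyedVectors

# Assumes pretrained vectors in word2vec text format are available locally;
# "pretrained_vectors.txt" is a placeholder path.
wv = KeyedVectors.load_word2vec_format("pretrained_vectors.txt", binary=False)

# king - man + woman ≈ queen: the linear structure supports analogies via vector arithmetic.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```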
Variations on word sense analysis
▶ Word Sense Induction: we don’t know what (or even how many) senses the words have
▶ Word Sense Disambiguation (WSD): we have a sense inventory for each word
▶ Entity Linking: like WSD only with entities and (usually) an extra “OTHER” option (because probably not all referents of an entity are known)
Working Assumptions
▶ coherence
▶ one sense per collocation
▶ one sense per discourse
Word sense disambiguation
select a sense for a word from a set of predefined possibilities (sense inventory usually comes from a dictionary or thesaurus) - supervised
Word sense induction
split the usages of a word into different meanings - unsupervised
WSD / WSI target sets
lexical sample
▶ gather all contexts corresponding to occurrences of a target word
▶ partition these contexts into regions of high density
▶ assign a sense to each region
all words
▶ make a graph consisting of all senses of all words to be disambiguated
▶ choose the best combination of senses
Approaches to WSD
▶ Knowledge-Based Disambiguation (use external resources and discourse properties)
▶ Supervised Disambiguation (uses labeled data)
▶ Unsupervised Disambiguation (one approach for all targets)
Describing the context: features
▶ information about the target word’s senses, e.g., definitions, related concepts, unambiguous contexts, ...
▶ information about the words around the target word
▶ frequently cooccurring words
▶ words that cooccur only with particular senses
▶ selectional preferences (e.g., drink (with the “ingest” sense) takes liquids as objects)
▶ words, root forms/lemmas, POS, frequency, ...
WSD with definitions
Identify the correct senses using definitions overlap
How to find the optimal sense combination for WSD with definitions?
Find the correct senses one at a time, or use simulated annealing: define a function f over combinations of word senses in a given text and search for the combination of senses that leads to the highest definition overlap (redundancy)
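A minimal sketch of the one-at-a-time, definition-overlap (Lesk-style) strategy using NLTK’s WordNet interface (assumes the WordNet data has been downloaded; the example sentence is illustrative):

```python
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet") once

def simplified_lesk(word, context_words):
    """Pick the WordNet sense of `word` whose definition overlaps most with the context."""
    context = set(w.lower() for w in context_words)
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        definition = set(sense.definition().lower().split())
        overlap = len(definition & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sentence = "I went to the bank to deposit some money".split()
print(simplified_lesk("bank", sentence))   # likely to prefer the financial-institution sense
```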
WSD with a similarity graph
1. For each open-class word gather all word senses
2. Compute pairwise sense similarities with one of the similarity metrics (e.g., if we use WordNet senses, use graph-based similarity on WordNet)
3. Find the “best” combination of senses
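A small sketch of step 2 using WordNet path similarity in NLTK; the two words are illustrative and any graph-based similarity measure could be substituted:

```python
from itertools import product
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet") once

# Pairwise sense similarities between all noun senses of two open-class words.
senses_a = wn.synsets("bank", pos=wn.NOUN)
senses_b = wn.synsets("money", pos=wn.NOUN)

for s_a, s_b in product(senses_a, senses_b):
    sim = s_a.path_similarity(s_b)          # graph-based similarity on the WordNet hierarchy
    if sim is not None and sim > 0.1:       # only print the more related sense pairs
        print(f"{s_a.name():<20} {s_b.name():<20} {sim:.2f}")
```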
Unsupervised WSD goal
assign a word sense from an inventory but without training data
Unsupervised WSI goal
cluster/group the contexts of ambiguous words, discriminate between these groups without actually labeling them
WSI clustering types
▶ hierarchical clustering of contexts
▶ clustering by committee
▶ k-means clustering
hierarchical clustering of contexts
start with one context per cluster, and iteratively merge the clusters
▶ single-link/complete-link/average-link clustering
▶ hierarchical density-based clustering
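A minimal sketch of average-link clustering of contexts with scikit-learn (the metric parameter assumes scikit-learn ≥ 1.2); the contexts of the ambiguous word “bank” and the choice of two clusters are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

# One context per occurrence of the ambiguous target word "bank" (toy examples).
contexts = [
    "deposit money at the bank account",
    "the bank approved the loan and mortgage",
    "we sat on the river bank fishing",
    "the muddy bank of the stream flooded",
]

X = TfidfVectorizer().fit_transform(contexts).toarray()

# Average-link agglomerative clustering with cosine distance; 2 induced senses.
clustering = AgglomerativeClustering(n_clusters=2, metric="cosine", linkage="average")
labels = clustering.fit_predict(X)
print(labels)   # e.g. [0 0 1 1]: one cluster per induced sense
```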
clustering by committee
▶ find the top-k most similar words for each word
▶ construct committees as collections of tight clusters using the top-k similar words
▶ form as many committees as possible on the condition that each newly formed committee is not very similar to any existing committee
▶ assign each word to its most similar committee
LDA (Latent Dirichlet Allocation) a.k.a. Topic Modeling
discovers underlying themes (topics) in a collection of documents by assigning each word a probability of belonging to the different topics
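A small sketch using gensim’s LdaModel on toy bag-of-words documents (all data illustrative):

```python
from gensim import corpora, models

# Toy documents, each already reduced to a bag of words.
texts = [
    ["bank", "money", "loan", "interest"],
    ["money", "deposit", "bank", "account"],
    ["river", "bank", "water", "fishing"],
    ["water", "river", "flood", "bank"],
]

dictionary = corpora.Dictionary(texts)                  # word <-> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]   # documents as (word_id, count) pairs

# Two latent topics; each word gets a probability under each topic.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
for topic_id, topic in lda.print_topics(num_words=4):
    print(topic_id, topic)
```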
Word sense induction by graph clustering
For a target word w, we build a collocation graph that connects the words in w’s context. Every edge in the graph represents the similarity between the connected nodes
min-cut in graph clustering
find the partition of a graph by cutting the smallest number of edges or the edges with a minimum weighted sum
Chinese whispers in graph clustering
1. assign a class to each node
2. at each iteration, a node gets reassigned to the strongest class in its local neighborhood (the most connected)
▶ in case of ties, choose a class randomly
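A minimal sketch of the procedure on a toy collocation graph built with networkx; the words and edge weights are made up:

```python
import random
from collections import defaultdict
import networkx as nx

def chinese_whispers(graph, iterations=20, seed=0):
    """Minimal Chinese Whispers: each node adopts the strongest class among its neighbors."""
    rng = random.Random(seed)
    labels = {node: node for node in graph.nodes}        # 1. every node starts in its own class
    for _ in range(iterations):
        nodes = list(graph.nodes)
        rng.shuffle(nodes)
        for node in nodes:                               # 2. reassign to the strongest local class
            scores = defaultdict(float)
            for neighbor in graph.neighbors(node):
                weight = graph[node][neighbor].get("weight", 1.0)
                scores[labels[neighbor]] += weight
            if scores:
                best = max(scores.values())
                candidates = [c for c, s in scores.items() if s == best]
                labels[node] = rng.choice(candidates)    # ties are broken randomly
    return labels

# Toy collocation graph for the target word "bank": two loosely connected word groups.
G = nx.Graph()
G.add_weighted_edges_from([
    ("money", "loan", 3), ("loan", "deposit", 2), ("money", "deposit", 2),
    ("river", "water", 3), ("water", "shore", 2), ("river", "shore", 2),
    ("deposit", "shore", 0.1),   # weak cross-cluster edge
])
print(chinese_whispers(G))       # nodes in the same induced sense share a label
```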
Evaluation
▶ Comparison with a gold standard
▶ Precision / cluster purity = percentage of tokens that are tagged correctly, out of all tokens targeted by the system
▶ Recall / cluster overlap = percentage of tokens that are tagged correctly, out of all target tokens (whether or not the system tagged them)
Motivation for Multi-Modal Semantics
▶ Semantics requires “grounding”
▶ Semantics across multiple input modalities
▶ Better semantic representations for NLP: Importance for human-like understanding and real-world applications (e. g., image captioning, video retrieval, grounded dialogue)