Computational linguistics


14 Terms

1. Natural language processing

Construction of language models for use in computational tasks and applications

2. Computational cognitive linguistics

Construction of language models to further our understanding of the cognition of language

3. Turing test

A test designed to determine whether a machine can exhibit intelligent behaviour equivalent to that of a human

4. Probabilistic language models

Models designed to assign a probability to a word sequence, for applications such as spell correction and speech recognition

5. Maximum likelihood estimation

Allows us to estimate probabilities using relative frequency counts from a corpus
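As an illustration, here is a minimal Python sketch of maximum likelihood estimation for word probabilities, using relative frequency counts from a toy corpus (the corpus and numbers are invented for the example):

```python
from collections import Counter

# Toy corpus; a real corpus would contain millions of tokens.
tokens = "the cat sat on the mat the cat slept".split()

counts = Counter(tokens)
total = sum(counts.values())

# Maximum likelihood estimate: P(w) = count(w) / total number of tokens.
p = {word: c / total for word, c in counts.items()}

print(p["the"])  # 3 occurrences out of 9 tokens ≈ 0.333
```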

6. N-gram model

A probabilistic model that predicts the next word given the n−1 preceding words of context: a 1-gram (unigram) model uses 0 words of context, a 2-gram (bigram) model uses 1 word of context, and so on

7. Calculating n-gram probabilities

The count of the full n-gram divided by the count of its (n−1)-word prefix. For a bigram model: P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})
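Putting cards 4–7 together, a minimal Python sketch of a bigram model estimated from counts. The sentences and the <s>/</s> boundary markers are invented for illustration, and sentence boundaries are handled only crudely:

```python
from collections import Counter

# Toy training corpus with sentence-boundary markers.
sentences = [
    "<s> i like computational linguistics </s>",
    "<s> i like natural language processing </s>",
    "<s> i study linguistics </s>",
]
tokens = [w for s in sentences for w in s.split()]

# Note: zipping across the flat token list also counts the spurious
# bigram (</s>, <s>) at sentence joins; harmless for this sketch.
bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)

def bigram_prob(prev, word):
    """MLE estimate: count(prev word) / count(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# P(like | i) = count("i like") / count("i") = 2/3
print(bigram_prob("i", "like"))

def sentence_prob(sentence):
    """Probability of a word sequence as a product of bigram probabilities."""
    words = sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(sentence_prob("<s> i like natural language processing </s>"))  # 1/3
```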

8. Extrinsic evaluation

When each model is put into a task and tested on real-world data. This method is realistic and the best way to compare models, though it can be expensive

9. Intrinsic evaluation

Application-independent evaluations that often correlate with improvements in applications. This is less realistic, though typically cheaper. The data is split into training and test sets, and the model is assessed on its ability to ‘predict’ the test data
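A standard intrinsic metric for language models is perplexity, the model's average "surprise" on held-out test data (the card does not name a metric, so treating it as perplexity is an assumption here). A minimal Python sketch using a unigram model with add-one smoothing:

```python
import math
from collections import Counter

# Split the data: train on one part, evaluate on held-out test data.
train = "the cat sat on the mat".split()
test = "the cat slept".split()

counts = Counter(train)
total = sum(counts.values())
vocab_size = len(counts)

def prob(word):
    # Add-one (Laplace) smoothing, so an unseen test word like "slept"
    # still receives a small non-zero probability.
    return (counts[word] + 1) / (total + vocab_size)

# Perplexity: exponential of the average negative log-probability of
# the test words; lower means the model predicts the data better.
log_prob = sum(math.log(prob(w)) for w in test)
perplexity = math.exp(-log_prob / len(test))
print(perplexity)  # ≈ 6.0 on this toy data
```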

10. Artificial neural network

A model inspired by the structure and function of the human brain. It receives raw data as input, performs computations through weights and activations, and produces a final prediction as output

11. Calculating neural network activations

x_1 (input) * w_1 (weight) + x_2 * w_2 + … — the weighted sum of the inputs, typically passed through an activation function
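A single neuron's computation can be sketched in Python as follows; the sigmoid function and the bias term are standard additions not mentioned on the card, so treat them as assumptions:

```python
import math

def neuron_activation(inputs, weights, bias=0.0):
    """Weighted sum x1*w1 + x2*w2 + ..., then a sigmoid activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))  # sigmoid squashes to (0, 1)

# x1*w1 + x2*w2 = 0.5*0.4 + 0.8*(-0.2) = 0.04; sigmoid(0.04) ≈ 0.51
print(neuron_activation([0.5, 0.8], [0.4, -0.2]))
```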

12. Word2vec

Represents words as vectors (series of numbers): a shallow feed-forward neural network trained to predict context words, capturing semantic and syntactic relations between them
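A hedged sketch of training word2vec with the gensim library (the card does not name a library, so gensim — version 4 or later — is an assumption, and the toy sentences and parameter values are invented):

```python
from gensim.models import Word2Vec  # assumes gensim >= 4 is installed

# Toy corpus of pre-tokenised sentences; real training needs far more text.
sentences = [
    ["computational", "linguistics", "studies", "language"],
    ["natural", "language", "processing", "builds", "language", "models"],
    ["neural", "models", "learn", "from", "language", "data"],
]

# Train the shallow network; each word gets a 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

vector = model.wv["language"]                     # the vector for "language"
print(model.wv.most_similar("language", topn=2))  # nearest words by cosine
```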

13. Corpora

Large-scale samples of text in a language of interest. These have become essential for training and testing language models. Often include markup (annotations added by human analysts)

14. Supervised learning

Training language models on labeled data, in which each input has a corresponding output that the model should predict
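A minimal supervised-learning sketch using scikit-learn (the card names no library, so scikit-learn is an assumption, and the texts and labels are invented): each input text is paired with the output label the model learns to predict.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labeled data: each input text comes with the output the model should predict.
texts = ["great movie", "loved this movie", "terrible film", "boring and bad"]
labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # word-count features for each text

classifier = LogisticRegression()
classifier.fit(X, labels)            # supervised training on (input, output) pairs

print(classifier.predict(vectorizer.transform(["loved this film"])))  # likely ['pos']
```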