Computational linguistics


14 Terms

1. Natural language processing

Construction of language models for use in computational tasks and applications

2. Computational cognitive linguistics

Construction of language models to further our understanding of the cognition of language

3. Turing test

A test designed to determine whether a machine can exhibit intelligent behaviour equivalent to that of a human

4. Probabilistic language models

Models designed to assign a probability to a word sequence, for applications such as spell correction and speech recognition

5. Maximum likelihood estimation

Allows us to estimate probabilities using relative frequency counts from a corpus
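As an illustration, here is a minimal Python sketch of maximum likelihood estimation for word probabilities, using relative frequency counts from a toy corpus (the corpus and numbers are invented for the example):

```python
from collections import Counter

# Toy corpus; a real corpus would contain millions of tokens.
tokens = "the cat sat on the mat the cat slept".split()

counts = Counter(tokens)
total = sum(counts.values())

# Maximum likelihood estimate: P(w) = count(w) / total number of tokens.
p = {word: c / total for word, c in counts.items()}

print(p["the"])  # 3 occurrences out of 9 tokens ≈ 0.333
```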

6. N-gram model

A probabilistic model that predicts the next word given the n−1 preceding words of context: a 1-gram (unigram) model uses 0 words of context, a 2-gram (bigram) model uses 1 word of context, and so on

7. Calculating n-gram probabilities

The count of the full n-gram divided by the count of its (n−1)-word prefix. For a bigram model: P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})
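Putting cards 4–7 together, a minimal Python sketch of a bigram model estimated from counts. The sentences and the <s>/</s> boundary markers are invented for illustration, and sentence boundaries are handled only crudely:

```python
from collections import Counter

# Toy training corpus with sentence-boundary markers.
sentences = [
    "<s> i like computational linguistics </s>",
    "<s> i like natural language processing </s>",
    "<s> i study linguistics </s>",
]
tokens = [w for s in sentences for w in s.split()]

# Note: zipping across the flat token list also counts the spurious
# bigram (</s>, <s>) at sentence joins; harmless for this sketch.
bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)

def bigram_prob(prev, word):
    """MLE estimate: count(prev word) / count(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# P(like | i) = count("i like") / count("i") = 2/3
print(bigram_prob("i", "like"))

def sentence_prob(sentence):
    """Probability of a word sequence as a product of bigram probabilities."""
    words = sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(sentence_prob("<s> i like natural language processing </s>"))  # 1/3
```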

8. Extrinsic evaluation

When each model is put into a task and tested on real-world data. This method is realistic and the best way to compare models, though it can be expensive

9. Intrinsic evaluation

Application-independent evaluations that often correlate with improvements in applications. This is less realistic, though typically cheaper. The data is split into training and test sets, and the model is assessed on its ability to ‘predict’ the test data
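A standard intrinsic metric for language models is perplexity, the model's average "surprise" on held-out test data (the card does not name a metric, so treating it as perplexity is an assumption here). A minimal Python sketch using a unigram model with add-one smoothing:

```python
import math
from collections import Counter

# Split the data: train on one part, evaluate on held-out test data.
train = "the cat sat on the mat".split()
test = "the cat slept".split()

counts = Counter(train)
total = sum(counts.values())
vocab_size = len(counts)

def prob(word):
    # Add-one (Laplace) smoothing, so an unseen test word like "slept"
    # still receives a small non-zero probability.
    return (counts[word] + 1) / (total + vocab_size)

# Perplexity: exponential of the average negative log-probability of
# the test words; lower means the model predicts the data better.
log_prob = sum(math.log(prob(w)) for w in test)
perplexity = math.exp(-log_prob / len(test))
print(perplexity)  # ≈ 6.0 on this toy data
```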

10. Artificial neural network

A model inspired by the structure and function of the human brain. It receives raw data as input, performs computations through weights and activations, and produces a final prediction as output

11. Calculating neural network activations

x_1 (input) * w_1 (weight) + x_2 * w_2 + … — the weighted sum of the inputs, typically passed through an activation function
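A single neuron's computation can be sketched in Python as follows; the sigmoid function and the bias term are standard additions not mentioned on the card, so treat them as assumptions:

```python
import math

def neuron_activation(inputs, weights, bias=0.0):
    """Weighted sum x1*w1 + x2*w2 + ..., then a sigmoid activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))  # sigmoid squashes to (0, 1)

# x1*w1 + x2*w2 = 0.5*0.4 + 0.8*(-0.2) = 0.04; sigmoid(0.04) ≈ 0.51
print(neuron_activation([0.5, 0.8], [0.4, -0.2]))
```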

12. Word2vec

Represents words as vectors (series of numbers): a shallow feed-forward neural network trained to predict context words, capturing semantic and syntactic relations between them
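A hedged sketch of training word2vec with the gensim library (the card does not name a library, so gensim — version 4 or later — is an assumption, and the toy sentences and parameter values are invented):

```python
from gensim.models import Word2Vec  # assumes gensim >= 4 is installed

# Toy corpus of pre-tokenised sentences; real training needs far more text.
sentences = [
    ["computational", "linguistics", "studies", "language"],
    ["natural", "language", "processing", "builds", "language", "models"],
    ["neural", "models", "learn", "from", "language", "data"],
]

# Train the shallow network; each word gets a 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

vector = model.wv["language"]                     # the vector for "language"
print(model.wv.most_similar("language", topn=2))  # nearest words by cosine
```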

13. Corpora

Large-scale samples of text in a language of interest. These have become essential for training and testing language models. Often include markup (annotations added by human analysts)

14. Supervised learning

Training language models on labeled data, in which each input has a corresponding output that the model should predict
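A minimal supervised-learning sketch using scikit-learn (the card names no library, so scikit-learn is an assumption, and the texts and labels are invented): each input text is paired with the output label the model learns to predict.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labeled data: each input text comes with the output the model should predict.
texts = ["great movie", "loved this movie", "terrible film", "boring and bad"]
labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # word-count features for each text

classifier = LogisticRegression()
classifier.fit(X, labels)            # supervised training on (input, output) pairs

print(classifier.predict(vectorizer.transform(["loved this film"])))  # likely ['pos']
```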