Natural language processing
Construction of language models for use in computational tasks and applications
Computational cognitive linguistics
Construction of language models to further our understanding of the cognition of language
Turing test
A test designed to determine whether a machine can exhibit intelligent behaviour equivalent to that of a human
Probabilistic language models
Models designed to assign a probability to a word sequence, for applications such as spell correction and speech recognition
Maximum likelihood estimation (MLE)
Allows us to estimate probabilities directly from counts in our corpus
N-gram model
A probabilistic model that predicts the next word given the n-1 preceding words of context: 1-grams (unigrams) use no context, 2-grams (bigrams) use one preceding word, and so on
Calculation of n-gram probabilities
The count of the full n-gram divided by the count of its (n-1)-word context; for bigrams, P(wn | wn-1) = count(wn-1 wn) / count(wn-1)
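A minimal sketch of the bigram case in Python (the toy corpus and variable names are illustrative; a real model would be trained on a large corpus):

```python
from collections import Counter

# Toy corpus; a real model would use a much larger one.
corpus = "the cat sat on the mat the cat ran".split()

bigram_counts = Counter(zip(corpus, corpus[1:]))  # counts of (w_{n-1}, w_n) pairs
unigram_counts = Counter(corpus)                  # counts of single words

def bigram_prob(prev_word, word):
    """MLE: count(prev word) / count(prev)."""
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_prob("the", "cat"))  # count("the cat")=2 / count("the")=3, about 0.667
```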
Extrinsic evaluation
Each model is embedded in a real task and evaluated on real-world data. This method is realistic and best for comparing models, though it can be expensive
Intrinsic evaluation
Application-independent evaluations that often correlate with improvements in applications. This is less realistic, though typically cheaper. Data is split into training and test data, and the model is assessed for its ability to ‘predict’ the test data
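Perplexity is a standard intrinsic metric for measuring how well a model 'predicts' held-out test data. A minimal sketch with a unigram model and add-one smoothing (both choices, and the tiny train/test split, are illustrative):

```python
import math
from collections import Counter

# Illustrative train/test split of a tiny corpus.
train = "the cat sat on the mat".split()
test = "the cat ran".split()

counts = Counter(train)
total = len(train)
vocab_size = len(counts)

def unigram_prob(word):
    # Add-one (Laplace) smoothing so unseen test words get non-zero probability.
    return (counts[word] + 1) / (total + vocab_size)

# Perplexity: inverse probability of the test data, normalised per word;
# lower perplexity means the model predicts the test data better.
log_prob = sum(math.log(unigram_prob(w)) for w in test)
print(math.exp(-log_prob / len(test)))
```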
Artificial neural network
A model inspired by the structure and function of the human brain. It receives raw data as input, performs computations through weights and activations, and produces a final prediction as output
Calculation of activations in neural networks
a = f(x1*w1 + x2*w2 + ... + b): each input xi is multiplied by its weight wi, the products are summed (often with a bias b), and the sum is passed through an activation function f
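A worked sketch of a single neuron, assuming a sigmoid activation function (the input, weight, and bias values are made up):

```python
import math

def sigmoid(z):
    # Squashes the weighted sum into the range (0, 1).
    return 1 / (1 + math.exp(-z))

x = [0.5, -1.0, 2.0]   # inputs x1, x2, x3 (made-up values)
w = [0.4, 0.3, -0.1]   # weights w1, w2, w3 (made-up values)
b = 0.1                # bias

z = sum(xi * wi for xi, wi in zip(x, w)) + b  # x1*w1 + x2*w2 + x3*w3 + b
print(sigmoid(z))  # the neuron's activation: sigmoid(-0.2), about 0.45
```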
Word2vec
Represents words as vectors (series of numbers). A shallow feed-forward neural network trained to predict context words and capture semantic and syntactic relations between words
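A minimal sketch using the gensim library (one common word2vec implementation, not named in the cards; the toy sentences are illustrative and a real model needs a large corpus):

```python
from gensim.models import Word2Vec

# Toy tokenised sentences; gensim expects a list of token lists.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "animals"],
]

# Skip-gram (sg=1) model; vector_size is the embedding dimension.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["cat"])                       # the 50-dimensional vector for "cat"
print(model.wv.most_similar("cat", topn=3))  # nearest words in the vector space
```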
Corpora
Large-scale samples of text in a language of interest. These have become essential for training and testing language models. Often include markup (annotations added by human analysts)
Supervised method of learning
Training LMs on labeled data, in which each piece of input has a corresponding output that the model should predict
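A minimal sketch of supervised learning using scikit-learn, assuming a toy sentiment-labelling task (the texts and labels are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labeled data: each input text has a corresponding output label.
texts = ["great film", "awful plot", "loved it", "terrible acting"]
labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words features feeding a logistic regression classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["loved the film"]))  # predicts a label for unseen text
```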