language model
A statistical or machine learning model that predicts the probability of a sequence of words in a language.
n-gram
A sequence of n items in a particular order, where an item can be a letter, digit, word, syllable, or other unit; used in natural language processing.
unigram
an n-gram consisting of a single item from a sequence, often a word
bigram
a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words.
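As a minimal sketch (the token list is made up for illustration), unigrams and bigrams can be extracted from a token sequence with a sliding window:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
unigrams = ngrams(tokens, 1)   # [('the',), ('cat',), ...]
bigrams = ngrams(tokens, 2)    # [('the', 'cat'), ('cat', 'sat'), ...]
```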
chain rule
a rule of probability that decomposes the joint probability of a word sequence into a product of conditional probabilities: P(w1, …, wn) = P(w1) · P(w2 | w1) · … · P(wn | w1, …, wn−1)
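In language modeling, the chain rule of probability says the joint probability of a sequence is the product of each word's conditional probability given the preceding words. A toy sketch (the per-step probabilities are made up for illustration):

```python
import math

# Hypothetical conditional probabilities P(w_i | w_1..w_{i-1}) for a
# three-word sentence; the values are invented for illustration only.
step_probs = [0.2, 0.05, 0.1]

# Chain rule: the joint probability is the product of the conditionals.
joint = math.prod(step_probs)
```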
markov assumption
a fundamental concept in probability theory that states the future of a system is conditionally independent of its past, given its present state.
maximum likelihood estimation
A statistical method for estimating the parameters of a model by finding the values that maximize the likelihood of the observed data
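For a bigram model, the maximum likelihood estimate reduces to counting: P(w2 | w1) = count(w1, w2) / count(w1). A minimal sketch with an invented corpus:

```python
from collections import Counter

tokens = ["the", "cat", "sat", "on", "the", "mat"]
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def mle_bigram(w1, w2):
    """MLE estimate P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

# "the" occurs twice and is followed by "cat" once, so P(cat | the) = 0.5.
p = mle_bigram("the", "cat")
```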
extrinsic evaluation
a way of measuring a system’s performance by testing how well it helps with a real-world task or application.
intrinsic evaluation
a way of measuring a system’s performance by directly testing the quality of its output, without using a real-world task.
perplexity
a measurement of how well a language model predicts a sample; lower means the model is better at predicting the text.
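Perplexity is the exponential of the average negative log-probability per word; a model that assigns uniform probability 1/k to every word has perplexity k. A minimal sketch:

```python
import math

def perplexity(word_probs):
    """Perplexity = exp of the negative mean log-probability per word."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# A model assigning uniform probability 1/4 to each word has perplexity 4.
pp = perplexity([0.25, 0.25, 0.25, 0.25])
```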
sparsity
when most possible word combinations or data entries are missing or have zero counts because there is not enough data to cover everything.
smoothing
A technique used to adjust probability estimates in language models to account for unseen events or rare word combinations.
laplace smoothing
a smoothing method where 1 is added to every word count to avoid zero probabilities for unseen words in a language model
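For unigrams, Laplace (add-one) smoothing replaces count(w) / N with (count(w) + 1) / (N + V), where V is the vocabulary size. A minimal sketch with an invented corpus and vocabulary size:

```python
from collections import Counter

def laplace_unigram(tokens, vocab_size):
    """Add-one smoothed unigram probabilities: (count + 1) / (N + V)."""
    counts = Counter(tokens)
    n = len(tokens)
    return lambda w: (counts[w] + 1) / (n + vocab_size)

tokens = ["the", "cat", "sat", "on", "the", "mat"]
p = laplace_unigram(tokens, vocab_size=10)
# An unseen word still gets a small nonzero probability: (0 + 1) / (6 + 10).
p_unseen = p("dog")
```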
closed vocabulary
a fixed set of words that a language model or system is allowed to recognize; any word outside this set is treated as unknown.
out-of-vocabulary
words that are not included in the system’s fixed vocabulary and are treated as unknown during processing
<UNK> replacement
a method where any out-of-vocabulary (OOV) word is replaced with a special token, to handle unknown words during language processing
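The replacement itself is a simple lookup against the closed vocabulary; a minimal sketch with an invented vocabulary:

```python
def replace_oov(tokens, vocab, unk="<UNK>"):
    """Map any token outside the closed vocabulary to the <UNK> token."""
    return [t if t in vocab else unk for t in tokens]

vocab = {"the", "cat", "sat"}
out = replace_oov(["the", "dog", "sat"], vocab)   # ['the', '<UNK>', 'sat']
```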
subword tokenization
a method that breaks words into smaller units (like prefixes, suffixes, or common parts) to better handle rare or unseen words in language processing
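A toy longest-match-first sketch of the idea (real systems learn their subword vocabulary, e.g. with byte-pair encoding; the vocabulary here is invented):

```python
def greedy_subwords(word, subword_vocab):
    """Greedily split a word into the longest known subword units,
    falling back to single characters for unknown spans."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in subword_vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known subword starts here; emit a single character.
            pieces.append(word[i])
            i += 1
    return pieces

vocab = {"un", "happi", "ness", "happy"}
pieces = greedy_subwords("unhappiness", vocab)   # ['un', 'happi', 'ness']
```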
spelling error detection
the task of identifying words in a text that are not spelled correctly
spelling error correction
the task of finding and fixing misspelled words by suggesting or replacing them with the correct spelling
phonetic errors
spelling mistakes that happen because a word is written the way it sounds rather than its correct spelling
run-on errors
mistakes where two or more words are incorrectly written together without spaces, making them harder to read or understand.
split errors
mistakes where one word is incorrectly divided into two separate words
isolated-word error correction
correcting spelling mistakes by looking at each word separately, without considering the surrounding words
context-word dependent word correction
correcting spelling mistakes by using the surrounding words to choose the right correction
minimum edit distance
the smallest number of edits (insertions, deletions, or substitutions) needed to change one word into another.
acyclic graph
a graph that has no cycles, meaning you cannot start at one node and follow a path that leads back to the same node
insertion
an edit operation where a new character is added to a word to help match another word
deletion
an edit operation where a character is removed from a word to help match another word
substitution
an edit operation where one character in a word is replaced with a different character to help match another word.
transposition
an edit operation where two adjacent characters are swapped to help match another word
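The four edit operations above can be combined in a dynamic-programming sketch of minimum edit distance with adjacent transpositions (the optimal string alignment variant, all unit costs assumed):

```python
def edit_distance(a, b):
    """Minimum edit distance allowing insertion, deletion, substitution,
    and adjacent transposition (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i                      # delete all of a's prefix
    for j in range(len(b) + 1):
        d[0][j] = j                      # insert all of b's prefix
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

# "teh" -> "the" is one transposition; "cat" -> "cart" is one insertion.
```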
confusion probabilities
the chances that one letter, word, or sound will be mistakenly recognized as another during language processing
local syntactic errors
grammar mistakes that affect only a small part of a sentence, like subject-verb agreement or word order
long-distance syntactic errors
grammar mistakes that happen when words that should agree are far apart in a sentence, making the error harder to spot.