Language Modeling and Spell Checking

33 Terms

1

language model

A statistical or machine learning model that predicts the probability of a sequence of words in a language.

2

n-gram

A sequence of n items in a particular order, where an item can be a letter, digit, word, syllable, or other unit; widely used in natural language processing.

3

unigram

an n-gram that consists of a single item from a sequence, often a word

4

bigram

a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words.
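
As a small aside (not part of the original card), a minimal Python sketch of collecting unigram and bigram counts from a token list; the tokens here are made up for illustration:

```python
from collections import Counter

# Toy token list; a real pipeline would tokenize a corpus first.
tokens = ["<s>", "i", "am", "sam", "</s>"]

unigrams = Counter(tokens)                    # single items
bigrams = Counter(zip(tokens, tokens[1:]))    # adjacent pairs

print(unigrams["sam"])        # 1
print(bigrams[("i", "am")])   # 1
```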

5

chain rule

the rule of probability used to break the probability of a word sequence into a product of conditional probabilities, one for each word given the words that come before it (see the formula sketch below)
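
As an illustrative sketch (added here, not part of the original card), the chain rule factors a sequence of n words as:

```latex
\[
  P(w_1, w_2, \ldots, w_n)
    = P(w_1)\, P(w_2 \mid w_1) \cdots P(w_n \mid w_1, \ldots, w_{n-1})
    = \prod_{k=1}^{n} P(w_k \mid w_1, \ldots, w_{k-1})
\]
```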

6

markov assumption

a fundamental concept in probability theory that states the future of a system is conditionally independent of its past, given its present state.
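
Applied to language modeling (a sketch, assuming a bigram model), the Markov assumption shortens each conditional in the chain rule to the most recent word only:

```latex
\[
  P(w_k \mid w_1, \ldots, w_{k-1}) \approx P(w_k \mid w_{k-1})
\]
```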

7

maximum likelihood estimation

A statistical method for estimating the parameters of a model by finding the values that maximize the likelihood of the observed data
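
For a bigram language model, the MLE estimate reduces to a ratio of corpus counts (an illustrative sketch; C(·) denotes a count taken from the training data):

```latex
\[
  \hat{P}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
\]
```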

8

extrinsic evaluation

a way of measuring a system’s performance by testing how well it helps with a real-world task or application.

9

intrinsic evaluation

a way of measuring a system’s performance by directly testing the quality of its output, without using a real-world task.

10

perplexity

a measurement of how well a language model predicts a sample; lower means the model is better at predicting the text.
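
One common formulation (a sketch, assuming the test set W is a sequence of N words):

```latex
\[
  \mathrm{PP}(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}}
                 = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})}}
\]
```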

11

sparsity

when most possible word combinations or data entries are missing or have zero counts because there is not enough data to cover everything.

12

smoothing

A technique used to adjust probability estimates in language models to account for unseen events or rare word combinations.

13

laplace smoothing

a smoothing method where 1 is added to every word count to avoid zero probabilities for unseen words in a language model
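
For a bigram model with vocabulary size V, add-one (Laplace) smoothing takes this form (illustrative sketch):

```latex
\[
  P_{\mathrm{Laplace}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + V}
\]
```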

14

closed vocabulary

a fixed set of words that a language model or system is allowed to recognize; any word outside this set is treated as unknown.

15

out-of-vocabulary

words that are not included in the system’s fixed vocabulary and are treated as unknown during processing

16

<UNK> replacement

a method where any out-of-vocabulary (OOV) word is replaced with a special token, to handle unknown words during language processing
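
A minimal Python sketch of the idea; the vocabulary and sentence are made-up examples:

```python
# Hypothetical closed vocabulary; anything outside it becomes <UNK>.
vocab = {"the", "cat", "sat", "on", "mat"}

def replace_oov(tokens, vocab, unk="<UNK>"):
    """Map out-of-vocabulary tokens to a single unknown-word token."""
    return [tok if tok in vocab else unk for tok in tokens]

print(replace_oov(["the", "ocelot", "sat"], vocab))
# ['the', '<UNK>', 'sat']
```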

17

subword tokenization

a method that breaks words into smaller units (like prefixes, suffixes, or common parts) to better handle rare or unseen words in language processing
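
Real systems learn their subword vocabulary from data (e.g., byte-pair encoding); the sketch below only shows the flavor of the idea with a hand-made vocabulary and a greedy longest-match split, so the pieces chosen here are assumptions for illustration:

```python
# Toy subword vocabulary (hand-made for illustration only).
subwords = {"un", "happi", "ness", "h", "a", "p", "i", "n", "e", "s"}

def greedy_subword_split(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # fall back to a single character
            i += 1
    return pieces

print(greedy_subword_split("unhappiness", subwords))
# ['un', 'happi', 'ness']
```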

18

spelling error detection

the task of identifying words in a text that are not spelled correctly

19

spelling error correction

the task of finding and fixing misspelled words by suggesting or replacing them with the correct spelling

20

phonetic errors

spelling mistakes that happen because a word is written the way it sounds rather than its correct spelling

21

run-on errors

mistakes where two or more words are incorrectly written together without spaces, making them harder to read or understand.

22

split errors

mistakes where one word is incorrectly divided into two separate words

23

isolated-word error correction

correcting spelling mistakes by looking at each word separately, without considering the surrounding words

24

context-dependent word correction

correcting spelling mistakes by using the surrounding words to choose the right correction

25

minimum edit distance

the smallest number of edits (insertions, deletions, or substitutions) needed to change one word into another.
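
A compact dynamic-programming sketch in Python (Levenshtein distance with unit costs; adding a swap of adjacent characters as a fourth operation would give the Damerau variant used for transposition errors):

```python
def min_edit_distance(source, target):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (unit costs)."""
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                          # delete everything
    for j in range(n + 1):
        dist[0][j] = j                          # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution (or match)
            )
    return dist[m][n]

print(min_edit_distance("intention", "execution"))  # 5 with unit costs
```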

26

acyclic graph

a graph that has no cycles, meaning you cannot start at one node and follow a path that leads back to the same node

27

insertion

an edit operation where a new character is added to a word to help match another word

28

deletion

an edit operation where a character is removed from a word to help match another word

29

substitution

an edit operation where one character in a word is replaced with a different character to help match another word.

30

transposition

an edit operation where two adjacent characters are swapped to help match another word

31

confusion probabilities

the chances that one letter, word, or sound will be mistakenly recognized as another during language processing
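
In noisy-channel spelling correction, confusion probabilities typically supply the error model; a sketch of how they combine with a language model (x is the observed misspelling, c ranges over candidate corrections):

```latex
\[
  \hat{c} = \arg\max_{c \in \text{candidates}}
            \underbrace{P(x \mid c)}_{\text{error model}}\;
            \underbrace{P(c)}_{\text{language model}}
\]
```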

32

local syntactic errors

grammar mistakes that affect only a small part of a sentence, like subject-verb agreement or word order

33

long-distance syntactic errors

grammar mistakes that happen when words that should agree are far apart in a sentence, making the error harder to spot.