Language Modeling and Spell Checking

33 Terms

1

language model

A statistical or machine learning model that predicts the probability of a sequence of words in a language.

2

n-gram

A sequence of n items in a particular order, where an item can be a letter, digit, word, syllable, or other unit; widely used in natural language processing.

3

unigram

an n-gram that consists of a single item from a sequence, often a word

4

bigram

a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words.
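
As a small aside (not part of the original card), a minimal Python sketch of collecting unigram and bigram counts from a token list; the tokens here are made up for illustration:

```python
from collections import Counter

# Toy token list; a real pipeline would tokenize a corpus first.
tokens = ["<s>", "i", "am", "sam", "</s>"]

unigrams = Counter(tokens)                    # single items
bigrams = Counter(zip(tokens, tokens[1:]))    # adjacent pairs

print(unigrams["sam"])        # 1
print(bigrams[("i", "am")])   # 1
```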

5

chain rule

the rule of probability used to break the probability of a word sequence into a product of conditional probabilities, one for each word given the words that come before it (see the formula sketch below)
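
As an illustrative sketch (added here, not part of the original card), the chain rule factors a sequence of n words as:

```latex
\[
  P(w_1, w_2, \ldots, w_n)
    = P(w_1)\, P(w_2 \mid w_1) \cdots P(w_n \mid w_1, \ldots, w_{n-1})
    = \prod_{k=1}^{n} P(w_k \mid w_1, \ldots, w_{k-1})
\]
```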

6

markov assumption

a fundamental concept in probability theory that states the future of a system is conditionally independent of its past, given its present state.
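
Applied to language modeling (a sketch, assuming a bigram model), the Markov assumption shortens each conditional in the chain rule to the most recent word only:

```latex
\[
  P(w_k \mid w_1, \ldots, w_{k-1}) \approx P(w_k \mid w_{k-1})
\]
```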

7

maximum likelihood estimation

A statistical method for estimating the parameters of a model by finding the values that maximize the likelihood of the observed data
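
For a bigram language model, the MLE estimate reduces to a ratio of corpus counts (an illustrative sketch; C(·) denotes a count taken from the training data):

```latex
\[
  \hat{P}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
\]
```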

8

extrinsic evaluation

a way of measuring a system’s performance by testing how well it helps with a real-world task or application.

9

intrinsic evaluation

a way of measuring a system’s performance by directly testing the quality of its output, without using a real-world task.

10

perplexity

a measurement of how well a language model predicts a sample; lower means the model is better at predicting the text.
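
One common formulation (a sketch, assuming the test set W is a sequence of N words):

```latex
\[
  \mathrm{PP}(W) = P(w_1 w_2 \ldots w_N)^{-\frac{1}{N}}
                 = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})}}
\]
```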

11

sparsity

when most possible word combinations or data entries are missing or have zero counts because there is not enough data to cover everything.

12

smoothing

A technique used to adjust probability estimates in language models to account for unseen events or rare word combinations.

13

laplace smoothing

a smoothing method where 1 is added to every word count to avoid zero probabilities for unseen words in a language model
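
For a bigram model with vocabulary size V, add-one (Laplace) smoothing takes this form (illustrative sketch):

```latex
\[
  P_{\mathrm{Laplace}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + V}
\]
```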

14

closed vocabulary

a fixed set of words that a language model or system is allowed to recognize; any word outside this set is treated as unknown.

15

out-of-vocabulary

words that are not included in the system’s fixed vocabulary and are treated as unknown during processing

16

<UNK> replacement

a method where any out-of-vocabulary (OOV) word is replaced with a special token, to handle unknown words during language processing
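
A minimal Python sketch of the idea; the vocabulary and sentence are made-up examples:

```python
# Hypothetical closed vocabulary; anything outside it becomes <UNK>.
vocab = {"the", "cat", "sat", "on", "mat"}

def replace_oov(tokens, vocab, unk="<UNK>"):
    """Map out-of-vocabulary tokens to a single unknown-word token."""
    return [tok if tok in vocab else unk for tok in tokens]

print(replace_oov(["the", "ocelot", "sat"], vocab))
# ['the', '<UNK>', 'sat']
```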

17

subword tokenization

a method that breaks words into smaller units (like prefixes, suffixes, or common parts) to better handle rare or unseen words in language processing
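
Real systems learn their subword vocabulary from data (e.g., byte-pair encoding); the sketch below only shows the flavor of the idea with a hand-made vocabulary and a greedy longest-match split, so the pieces chosen here are assumptions for illustration:

```python
# Toy subword vocabulary (hand-made for illustration only).
subwords = {"un", "happi", "ness", "h", "a", "p", "i", "n", "e", "s"}

def greedy_subword_split(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # fall back to a single character
            i += 1
    return pieces

print(greedy_subword_split("unhappiness", subwords))
# ['un', 'happi', 'ness']
```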

18

spelling error detection

the task of identifying words in a text that are not spelled correctly

19

spelling error correction

the task of finding and fixing misspelled words by suggesting or replacing them with the correct spelling

20

phonetic errors

spelling mistakes that happen because a word is written the way it sounds rather than its correct spelling

21

run-on errors

mistakes where two or more words are incorrectly written together without spaces, making them harder to read or understand.

22

split errors

mistakes where one word is incorrectly divided into two separate words

23

isolated-word error correction

correcting spelling mistakes by looking at each word separately, without considering the surrounding words

24

context-dependent word correction

correcting spelling mistakes by using the surrounding words to choose the right correction

25

minimum edit distance

the smallest number of edits (insertions, deletions, or substitutions) needed to change one word into another.
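
A compact dynamic-programming sketch in Python (Levenshtein distance with unit costs; adding a swap of adjacent characters as a fourth operation would give the Damerau variant used for transposition errors):

```python
def min_edit_distance(source, target):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (unit costs)."""
    m, n = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                          # delete everything
    for j in range(n + 1):
        dist[0][j] = j                          # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution (or match)
            )
    return dist[m][n]

print(min_edit_distance("intention", "execution"))  # 5 with unit costs
```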

26

acyclic graph

a graph that has no cycles, meaning you cannot start at one node and follow a path that leads back to the same node

27

insertion

an edit operation where a new character is added to a word to help match another word

28

deletion

an edit operation where a character is removed from a word to help match another word

29

substitution

an edit operation where one character in a word is replaced with a different character to help match another word.

30

transposition

an edit operation where two adjacent characters are swapped to help match another word

31

confusion probabilities

the chances that one letter, word, or sound will be mistakenly recognized as another during language processing
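
In noisy-channel spelling correction, confusion probabilities typically supply the error model; a sketch of how they combine with a language model (x is the observed misspelling, c ranges over candidate corrections):

```latex
\[
  \hat{c} = \arg\max_{c \in \text{candidates}}
            \underbrace{P(x \mid c)}_{\text{error model}}\;
            \underbrace{P(c)}_{\text{language model}}
\]
```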

32

local syntactic errors

grammar mistakes that affect only a small part of a sentence, like subject-verb agreement or word order

33

long-distance syntactic errors

grammar mistakes that happen when words that should agree are far apart in a sentence, making the error harder to spot.