define computational linguistics
study of written and spoken language from a computational perspective, building artifacts that usefully process and produce language
what are the goals of computational linguistics?
recognize language — how systems identify the sounds, words, and structure of human language
comprehend language — how they analyze meaning + context to interpret what is being said
generate language — how models produce text that sounds meaningful
what is natural language processing (NLP)?
what are its types of applications?
definition — building applications that let computers use human language
speech recognition — turning sound waves into words
translation — bridging languages instantly
bias detection — identifying harmful or unfair language patterns in datasets or media
accessibility tech — creating captions, screen readers, and text to speech tools
what are the goal and key steps of speech-to-text?
goal — turn audio signals into words
key steps
acoustic analysis — breaking audio into small frames and extracting features
phoneme recognition — identifying the basic units of sound (phonemes)
word recognition — mapping sequences of phonemes to words using language models
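The acoustic analysis step above can be sketched in a few lines. This is a minimal illustration, not how a production recognizer works: the helper names `frame_signal` and `frame_energy`, the synthetic 440 Hz tone, and the 25 ms / 10 ms frame sizes are all assumptions chosen for the example, and real systems extract richer features than energy.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len, stepping by hop."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def frame_energy(frame):
    """A deliberately simple feature: the mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

# Synthetic "audio" (assumed values): 0.1 s of a 440 Hz tone sampled at 8000 Hz.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]

# 200 samples = 25 ms per frame, 80 samples = 10 ms between frame starts.
frames = frame_signal(samples, frame_len=200, hop=80)
energies = [frame_energy(f) for f in frames]
print(len(frames), [round(e, 3) for e in energies])
```

Each frame becomes one feature value here; later steps (phoneme and word recognition) would consume a sequence of such per-frame features.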
what are the goal and key steps of text-to-speech?
goal — convert written text into natural sounding speech
key steps
text analysis — break text into units
prosody modeling — decide how to stress words, pause, and intonate
waveform synthesis — generate the actual audio waveform using algorithms or neural networks
what is the distributional hypothesis?
words that appear in similar contexts tend to have similar meanings
vector representation — capturing meaning mathematically
embeddings — dense word vectors learned as a by-product of a prediction task, such as predicting a neighboring word from the current word
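One way to see the distributional hypothesis in action is to count which words appear near each other and compare the resulting count vectors. The sketch below uses raw co-occurrence counts rather than learned embeddings; the toy corpus, the window size of 2, and the helper names `cooccurrence_vectors` and `cosine` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus (assumed): "cat" and "dog" share contexts, "book" does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "she reads a book",
]
vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_book = cosine(vecs["cat"], vecs["book"])
print(round(sim_cat_dog, 3), round(sim_cat_book, 3))
```

Words that occur in similar contexts ("cat", "dog") end up with similar vectors, while "book" shares no context words with "cat" in this corpus and scores zero.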
what are n-grams
sequences of n consecutive words, used to approximate which word should come next in a sequence
limitation: they need large, diverse, and domain-specific datasets to capture word relationships
what are n-gram models ?
choose the most likely word to go next in a sequence given data on which words appear next to each other
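A bigram model (n = 2) is the smallest useful case of the idea above. The following is a minimal sketch, assuming a tiny made-up corpus and hypothetical helper names (`train_bigram_model`, `next_word_probability`, `most_likely_next`); it uses plain counts with no smoothing, so unseen words simply get no prediction.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probability(counts, prev, nxt):
    """Estimate P(nxt | prev) as count(prev, nxt) / count(prev, anything)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def most_likely_next(counts, prev):
    """Return the most frequent follower of prev, or None if prev was never seen."""
    if not counts[prev]:
        return None
    return counts[prev].most_common(1)[0][0]

# Toy training data (assumed for illustration).
corpus = [
    "i like green tea",
    "i like black coffee",
    "i like green apples",
    "you drink green tea",
]
model = train_bigram_model(corpus)
print(most_likely_next(model, "green"))
print(next_word_probability(model, "green", "tea"))
print(most_likely_next(model, "zebra"))
```

The last call shows the data problem directly: a word the model never saw in training has no counts, so nothing can be predicted after it.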
what are transformers?
neural network architecture introduced by Google researchers in 2017
contain an attention mechanism that makes them good at generating human-like text
keep track of word meanings in the context of the sentence
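The attention mechanism can be illustrated with scaled dot-product attention on a few toy vectors. This is a simplified sketch: the three 2-dimensional token vectors are made up, and the queries, keys, and values are all set to the same vectors, whereas real transformers derive them through learned projections and use many attention heads.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy token vectors (assumed); self-attention with Q = K = V for simplicity.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights says how much one token draws on every other token, which is how the representation of a word comes to depend on the rest of the sentence.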
methods when talking about computational linguistics
topic modeling — tools to create clusters of similar ideas
keyword frequency analysis — track how word usage changes across stages
large language models — generate themes based on clusters of similar ideas
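Of the methods above, keyword frequency analysis is the simplest to sketch. The example below assumes a hypothetical `stages` dictionary mapping a stage name to its text, along with a made-up keyword list; it reports each keyword's share of all tokens in a stage so that stages of different lengths can be compared.

```python
from collections import Counter
import re

def keyword_frequencies(text, keywords):
    """Return each keyword's relative frequency (count / total tokens) in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {k: counts[k] / total for k in keywords}

# Hypothetical texts for two stages (assumed for illustration).
stages = {
    "early": "the model needs data and more data",
    "late": "the tools reach speakers and speakers use the tools",
}
keywords = ["data", "tools"]
trends = {stage: keyword_frequencies(text, keywords) for stage, text in stages.items()}
for stage, freqs in trends.items():
    print(stage, {k: round(v, 3) for k, v in freqs.items()})
```

Comparing the per-stage numbers shows which words gain or lose prominence from one stage to the next.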
what is the lack of linguistic diversity cycle?
what does it lead to?
little or no data → no models → no tools → speakers switch to dominant languages → even less data
linguistic injustice — tech that claims to be universal serves only part of the world
cultural loss — languages carry unique knowledge systems, stories, and worldviews
power dynamics — reinforces English dominance