Understanding a language
grounded in experience and perception; uses meaning, context, and emotion; learned from the world
Simulating a language
models patterns it sees; learns from text/user input. Patterns of word co-occurrence, grounded in data and probability
computational linguistics
using computational methods to model how human language works
Natural language processing
builds applications that let computers use human languages
distributional hypothesis
words that appear in similar contexts tend to have similar meanings; e.g., "cat" and "dog" appear near words like "pet," "fur," "feed"
vector representations
computers build vector representations of words from the contexts they appear in, capturing meaning mathematically
vector semantics
words become vectors where dimensions represent co-occurrences with other words; similarity can then be calculated
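A minimal sketch of the idea (the words, context features, and co-occurrence counts below are invented for illustration):

```python
import numpy as np

# Invented co-occurrence counts with context words [pet, fur, feed, loan]
cat  = np.array([10.0, 8.0, 6.0, 0.0])
dog  = np.array([12.0, 7.0, 9.0, 0.0])
bank = np.array([0.0, 0.0, 1.0, 11.0])

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(cat, dog))   # high: "cat" and "dog" share contexts
print(cosine(cat, bank))  # low: few shared contexts
```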
embeddings
take words as input and produce vectors as output
static embeddings
learns which vector dimensions are useful; the system then predicts the next word based on the current word
Advantages of static embeddings
vectors have fewer dimensions (~300 vs ~50k). Dense representations, fewer 0's
Disadvantages of static embeddings
Give one fixed vector per word. "Bank" gets the same vector regardless of whether it's in the context of a river or a financial institution
computing analogies with static embeddings
"Man is to king as woman is to ---"
1. get relationship between "man and king"
2. apply relationship to "woman"
3. find the nearest word (see the sketch below)
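A minimal sketch of these three steps with invented 2-D toy vectors (real static embeddings have hundreds of dimensions; the values here are only for illustration):

```python
import numpy as np

# Toy embeddings; the values are invented for illustration only
vocab = {
    "man":   np.array([1.0, 1.0]),
    "woman": np.array([1.0, 3.0]),
    "king":  np.array([5.0, 1.0]),
    "queen": np.array([5.0, 3.0]),
    "apple": np.array([0.5, 0.2]),
}

# Step 1: get the relationship between "man" and "king"
relation = vocab["king"] - vocab["man"]

# Step 2: apply the relationship to "woman"
target = vocab["woman"] + relation

# Step 3: find the nearest word (excluding the input words)
def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

candidates = {w: v for w, v in vocab.items() if w not in {"man", "woman", "king"}}
print(max(candidates, key=lambda w: cosine(candidates[w], target)))  # queen
```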
contextual embeddings
same words get different vectors depending on context. models look at surrounding words when creating representations. (ELMo, BERT)
Attention mechanisms
use attention to weigh which context words matter; e.g., for "the river bank was flooded," the model pays attention to words like "river" and "flooded." Contextual clues shape the representation
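A toy sketch of the weighting step (the relevance scores and vectors are invented, not from a trained model): the contextual representation of "bank" becomes a weighted average of the context-word vectors.

```python
import numpy as np

# Hand-picked relevance scores of each context word for "bank" (illustrative only)
context = ["the", "river", "bank", "was", "flooded"]
scores = np.array([0.1, 2.0, 1.0, 0.1, 1.8])

# Softmax turns raw scores into attention weights that sum to 1
weights = np.exp(scores) / np.exp(scores).sum()
for word, w in zip(context, weights):
    print(f"{word:8s} {w:.2f}")  # "river" and "flooded" get the most weight

# Toy context-word vectors; "bank"'s new representation is their weighted sum
vectors = np.random.default_rng(0).normal(size=(5, 4))
bank_contextual = weights @ vectors
```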
N-grams
sequence of n words used to approximate which word should come next in the sequence
n-grams example
"The water of Walden Pond is so beautifully..."
2-grams: "water of," "the water"
3-grams: "the water of," "pond is so"
4-grams: "Walden Pond is so"
Predicted next word after "beautifully": ∞-gram: "blue"; 2-gram: "blue"; 4-gram: "blue"
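A minimal bigram (2-gram) sketch over a tiny invented corpus, counting which word most often follows each word and using that count to predict the next word:

```python
from collections import Counter, defaultdict

# Tiny invented corpus for illustration
corpus = "the water of walden pond is so beautifully blue".split()

# Count which words follow each word (bigram counts)
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

# Predict the most frequent continuation of "beautifully"
print(following["beautifully"].most_common(1))  # [('blue', 1)]
```

With only one sentence of data the counts are trivial, which is exactly the data-hunger problem described in the next card.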
n-grams disadvantage
they require a lot of data to identify word relationships, diverse datasets to generate novel text, and datasets specific to the model's use case
transformers
a neural network architecture proposed by Google in 2017. LLMs like ChatGPT are possible because of transformers. Their attention mechanism makes them good at generating human language: they encode context sensitivity and keep track of word meanings in the context of the sequence
homonym
same word, different meanings
Leka & Shah (2025)
improve how we guide people to brainstorm and innovate solutions to challenges in industry
ignoring linguistic diversity
Most research focuses on English, so billions of speakers are excluded from AI benefits. Digital colonialism: language hierarchies are reinforced digitally. Accelerates the loss of endangered languages; technical and economic barriers. Leads to linguistic injustice, cultural loss, and reinforced language power dynamics
reproduce implicit bias
encode patterns in language data: Black names have greater cosine similarity to unpleasant words than white names; analogies encode stereotypes ("father is to doctor as mother is to ....")
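A sketch of how such an association can be measured (all vectors below are invented, so the numbers only show the mechanics of the measurement, not real bias):

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Invented embeddings for illustration; real studies use trained word vectors
emb = {
    "name_a":     np.array([0.9, 0.1, 0.2]),
    "name_b":     np.array([0.1, 0.9, 0.3]),
    "pleasant":   np.array([0.2, 0.8, 0.4]),
    "unpleasant": np.array([0.8, 0.2, 0.1]),
}

# Association score: similarity to "pleasant" minus similarity to "unpleasant";
# a systematic gap between groups of names indicates encoded bias
for name in ("name_a", "name_b"):
    score = cosine(emb[name], emb["pleasant"]) - cosine(emb[name], emb["unpleasant"])
    print(name, round(score, 3))
```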
benefits of LLMs
healthcare, accessibility, education, documentation of endangered languages
psycholinguistics
study of how we understand, produce and learn language
speech perception
how we decode acoustic signals into meaningful sounds. speaker normalization, McGurk effect
lexical access
how we retrieve word meanings from mental storage
sentence processing
how we parse through grammatical structure and build meaning
language production
how we plan and articulate our thoughts into speech
speaker normalization
part of speech perception; we modify our expectations about linguistic input to account for what we know about the speaker (gender, physical size)
McGurk effect
an error in perception that occurs when we misperceive sounds because the audio and visual parts of speech are mismatched; we also rely on visual information to perceive sounds
Warren and Warren (1970)
Found that participants reported hearing the phoneme that fit the sentence context (phoneme restoration): "*eel of a shoe" heard as "heel," "*eel of an orange" heard as "peel"
temporary ambiguity
Present during the processing of a sentence, resolved by the end of a sentence. ("the rock band (banned?) played all night")
garden path effect
Phenomenon in which people are fooled into thinking a sentence has a different structure because of a temporary ambiguity
global ambiguity
not resolved by the end of the sentence; requires context to determine the intended structure and meaning ("the cop saw the man with the binoculars"). Prosody can help
prosody
intonation and pausing that help resolve ambiguity
speech production
conceptualization, formulation, articulation
speech errors
anticipations, perseverations, metathesis, spoonerisms, shifts, blends, substitutions
anticipations
a later unit is substituted or added in place of an earlier unit
Splicing from one tape/ splacing from one tape
anticipation
perseverations
an earlier unit is substituted for a later unit
splicing from one tape/ splicing from one type
perseveration
metathesis
two units switch places, each taking the place of the other
fill the pool/ fool the pill
metathesis
spoonerism
metathesis that involves the first sounds of 2 separate words
dear old queen/ queer old dean
spoonerism
shift
unit is moved from one location to another
she decides to hit it/ she decide to hits it
shift
blends
two words fuse
grizzly and ghastly/ grastly
blends
substitutions
one unit is replaced with another
it's hot in here / it's cold in here
substitutions
what speech errors reveal
speech is planned in advance; there are distinct levels of planning (meaning, words, sounds). Slips of the hand also occur in sign language
non-literal language
metaphor, idioms, irony/sarcasm
metaphor
understanding one thing in terms of another. "time is money"
idioms
fixed expressions with non-compositional meaning ("she spilled the beans"). Familiar idioms are processed as single lexical units; unfamiliar idioms require more time
irony/sarcasm
saying the opposite of what you mean. "nice parking job". requires recognizing the mismatch between the statement and the context
sequential brain processing
1. brain computes literal meaning
2. detects literal meaning doesn't fit context
3. searches for alternative meaning
(Figurative language takes longer to process)
direct access theory
brain uses context from the start, accesses figurative meaning directly. Familiar metaphors/idioms are processed just as fast as literal language.
lack of invariance in speech perception
the fundamental problem that the same sound (phoneme) is realized as different, inconsistent acoustic signals due to speaker differences, speaking rate, and context (coarticulation), yet listeners consistently perceive the same sound