Natural Language Processing
use of machines or computers to process human language
Computational Linguistics
overlaps with NLP but is more academic; it includes, for example, automatic tools for translation or summarization
Speech Synthesis
the creation of artificial speech from a written text in natural language
Traditional Speech Synthesis
starts from a base of recorded words (usually about 20k); diphones are extracted from the recordings and connected into new words
Diphones
a transition between two sounds or a sound and silence. They are extracted from specific contexts.
Voice talent
the speaker whose recorded words form the basis used in speech synthesis
Markup language
a set of annotations added to plain text to supply additional information (for example, instructions for a speech synthesizer, or for displaying documents correctly)
SSML
Speech Synthesis Markup Language - a markup language that tells a speech synthesizer how to read a text
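To make the idea of markup for a synthesizer concrete, here is a minimal sketch that builds a small SSML fragment with Python's standard `xml.etree.ElementTree`. The sentence and attribute values are made up for illustration, and the fragment is simplified: a complete SSML document would normally also declare a version and namespace on `<speak>`.

```python
import xml.etree.ElementTree as ET

# <speak> is the root element of an SSML document.
speak = ET.Element("speak")
speak.text = "Hello, "

# <emphasis> asks the synthesizer to stress the enclosed words.
emphasis = ET.SubElement(speak, "emphasis", level="strong")
emphasis.text = "world"
emphasis.tail = "."

# <break> inserts a pause; the duration here is an arbitrary example value.
pause = ET.SubElement(speak, "break", time="500ms")
pause.tail = " How are you?"

ssml = ET.tostring(speak, encoding="unicode")
print(ssml)
```

The plain text stays readable; the tags only add information about how it should be spoken.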
HTML
Hypertext Markup Language - a standard markup language for documents designed to be displayed in a web browser
ASR
Automatic Speech Recognition
Role of n-grams in ASR
n-grams allow machines to predict the probability of a sequence of words occurring together. For example, in a trigram model, a machine would predict the next word based on the two previous words. This can be used for auto-correction in speech recognition.
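The trigram idea can be sketched in a few lines of Python. This is a toy illustration, not how a real recognizer is built: the corpus is invented, and the function names `train_trigram_model` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_trigram_model(sentences):
    """Count which word follows each pair of preceding words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)][w3] += 1
    return model

def predict_next(model, w1, w2):
    """Return the most frequent continuation of (w1, w2), or None if unseen."""
    followers = model.get((w1, w2))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Invented toy corpus; a real model would be trained on far more text.
toy_corpus = [
    "recognize speech with a model",
    "recognize speech with a computer",
    "recognize speech with a model",
    "wreck a nice beach",
]
model = train_trigram_model(toy_corpus)
print(predict_next(model, "with", "a"))  # "model" (seen twice, "computer" once)
```

A recognizer can use such counts to prefer the word sequence that is more probable given the two previous words.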
MT
Machine Translation - Translating text or speech to different languages. There are customizable machine translation systems, which are adapted to a specific domain and might be trained to understand the terminology associated with a particular field.
WSD
Word Sense Disambiguation - techniques that let a machine work out which sense of a polysemous word is meant in context. Possible approaches: knowledge-based (dictionary-based) or supervised. The supervised approach relies on algorithms that learn from training data. (Example: “I’m leaving you.” “Who is she?” - it might be hard for a machine to understand the relation.)
NER
Named Entity Recognition - analyzing a text to find named entities such as places, names, and dates. It is hard because it is to a large extent context-dependent.
NLG
Natural Language Generation - a subfield of NLP concerned with building computer systems or applications that can automatically produce all kinds of texts in natural language, using a semantic representation as input. This might include question answering and text summarization. An example of such a system is ChatGPT by OpenAI.
X-SAMPA
an alternative to the IPA that expresses the same information using only ASCII characters (symbols found on a standard keyboard)
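As a rough sketch of the correspondence, the snippet below converts a few single-character X-SAMPA symbols to IPA. The table is deliberately partial and the helper `to_ipa` is hypothetical; full X-SAMPA also has multi-character symbols that this sketch does not handle.

```python
# A few single-character X-SAMPA symbols and their IPA equivalents (partial table).
XSAMPA_TO_IPA = {
    "T": "θ",  # as in "thing"
    "D": "ð",  # as in "this"
    "S": "ʃ",  # as in "ship"
    "Z": "ʒ",  # as in "vision"
    "N": "ŋ",  # as in "sing"
    "I": "ɪ",  # as in "kit"
    "@": "ə",  # schwa
    "{": "æ",  # as in "cat"
}

def to_ipa(xsampa):
    """Replace each known X-SAMPA character; leave everything else unchanged."""
    return "".join(XSAMPA_TO_IPA.get(ch, ch) for ch in xsampa)

print(to_ipa("TIN"))  # θɪŋ
print(to_ipa("SIp"))  # ʃɪp
```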
ASCII
American Standard Code for Information Interchange - a character encoding standard for electronic communication.
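A short illustration of what "character encoding" means in practice: each ASCII character corresponds to a number from 0 to 127. The helper `is_ascii` is a hypothetical name used only for this sketch.

```python
# Every ASCII character corresponds to a code from 0 to 127.
code_of_A = ord("A")    # character -> code
char_of_97 = chr(97)    # code -> character

def is_ascii(text):
    """True if every character in text has a code below 128."""
    return all(ord(ch) < 128 for ch in text)

print(code_of_A, char_of_97)    # 65 a
print(is_ascii("hello"))        # True
print(is_ascii("θɪŋ"))          # False: IPA symbols are outside ASCII
```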
N-Grams
sequences of a given number (n) of consecutive words: unigrams, bigrams, trigrams, 4-grams, 5-grams. Neighbouring n-grams overlap: each one shares words with the next.
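The overlap is easiest to see by extracting n-grams from a short sentence. A minimal sketch, with a made-up example sentence and a hypothetical helper `ngrams`:

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, sliding one token at a time."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(len(trigrams))  # 4
```

Each bigram starts with the word that ended the previous one, which is what "overlap" means here.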
Markov Chain
a model that predicts the next step on the basis of the immediately preceding local environment; sometimes used in NLP
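A word-level Markov chain can be sketched as follows: the next word is chosen by looking only at the current word. The training sentence and the function names `build_chain` and `generate` are invented for illustration.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain: each step depends only on the current word."""
    rng = random.Random(seed)
    state = start
    output = [state]
    for _ in range(length - 1):
        options = chain.get(state)
        if not options:  # dead end: this word was never followed by anything
            break
        state = rng.choice(options)
        output.append(state)
    return output

words = "the cat sat on the mat and the cat slept".split()
chain = build_chain(words)
walk = generate(chain, "the", 6)
print(" ".join(walk))
```

Because a word that was followed more often appears more often in its list, `rng.choice` picks continuations in proportion to how frequently they were observed.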
Tokenization
breaking up a string of words into semantically useful units called tokens
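A minimal tokenizer can be sketched with a regular expression. This is one simple approach under the assumption that words, contractions, and punctuation marks are the units of interest; real tokenizers handle many more cases.

```python
import re

def tokenize(text):
    """Split text into word tokens (keeping contractions) and punctuation tokens."""
    # \w+(?:'\w+)? matches a word, optionally followed by an apostrophe part ("don't").
    # [^\w\s] matches any single punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = tokenize("Don't panic, it's only NLP!")
print(tokens)
# ["Don't", 'panic', ',', "it's", 'only', 'NLP', '!']
```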
POS-tagging
Part-of-Speech tagging - assigning a part-of-speech category to each token within a text. Common PoS tags are: verb, adjective, noun, pronoun, conjunction, preposition, interjection
Lemmatization
transforming words in a sentence into their base forms (lemmas). It is dictionary-based.
Stemming
transforming words in a sentence into their base forms (stems) by trimming the word. It is not dictionary-based and operates on single words. For example, the word “better” would be transformed into “good” by a lemmatizer but left as “better” by a stemmer.
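The contrast between the two can be sketched with toy versions of each. Both are assumptions made for illustration: the mini-dictionary and suffix list are invented, and real lemmatizers and stemmers use much larger resources and rule sets.

```python
# Hypothetical mini-dictionary: a lemmatizer looks words up.
LEMMA_DICT = {"better": "good", "went": "go", "mice": "mouse"}

# Hypothetical suffix list: a stemmer just trims endings.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_lemmatize(word):
    """Dictionary lookup; unknown words are returned unchanged."""
    return LEMMA_DICT.get(word, word)

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(toy_lemmatize("better"), toy_stem("better"))    # good better
print(toy_lemmatize("walking"), toy_stem("walking"))  # walking walk
print(toy_stem("studies"))                            # studi (not a real word)
```

The last line shows a typical side effect of stemming: the result need not be a dictionary word.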
Chunking
the process of extracting phrases from unstructured text, which means analyzing a sentence to identify its constituents (noun groups, verb groups, etc.)
Stopword removal
an essential step in NLP text processing. It involves filtering out high-frequency words that add little or no semantic value to a sentence, for example: which, to, at, for, is.
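A minimal sketch of the filtering step, assuming the text is already tokenized. The stopword set below is a small invented sample; real lists are much longer.

```python
# Small illustrative stopword set (real lists contain hundreds of words).
STOPWORDS = {"which", "to", "at", "for", "is", "the", "a", "of"}

def remove_stopwords(tokens):
    """Keep only tokens whose lowercase form is not a stopword."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

tokens = "The train to Boston is at the station".split()
content_words = remove_stopwords(tokens)
print(content_words)  # ['train', 'Boston', 'station']
```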
Tagging/classification
assigning a tag or category to each token within a text (tagging), or to a whole text based on its content (classification)
Sentiment Analysis
classifying text by the polarity of the opinion it expresses (for example positive, negative, or neutral)
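One very simple way to sketch polarity classification is to count words from a positive and a negative list. This lexicon-based approach and the word lists are assumptions made for illustration; it ignores negation, punctuation, and context.

```python
# Tiny invented lexicons; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}

def polarity(text):
    """Classify text by counting positive and negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this great phone"))    # positive
print(polarity("What an awful bad day"))      # negative
print(polarity("The phone is on the table"))  # neutral
```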
Synchronic linguistics
studying a language at one stage of its development
Diachronic linguistics
historical; studying how a language changes across different stages
What is programming
giving commands to a machine in order to get it to complete some kind of task - the result of the code might be an application, a game, or anything else
What is Python
a general-purpose, high-level programming language
Variables in Python
a name that the programmer defines to refer to a value, for example height
Basic data types in Python
str - a string of symbols, int - an integer (whole number), float - a number with a decimal point
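A short example of variables holding each of these three types. The variable names and values are arbitrary.

```python
# Variables are names bound to values; the value determines the type.
name = "Ada"     # str: a string of symbols
age = 36         # int: a whole number
height = 1.68    # float: a number with a decimal point

print(type(name).__name__, type(age).__name__, type(height).__name__)
# str int float

# Values can be converted between types explicitly.
next_age = int("36") + 1   # the string "36" becomes the integer 36
print(next_age)            # 37
```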
synonymy vs antonymy
synonymy refers to the relationship between words that have similar or identical meanings – antonymy is the relationship between words that have opposite meanings
hyperonymy vs hyponymy
This is a semantic relation in which one term (the hypernym) is more general than another term (the hyponym) and includes it in its meaning. For example, “flower” is a hypernym of “rose” because a rose is a type of flower and “sparrow” is a hyponym of “bird”.
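The relation can be sketched as a small lookup table in which each word points to its more general term. The entries for "plant" and "animal" are added assumptions that extend the examples above, and the function names are hypothetical.

```python
# Each word maps to its direct hypernym (the more general term).
HYPERNYM_OF = {
    "rose": "flower",
    "flower": "plant",    # assumed extra level for illustration
    "sparrow": "bird",
    "bird": "animal",     # assumed extra level for illustration
}

def hypernym_chain(word):
    """Follow the hypernym links upward and return every more general term."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain

def is_hyponym_of(word, candidate):
    """True if candidate is somewhere above word in the hierarchy."""
    return candidate in hypernym_chain(word)

print(hypernym_chain("rose"))            # ['flower', 'plant']
print(is_hyponym_of("sparrow", "bird"))  # True
print(is_hyponym_of("bird", "sparrow"))  # False
```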
holonymy vs meronymy
both of these refer to the semantic relationship between a part and a whole (the holonym is the whole, the meronym is the part). For example, “tree” is a holonym of “leaf” (because a leaf is a part of a tree) and “wheel” is a meronym of “car”.
troponymy
the relationship between two verbs where one verb is a specific “manner” of another – for example, “to whisper” is a troponym of “to speak”
Corpus
a collection of texts used to extract specific information about how language is used, in which contexts etc.
WordNet
online lexical database in which different parts of speech (nouns, verbs, adjectives and adverbs) are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
AntConc
a corpus analysis programme that lets the user build their own corpus out of any text and search it, for example with concordances