Language and Computers final exam

40 Terms

1

Natural Language Processing

use of machines or computers to process human language

2

Computational Linguistics

overlaps with NLP but is more academic; it includes, for example, automatic tools for translation or summarization

3

Speech Synthesis

the creation of artificial speech from a written text in natural language

4

Traditional Speech Synthesis

a database of recorded words (usually about 20,000) from which diphones are extracted and then concatenated to build new words

5

Diphones

the transition between two sounds, or between a sound and silence. Diphones are extracted from specific contexts in the recordings.

6

Voice talent

the database of recorded words (spoken by a voice actor) used in speech synthesis

7

Markup language

annotation added to plain text to provide additional information (for example, instructions for a speech synthesizer, or information that lets a document be displayed correctly)

8

SSML

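Speech Synthesis Markup Language - an XML-based markup language used to control synthetic speech (for example pauses, emphasis, pitch and speaking rate)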
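
A minimal Python sketch of how plain text can be wrapped in SSML before it is sent to a synthesizer. It assumes the SSML 1.0 elements speak, s (sentence) and break; the helper build_ssml is hypothetical, and real synthesizers each support their own subset of SSML.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"

def build_ssml(sentences, pause_ms=500):
    """Hypothetical helper: wrap sentences in an SSML <speak> document with pauses between them."""
    parts = []
    for i, sentence in enumerate(sentences):
        if i > 0:
            # <break> asks the synthesizer for a pause of the given length
            parts.append(f'<break time="{pause_ms}ms"/>')
        # escape &, < and > so the text cannot break the markup
        parts.append(f"<s>{escape(sentence)}</s>")
    body = "".join(parts)
    return f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">{body}</speak>'

ssml = build_ssml(["Hello.", "Welcome to the exam & good luck!"])
print(ssml)

# Parse the result back to check that the markup is well-formed
root = ET.fromstring(ssml)
print(len(root.findall(f"{{{SSML_NS}}}break")))  # 1
```

The text the listener hears is unchanged; the markup only adds information about how it should be spoken.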

9

HTML

Hypertext Markup Language - a standard markup language for documents designed to be displayed in a web browser
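
To show the difference between the markup and the text it annotates, this sketch uses Python's standard html.parser module to separate an HTML document's tags from its visible text. The class name TextExtractor and the sample document are illustrative assumptions.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect tag names and visible text separately."""
    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # called for every opening tag, e.g. <p> or <b>
        self.tags.append(tag)

    def handle_data(self, data):
        # called for the text between tags; skip whitespace-only chunks
        if data.strip():
            self.text.append(data.strip())

html_doc = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
parser = TextExtractor()
parser.feed(html_doc)
print(parser.tags)            # ['html', 'body', 'h1', 'p', 'b']
print(" ".join(parser.text))  # Title Some bold text.
```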

10

ASR

Automatic Speech Recognition - the conversion of spoken language into written text by a machine

11

Role of n-grams in ASR

n-grams allow machines to predict the probability of a sequence of words occurring together. For example, in a trigram model, a machine predicts the next word based on the two previous words. This can be used for auto-correction in speech recognition: when several words sound alike, the one that best fits the preceding words is chosen.
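
The trigram idea can be sketched in a few lines of Python: count which word follows each pair of words in a training text, then use the relative counts as probabilities. The tiny corpus and the function names are illustrative assumptions; real ASR language models are trained on very large corpora and use smoothing for unseen word sequences.

```python
from collections import Counter, defaultdict

def train_trigrams(words):
    """Count which word follows each pair of preceding words."""
    model = defaultdict(Counter)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        model[(w1, w2)][w3] += 1
    return model

def predict_next(model, w1, w2):
    """Return the most likely next word and its estimated probability, or None."""
    counts = model.get((w1, w2))
    if not counts:
        return None
    word, count = counts.most_common(1)[0]
    return word, count / sum(counts.values())

corpus = "i want to eat . i want to sleep . i want to eat now".split()
model = train_trigrams(corpus)
print(predict_next(model, "want", "to"))  # ('eat', 0.6666666666666666)
```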

12

MT

Machine Translation - translating text or speech from one language into another. There are customizable machine translation systems, which are adapted to a specific domain and may be trained on the terminology associated with a particular field.

13

WSD

Word Sense Disambiguation - techniques that allow a machine to determine which sense of a polysemous word is meant in a given context. Possible approaches: knowledge-based (dictionary-based) or supervised. The supervised approach uses NLP algorithms that learn from training data. (Example: "I'm leaving you." "Who is she?" Here the relation between the two utterances might be hard for a machine to understand.)
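
The knowledge-based approach can be illustrated with a simplified version of the Lesk algorithm: choose the sense whose dictionary definition (gloss) shares the most words with the context. The two glosses and the stopword list below are hypothetical stand-ins for a real dictionary such as WordNet.

```python
# Hypothetical mini-dictionary: two senses of "bank" with short glosses
SENSES = {
    "bank/finance": "an institution where people deposit and borrow money",
    "bank/river": "the sloping land along the side of a river or lake",
}

STOPWORDS = {"a", "an", "the", "of", "or", "and", "where", "to", "i", "on", "at", "along"}

def content_words(text):
    """Lowercased words of a text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    return max(senses, key=lambda s: len(content_words(senses[s]) & context_words))

print(simplified_lesk("I need to deposit money at the bank", SENSES))  # bank/finance
print(simplified_lesk("we sat on the bank of the river", SENSES))      # bank/river
```

A supervised system would instead learn these associations from a corpus in which each occurrence of "bank" is labeled with its sense.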

14

NER

Named Entity Recognition - a system that analyses a text and looks for named entities such as people's names, places and dates. Because recognition depends to a large extent on context, the task is not easy.
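
A very simple rule-based NER can be sketched with a hand-made list of known names (a gazetteer) plus a regular expression for dates. The gazetteer entries and labels are hypothetical. The sketch also shows why context matters: a pure lookup cannot tell whether "May" is a month or a person.

```python
import re

# Hypothetical gazetteer: a hand-made list of known names and places
GAZETTEER = {"Warsaw": "PLACE", "London": "PLACE", "Maria": "PERSON", "John": "PERSON"}
DATE_PATTERN = re.compile(r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
                          r"August|September|October|November|December) \d{4}\b")

def find_entities(text):
    """Return (entity, label) pairs found by a date regex and a gazetteer lookup."""
    entities = [(m.group(), "DATE") for m in DATE_PATTERN.finditer(text)]
    for token in re.findall(r"\w+", text):
        if token in GAZETTEER:
            entities.append((token, GAZETTEER[token]))
    return entities

print(find_entities("Maria flew from Warsaw to London on 5 June 2023."))
# [('5 June 2023', 'DATE'), ('Maria', 'PERSON'), ('Warsaw', 'PLACE'), ('London', 'PLACE')]
```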

15

NLG

Natural Language Generation - a subfield of NLP concerned with building computer systems or applications that automatically produce all kinds of texts in natural language from a semantic representation given as input. Applications include question answering and text summarization. An example of such a system is ChatGPT by OpenAI.
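
The "semantic representation as input" idea can be shown with a tiny template-based generator: the input is structured data, and the program decides how to express it in words. This is a hypothetical illustration of classic template NLG; systems like ChatGPT use large neural models instead of hand-written templates.

```python
def realize(fact):
    """Turn a simple semantic representation (a dict) into an English sentence."""
    # choose the wording based on the content of the representation
    if fact["rain_chance"] >= 50:
        outlook = "rain is likely"
    else:
        outlook = "it should stay dry"
    return f"On {fact['day']} in {fact['city']}, {outlook}, with a high of {fact['high']}°C."

fact = {"city": "Warsaw", "day": "Monday", "rain_chance": 70, "high": 24}
print(realize(fact))  # On Monday in Warsaw, rain is likely, with a high of 24°C.
```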

16

X-SAMPA

a phonetic alphabet, alternative to the IPA, that expresses the same information using only ASCII characters (symbols available on a standard keyboard)
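
A sketch of converting X-SAMPA into IPA in Python. Only a small subset of the full X-SAMPA table is included, and the function name xsampa_to_ipa is illustrative. Multi-character X-SAMPA symbols (such as r\ for ɹ) are matched before single characters.

```python
# A small subset of the X-SAMPA to IPA mapping (not the full table)
XSAMPA_TO_IPA = {
    "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ", "N": "ŋ",
    "@": "ə", "{": "æ", "A": "ɑ", "E": "ɛ", "I": "ɪ",
    "O": "ɔ", "U": "ʊ", "V": "ʌ", "3": "ɜ", "r\\": "ɹ",
}

def xsampa_to_ipa(text):
    """Convert X-SAMPA to IPA, trying two-character symbols before single ones."""
    result, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in XSAMPA_TO_IPA:
            result.append(XSAMPA_TO_IPA[pair])
            i += 2
        else:
            # unmapped characters pass through; many (p, t, k, d...) are identical in both
            result.append(XSAMPA_TO_IPA.get(text[i], text[i]))
            i += 1
    return "".join(result)

print(xsampa_to_ipa("T{Nk"))    # θæŋk  (thank)
print(xsampa_to_ipa("r\\Ed"))   # ɹɛd   (red)
```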

17

ASCII

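American Standard Code for Information Interchange - a character encoding standard for electronic communication. It uses 7 bits to encode 128 characters (English letters, digits, punctuation marks and control codes).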
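
In Python, ord and chr convert between characters and their numeric codes, which makes the 0 to 127 range of ASCII easy to check. The helper is_ascii is illustrative (Python 3.7+ also offers the built-in str.isascii). The example also shows why X-SAMPA exists: its symbols are ASCII, while IPA symbols are not.

```python
def is_ascii(text):
    """True if every character has a code in the 7-bit ASCII range (0-127)."""
    return all(ord(ch) < 128 for ch in text)

print(ord("A"), ord("a"), ord("@"))  # 65 97 64
print(chr(83))                       # S
print(is_ascii("T{Nk"))              # True   (X-SAMPA for "thank")
print(is_ascii("θæŋk"))              # False  (IPA for "thank")
```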

18

N-Grams

N-grams - a contiguous sequence of n words: unigrams (1 word), bigrams (2 words), trigrams (3 words), 4-grams, 5-grams, etc. The n-grams extracted from a text OVERLAP: each one starts one word after the previous one.
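
A short Python sketch that extracts n-grams from a tokenized sentence and makes the overlap visible. The function name ngrams is illustrative (NLTK has a similar utility, but this version uses only the standard language).

```python
def ngrams(words, n):
    """Return all contiguous n-word sequences; consecutive n-grams overlap."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(len(ngrams(words, 3)))  # 4
```

A text of k words therefore contains k - n + 1 n-grams, not k / n.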

19

Markov Chain

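a model that predicts the next step only on the basis of the preceding local environment (the current state), not the whole history; sometimes used in NLP (n-gram language models are an example)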
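
A Markov chain text generator sketched in Python: each next word is chosen at random from the words that followed the current word in the training text, so the choice depends only on the current state. The training sentence and function names are illustrative; a fixed seed keeps the run reproducible, but the exact output depends on Python's random number generator.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length, seed=0):
    """Generate words; each next word depends only on the current word."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < length and chain.get(output[-1]):
        output.append(rng.choice(chain[output[-1]]))
    return output

words = "the dog runs . the cat sleeps . the dog sleeps .".split()
chain = build_chain(words)
print(" ".join(generate(chain, "the", 6)))
```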

20

Tokenization

breaking up a string of text into semantically useful units called tokens (e.g. words and punctuation marks)
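
A minimal regular-expression tokenizer in Python, assuming that a token is either a run of letters/digits or a single punctuation mark. Note that it splits "It's" into three tokens; real tokenizers (e.g. in NLTK or spaCy) use more careful rules for contractions.

```python
import re

def tokenize(text):
    """Split text into word tokens and punctuation tokens."""
    # \w+ matches runs of letters/digits; [^\w\s] matches a single punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world! It's 2024."))
# ['Hello', ',', 'world', '!', 'It', "'", 's', '2024', '.']
```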

21

POS-tagging

Part-of-Speech tagging - assigning a part-of-speech category to each token within a text. Common PoS tags are: verb, adjective, noun, pronoun, conjunction, preposition, interjection.
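
The simplest possible tagger looks each token up in a lexicon and falls back to a default tag. The lexicon and tag names below are hypothetical; real taggers learn from annotated corpora and use context, which this sketch ignores (it would always tag "runs" as a verb, even in "two runs").

```python
# Hypothetical toy lexicon; real taggers are trained on annotated corpora
LEXICON = {
    "the": "DET", "a": "DET", "dog": "NOUN", "park": "NOUN",
    "runs": "VERB", "in": "PREP", "big": "ADJ", "she": "PRON",
    "and": "CONJ", "wow": "INTJ",
}

def pos_tag(tokens):
    """Tag each token by lexicon lookup, defaulting unknown words to NOUN."""
    return [(tok, LEXICON.get(tok.lower(), "NOUN")) for tok in tokens]

print(pos_tag("Wow the big dog runs in the park".split()))
# [('Wow', 'INTJ'), ('the', 'DET'), ('big', 'ADJ'), ('dog', 'NOUN'),
#  ('runs', 'VERB'), ('in', 'PREP'), ('the', 'DET'), ('park', 'NOUN')]
```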

22

Lemmatization

reducing the words in a sentence to their base forms (lemmas, i.e. their dictionary forms). It is dictionary-based.
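
Because lemmatization is dictionary-based, a minimal sketch is just a lookup table from inflected forms to lemmas. The table here is a hypothetical fragment; real lemmatizers (for example the WordNet-based one in NLTK) use full dictionaries and usually need the part of speech as well.

```python
# Hypothetical mini lemma dictionary
LEMMAS = {"better": "good", "was": "be", "were": "be", "mice": "mouse",
          "ran": "run", "running": "run"}

def lemmatize(words):
    """Replace each word with its dictionary lemma, leaving unknown words unchanged."""
    return [LEMMAS.get(w.lower(), w.lower()) for w in words]

print(lemmatize("The mice were running".split()))  # ['the', 'mouse', 'be', 'run']
```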

23

Stemming

reducing the words in a sentence to their base forms by trimming them to their stems. It is not dictionary-based and operates on single words. For example, the word "better" would be transformed into "good" by a lemmatizer but left as "better" by a stemmer.
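
A crude suffix-stripping stemmer in Python illustrates the difference: no dictionary is consulted, and the result need not be a real word ("running" becomes "runn"). The suffix list and the three-letter minimum are assumptions for illustration; this is not the Porter stemmer, which applies a much more detailed set of rules.

```python
SUFFIXES = ("ing", "ed", "ly", "es", "s")  # checked in this order

def stem(word):
    """Crude rule-based stemmer: strip the first matching suffix, no dictionary involved."""
    word = word.lower()
    for suffix in SUFFIXES:
        # keep at least three letters so short words are not destroyed
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["better", "running", "played", "quickly", "cats"]])
# ['better', 'runn', 'play', 'quick', 'cat']
```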

24

Chunking

the process of extracting phrases from unstructured text, i.e. analyzing a sentence to identify its constituents (noun groups, verb groups, etc.)
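
Chunking usually runs on POS-tagged text. This sketch finds noun groups with the pattern "optional determiner, any adjectives, one or more nouns", similar in spirit to grammar-based chunkers such as NLTK's RegexpParser. Encoding each tag as one letter so the pattern can be a regular expression is an implementation choice of this example, and the tag names are assumptions.

```python
import re

CODES = {"DET": "D", "ADJ": "A", "NOUN": "N"}

def np_chunks(tagged):
    """Find noun groups matching DET? ADJ* NOUN+ in a POS-tagged sentence."""
    # encode each tag as one character so the chunk grammar can be a regular expression
    tag_string = "".join(CODES.get(tag, "O") for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in re.finditer(r"D?A*N+", tag_string)
    ]

sentence = [("the", "DET"), ("big", "ADJ"), ("dog", "NOUN"), ("chased", "VERB"),
            ("a", "DET"), ("small", "ADJ"), ("black", "ADJ"), ("cat", "NOUN")]
print(np_chunks(sentence))  # ['the big dog', 'a small black cat']
```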

25

Stopword removal

an essential step in NLP text preprocessing. It involves filtering out high-frequency words that add little or no semantic value to a sentence, for example which, to, at, for, is, etc.
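
Stopword removal is a simple filter over tokens. The stopword set below is a short illustrative list; NLP libraries ship much longer, language-specific lists.

```python
# A small illustrative stopword list
STOPWORDS = {"which", "to", "at", "for", "is", "the", "a", "an", "of", "and", "in", "on"}

def remove_stopwords(tokens):
    """Drop high-frequency function words, keeping the content words."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords("The train to London is at the station".split()))
# ['train', 'London', 'station']
```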

26

Tagging/classification

adding a tag or category either to each token within a text, or to a whole text based on its content

27

Sentiment Analysis

classifying text by the polarity of the opinion it expresses (positive, negative or neutral)
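
A lexicon-based sentiment classifier can be sketched by summing word polarities. The polarity lexicon is hypothetical and tiny; this approach also fails on negation ("not good" scores as positive), which is one reason machine-learning classifiers are often used instead.

```python
# Hypothetical mini polarity lexicon: +1 positive, -1 negative
POLARITY = {"good": 1, "great": 1, "love": 1, "excellent": 1,
            "bad": -1, "awful": -1, "hate": -1, "boring": -1}

def sentiment(text):
    """Classify text as positive, negative or neutral by summing word polarities."""
    score = sum(POLARITY.get(w.strip(".,!?").lower(), 0) for w in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this film, the acting is great!"))  # positive
print(sentiment("What a boring, awful plot."))              # negative
print(sentiment("The film lasts two hours."))               # neutral
```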

28

Synchronic linguistics

the study of a language at one stage of its development (a single point in time)

29

Diachronic linguistics

historical: the study of a language across different stages of its development

30

What is programming

giving instructions to a machine in order to get it to complete a task; the resulting code might produce an application, a game, or anything else

31

What is Python

a general-purpose, high-level programming language

32

Variables in Python

a named value defined by the programmer (for example height) that the program can use and change later

33

Basic data types in Python

str - a string of characters; int - an integer (whole number); float - a number with a decimal point
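
A few lines of Python showing variables and the three basic data types; the variable names and values are made up for illustration. The built-in type function reports which type a value has.

```python
height = 1.75          # a variable named height, holding a float
name = "Anna"          # str: a string of characters
age = 21               # int: a whole number

print(type(height).__name__, type(name).__name__, type(age).__name__)  # float str int

age = age + 1          # a variable's value can be changed later
print(age)             # 22
print(int("42") + 1)   # 43 (the str "42" converted to an int first)
```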

34

synonymy vs antonymy

synonymy refers to the relationship between words that have similar or identical meanings; antonymy is the relationship between words that have opposite meanings

35

hyperonymy vs hyponymy

a semantic relation in which one term (the hypernym) is more general than another term (the hyponym) and includes it in its meaning. For example, “flower” is a hypernym of “rose” because a rose is a type of flower, and “sparrow” is a hyponym of “bird”.

36

holonymy vs meronymy

both refer to the semantic relationship between a whole and its parts (the holonym is the whole, the meronym is the part). For example, “tree” is a holonym of “leaf” (because a leaf is part of a tree), and “wheel” is a meronym of “car”.

37

troponymy

the relationship between two verbs where one verb expresses a specific “manner” of the other; for example, “to whisper” is a troponym of “to speak”

38

Corpus

a collection of texts used to extract specific information about how language is used, in which contexts etc.
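
One of the most basic pieces of information extracted from a corpus is a word frequency list. This sketch treats a corpus as a Python list of strings and counts lowercased words; the two-sentence corpus and the function name are illustrative.

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Count word frequencies across a small corpus (a list of texts)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

corpus = ["The cat sat on the mat.", "The dog sat on the log."]
print(word_frequencies(corpus).most_common(3))  # [('the', 4), ('sat', 2), ('on', 2)]
```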

39

WordNet

an online lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are linked by semantic relations such as hypernymy and meronymy.
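
The structure of WordNet can be imitated with a small Python dictionary: each synset lists its synonymous lemmas and links to other synsets. The data below is a hypothetical toy, not the real WordNet contents; the real database can be queried from Python through NLTK's nltk.corpus.wordnet module (for example wordnet.synsets("car")).

```python
# Hypothetical toy data in the style of WordNet; NOT the real WordNet contents or API
SYNSETS = {
    "car.n.01": {"lemmas": ["car", "automobile", "motorcar"],
                 "hypernym": "vehicle.n.01", "part_meronyms": ["wheel.n.01"]},
    "vehicle.n.01": {"lemmas": ["vehicle"], "hypernym": "object.n.01", "part_meronyms": []},
    "wheel.n.01": {"lemmas": ["wheel"], "hypernym": "object.n.01", "part_meronyms": []},
    "object.n.01": {"lemmas": ["object"], "hypernym": None, "part_meronyms": []},
}

def synonyms(word):
    """Other lemmas that share a synset with word (synonymy)."""
    result = set()
    for synset in SYNSETS.values():
        if word in synset["lemmas"]:
            result.update(synset["lemmas"])
    result.discard(word)
    return result

def hypernym_chain(name):
    """Follow hypernym links from a synset up to the most general concept."""
    chain = []
    while name is not None:
        chain.append(name)
        name = SYNSETS[name]["hypernym"]
    return chain

print(sorted(synonyms("car")))               # ['automobile', 'motorcar']
print(hypernym_chain("car.n.01"))            # ['car.n.01', 'vehicle.n.01', 'object.n.01']
print(SYNSETS["car.n.01"]["part_meronyms"])  # ['wheel.n.01']
```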

40

AntConc

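a freeware corpus analysis programme allowing the user to create their own corpus out of any text and analyze it (for example with concordance lines and word frequency lists)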
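
Concordance (keyword-in-context, KWIC) lines, the core view in concordancers, can be approximated in Python: every occurrence of a keyword is shown with a fixed amount of context on each side. This is a hypothetical sketch of the idea, not how AntConc itself is implemented.

```python
import re

def concordance(text, keyword, width=20):
    """Return keyword-in-context lines with `width` characters of context on each side."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # right-align the left context so the keywords line up in a column
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

text = "The bank was closed. We walked along the river bank until the bank opened again."
for line in concordance(text, "bank"):
    print(line)
```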