Quest 2 - Language and Computers - Final Exam

216 Terms

1
New cards

Q: Abjads do not usually represent vowels, only consonants

A: True

2
New cards

Q: How many bits does ASCII use to store English text?

A: 7 bits
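Worked example (an illustrative sketch, not part of the original card set): the snippet below looks up the ASCII code of each character in a short sample string and prints it as a 7-digit binary number. The sample string "Hi!" is an arbitrary choice.

```python
# Every ASCII character maps to a number from 0 to 127,
# so each code fits in 7 binary digits.
text = "Hi!"  # arbitrary sample text
codes = [ord(ch) for ch in text]
bits = [format(code, "07b") for code in codes]

for ch, code, b in zip(text, codes, bits):
    print(ch, code, b)
```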

3
New cards

Q: Automatic spelling checkers run on a whole document, find errors, and make corrections

A: True

4
New cards

Q: When TurnItIn flags a submitted homework as possible plagiarism, what type of machine learning did it probably use to do this?

A: Supervised machine learning (classification)

5
New cards

Q: Chatbots mimic informal human chatting while dialogue agents act as personal assistants to complete some task(s)

A: True

6
New cards

Q: ____ is when a speaker alternates between two or more languages, or language varieties, in the context of a single conversation or situation

A: Codeswitching

7
New cards

Q: Late assignments will be accepted with a medical or official accommodation letter

A: False

8
New cards

Q: Which of the following is NOT a cause for spelling errors?

A: Technological

9
New cards

Q: Machine learning is the main "engine" of modern artificial intelligence

A: True

10
New cards

Q: The earliest chatbot, ELIZA, was designed to mimic a paranoid schizophrenic

A: False

11
New cards

Q: A Wikipedia page is an example of structured data

A: False

12
New cards

Q: The phones that speakers hear are always pronounced exactly the same by every speaker in every word

A: False

13
New cards

Q: Machine learning involves a little bit of probability and statistics but relies a lot on world knowledge

A: False

14
New cards

Q: Consider the following example of learner language: they are very kind and friendship. Based on its distribution (word/linear order), what is the part-of-speech (POS) of friendship?

A: Adjective

15
New cards

Q: A grammatical constituent is a group of words that act as one unit in a sentence and can be substituted in a sentence by a constituent of the same category without violating the language's grammatical rules

A: True

16
New cards

Q: What is an example of structured data?

A: Excel spreadsheets

17
New cards

Q: Unicode is an improvement over ASCII because Unicode can encode all characters of all writing systems

A: True

18
New cards

Q: Match the type of strategy of doing feature engineering with its definition

a. Kitchen sink strategy

b. Hand-crafted strategy

A: (a) Use lots of features in the hope that some will be relevant and useful

(b) Use careful thought to try to identify, ahead of time, a small set of features that are likely to be relevant

19
New cards

Q: In multilingual NLP, the "vocabulary challenge" refers to the fact that some speakers use fewer words than speakers of other languages

A: False

20
New cards

Q: Consider the following example of learner language: they are very kind and friendship. Based on morphological evidence, what is the POS of friendship (hint: consider the POS of English words that are morphologically similar)?

A: Noun

21
New cards

Q: The basic principle of encoding writing in binary (such as with Unicode or ASCII) is that every ___ is encoded as a specific ___

A: Character, numeric value

22
New cards

Q: Which steps are involved in correcting the spelling of a word? Select all that apply

A: Rank candidates, detect an error

23
New cards

Q: Which one of these sentences is more likely to be generated by a Unigram language model?

A: Hill he late speaks; or! A more to leg less first you enter

24
New cards

Q: Which of these statements is a performative utterance (speech act)?

A: I forgive you

25
New cards

Q: What are examples of stop words?

A: a, the, in, but

26
New cards

Q: Which citation style is required for the final writing requirement in this course?

A: APA

27
New cards

Q: A full representation of Unicode (UTF-32) takes __ bytes to encode one character

A: 4
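Worked example (an illustrative sketch; the three sample characters are arbitrary): Python's built-in "utf-32-be" codec encodes every character in exactly 4 bytes, while UTF-8 uses a variable number of bytes per character.

```python
# Compare fixed-width UTF-32 with variable-width UTF-8.
# "utf-32-be" is big-endian UTF-32 without a byte-order mark.
chars = ["A", "é", "中"]
utf32_sizes = [len(ch.encode("utf-32-be")) for ch in chars]
utf8_sizes = [len(ch.encode("utf-8")) for ch in chars]

for ch, wide, narrow in zip(chars, utf32_sizes, utf8_sizes):
    print(ch, "UTF-32 bytes:", wide, "UTF-8 bytes:", narrow)
```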

28
New cards

Q: Hoow mnay isolated ssspeliing errors are in this questions?

A: 3

29
New cards

Q: Match the term with its characteristic

a. Supervised learning

b. Unsupervised learning

A: (a) The training data and the test data have been labeled with the desired "correct answers"

(b) There are no prespecified categories in the training data and the test data

30
New cards

Q: You notice over the course of a month that your spam filter correctly flagged 1300 spam messages, but it missed 25, and it incorrectly said that there were 100 spam messages (which were actually ham). This was out of a total of 2100 messages that you received. What is the precision of the spam filter?

A: 1300/1400

31
New cards

Q: Match the technique to its application in spellchecking.

a. Dictionary look-up

b. N-gram analysis (character-level)

c. Transition and confusion probabilities

A: (a) error detection

(b) error detection

(c) ranking correction candidates

32
New cards

Q: Sentiment analysis uses supervised machine learning to detect the attitude of a writer

A: True

33
New cards

Q: Speech is a kind of _______ that can be accomplished directly or indirectly.

A: Action

34
New cards

Q: In the translation triangle, the bottom corners are:

A: the source and target languages

35
New cards

Q: What are the two basic types of writing systems?

A: Meaning-based (logographic), Sound-based (letters)

36
New cards

Q: In natural language processing, the task of identifying dates, addresses, or names of people or companies in a text is referred to as __________.

A: Named entity recognition

37
New cards

Q: The first step used in probabilistic methods is:

A: count misspellings in the text

38
New cards

Q: "Erde" in German and "Earth" in English are examples of cognates.

A: True

39
New cards

40
New cards

Q: Which of the following is NOT a type of ASR system?

A: Language dependent

41
New cards

Q: What does the following equation represent? P(B|A) = P(A and B) / P(A)

A: Conditional probability
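Worked example (a sketch with made-up counts, not from the course material): suppose that out of 100 emails, 40 contain the word "free" (event A) and 30 both contain "free" and are spam (A and B). The formula then gives the probability that an email is spam given that it contains "free".

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration.
total_emails = 100
contains_free = 40            # event A
contains_free_and_spam = 30   # events A and B together

p_a = Fraction(contains_free, total_emails)
p_a_and_b = Fraction(contains_free_and_spam, total_emails)

# P(B|A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)
```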

42
New cards

Q: Large Language Models (LLMs) like ChatGPT are corpus-based dialogue systems that require massive amounts of training to work successfully.

A: True

43
New cards

Q: The final essay for the class is graded as:

A: Satisfactory/Unsatisfactory/Fail

44
New cards

Q: Which pair of words are homophones?

A: kernel/colonel

45
New cards

Q: The basic set-up for machine learning is to train the statistical parameters of the model on some data (i.e., training set) and then use this trained model to compute probabilities of some new data (i.e., test set). This new data set could not be exactly the same data we trained on.

A: True

46
New cards

Q: Search engines rank search results based on how many other pages link to the pages in the results.

A: True

47
New cards

Q: The format of the final version of the essay should be:

A: PDF

48
New cards

Q: Match the technique to its application in spellchecking.

a. Similarity key

b. Minimum edit distance

A: (a) candidate generation

(b) ranking correction candidates
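Worked example (a minimal sketch, not the course's own implementation): minimum edit distance can be computed with dynamic programming and then used to rank correction candidates. This version counts insertions, deletions, and substitutions at cost 1 each and, as a simplifying assumption, leaves out transpositions. The misspelling and candidate list are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance: insert, delete, substitute, each costing 1."""
    prev = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s_ch in enumerate(source, start=1):
        curr = [i]
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical misspelling and candidate corrections.
misspelling = "speling"
candidates = ["peeling", "spelling", "sapling"]
ranked = sorted(candidates, key=lambda word: edit_distance(misspelling, word))
print(ranked)
```

The candidate with the smallest distance is ranked first; ties keep their original order.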

49
New cards

Q: What is grounding in a conversation?

A: The hearer/listener acknowledges they understand

50
New cards

Q: A search engine does not actually search the whole Internet.

A: True

51
New cards

Q: What do ASR and TTS do?

A: ASR maps sound to text / TTS maps text to sound

52
New cards

Q: In binary, the positions in an eight-digit number encode:

A: 128s, 64s, 32s, 16s, 8s, 4s, 2s, 1s
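Worked example (an illustrative sketch; the function name is hypothetical): an eight-digit binary number is converted to decimal by adding up the place values wherever the digit is 1.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]

def binary_to_decimal(bits):
    """Sum the place values of the positions that hold a 1."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly eight 0s and 1s")
    return sum(value for value, bit in zip(PLACE_VALUES, bits) if bit == "1")

# 01000001 has a 1 in the 64s place and the 1s place: 64 + 1
print(binary_to_decimal("01000001"))
```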

53
New cards

Q: Nothing is due on the writing requirement until the last day of class

A: False

54
New cards

Q: What type of error occurs in the following sentence?

They are leaving in about fifteen minuets to go to her house

A: Semantic error

55
New cards

Q: What does the "low-resource challenge" refer to?

A: Many languages have limited digital written data available

56
New cards

Q: What is the definition of intonation?

A: The rise and fall in a speaker's pitch (frequency)

57
New cards

Q: Which are THREE articulatory features that we use to describe consonants?

A: Place of articulation, voicing, manner of articulation

58
New cards

Q: Which is NOT one of the string edit operations that are important to count when building a spellchecker?

A: Redundancy

59
New cards

Q: Which type of analysis or model is the basis for probabilistic grammar checkers, large language models (LLM) such as ChatGPT, and predictive text applications?

A: n-grams
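Worked example (a toy sketch of the n-gram idea, using a made-up ten-word corpus): a bigram model counts which word follows which, and a predictive-text feature can suggest the most frequent follower. Real systems are trained on far more data; the function name here is hypothetical.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
tokens = "the cat sat on the mat and the cat slept".split()

# Count bigrams: for each word, how often each next word follows it.
bigram_counts = defaultdict(Counter)
for prev_word, next_word in zip(tokens, tokens[1:]):
    bigram_counts[prev_word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of word, or None if unseen."""
    followers = bigram_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

print(predict_next("the"))
```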

60
New cards

Q: How many possible characters does ASCII have?

A: 128

61
New cards

Q: Match the term with its best example

a. Inflectional affix

b. Derivational affix

A: (a) walk - walks

(b) catch - catcher

62
New cards

Q: The simplest policy of classifying documents is to pretend that we are dealing with a completely unstructured collection of words (bag-of-words assumption), because it is realistic about how language is used

A: False

63
New cards

Q: In general, ASR systems go through these steps:

A: Information loss; acoustic signal processing; the recognition of sounds, group of sounds, and words

64
New cards

Q: Which of the following are properties of human conversation?

A: Turns, utterances, grounding

65
New cards

A chatbot is different from a voice assistant because a voice assistant...

A language that is no longer being passed to younger generations

66
New cards

Match the Gricean conversational maxim to its definition:

quality

quantity

relevance/relation

manner

quality: be truthful; quantity: be exactly as informative as required; relevance/relation: be relevant to the purpose and direction of the exchange; manner: be clear

67
New cards

What are two advantages searching unstructured data has over searching structured data?

Possibility to search any page that is indexed on the web, does not require as much human effort to prepare page

68
New cards

You notice over the course of a month that your spam filter correctly flagged 1300 spam messages, but it missed 25. Also, it incorrectly said that there were 100 spam messages (which were actually ham). This was out of a total of 2100 messages that you received. What is the recall of the spam filter?

1300/1325
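Worked example (a sketch using only the counts stated in this card): the same four numbers give both the recall asked for here and the precision asked for in the earlier spam-filter card. The variable names are illustrative.

```python
# Counts taken from the card.
true_positives = 1300   # spam correctly flagged
false_negatives = 25    # spam the filter missed
false_positives = 100   # ham wrongly flagged as spam
total_messages = 2100

# Everything else is ham that was correctly let through.
true_negatives = (total_messages - true_positives
                  - false_negatives - false_positives)

# Precision: of the messages flagged as spam, how many really were spam?
precision = true_positives / (true_positives + false_positives)
# Recall: of the real spam messages, how many did the filter catch?
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```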

69
New cards

The four Gricean maxims, or rules, of conversation are based on what principle?

The cooperative principle: the speakers are trying to cooperate in the conversation

70
New cards

Typical applications of document classification include all of the following except:

authorship attribution

grammar checking

sentiment analysis

spam filtering

Grammar checking

71
New cards

A chatbot is different from a voice assistant because a voice assistant...

is usually a limited-domain dialogue system that is not designed to accomplish a particular task

72
New cards

We encounter an English learner who says "France in" and "the party at", e.g., "I saw him the party at." In other words, they are treating "in" and "at" as postpositions (found in other languages) instead of prepositions. What is the correct phrase structure rule that would catch this grammar error?

PP -> P NP

73
New cards

When you have four pencils but you answer "Two" when someone asks "How many pencils do you have?" this is an example of an uncooperative violation of the Maxim of Quantity or Quality.

True

74
New cards

Sentiment analysis is a kind of supervised machine learning (classification) that can automatically insert positive and negative sentiments into written texts.

False

75
New cards

One of the biggest challenges facing modern AI language technology is that most languages do not have sufficient data available to train state-of-the-art algorithms.

True

76
New cards

A corpus is an example of unstructured or semi-structured data.

True

77
New cards

In order for a computer to do anything with a language, it needs a way of representing language.

True

78
New cards

Interlingua

a language independent representation of a meaning in MT

79
New cards

Deskilling

An effect of introducing new technology: certain jobs can be carried out by less educated employees, earning lower wages

80
New cards

Stop words

Words which are ignored in searches

81
New cards

Parsing

Determining or annotating the structure of a sentence

82
New cards

What are two ways that a computer must be able to represent natural human languages?

audio, text

83
New cards

Which of these speech acts is direct?

can you wash the dishes?

wash the dishes.

please wash the dishes.

I'd like if you could wash the dishes

Wash the dishes, please wash the dishes

84
New cards

Which elements would an interlingua system ignore to represent the following sentence for both Spanish-to-English and English-to-Spanish translation? Did you see the white cow? Viste una vaca blanca?

Language specific structure

85
New cards

Which is a direct speech act?

It seems like it'd be cooler if the windows were open

could you open the window?

open the window

I like the window open

open the window

86
New cards

What are examples of language usage that go beyond the functional use of conveying information and voice requests?

expressing individual and social identity

87
New cards

The Latin word hospes can be translated into English as host, guest, friend, or stranger. Describe the relationship between the Latin word and the four English words in terms of hypernymy and hyponymy

Hospes can be considered a hypernym of host, guest, friend, and stranger; host, guest, friend, and stranger can each be considered a hyponym of hospes

88
New cards

The earliest example of a dialogue system was called ELIZA

True

89
New cards

One of the earliest applications of AI was computer-assisted language learning

False

90
New cards

A bilingual speaker is defined as someone equally conversant in all domains in two languages

False

91
New cards

Modern state-of-the-art machine learning algorithms are called "neural networks" because they are an exact replica of human brain neuron pathways

False

92
New cards

Because the cooperative principle holds true for all conversations, its sub-principles known as the Gricean conversational maxims are never violated

False

93
New cards

Microtransistors

Basic electrical components of computers that store data as either 1s or 0s (binary), with each transistor holding 1 bit of information

94
New cards

Bit

single unit of computer information

95
New cards

Byte

unit of information with 8 bits

96
New cards

2 main types of writing systems

Logographic (meaning) and letters (sound)

97
New cards

What are the types of logographic writing systems?

Pictographs (pictures), ideographs (ideas), semantic-phonetic (symbols)

98
New cards

What are the main types of letter writing systems?

alphabetic and syllabic

99
New cards

What are the types of alphabetic writing systems?

alphabet (phonemic alphabets, representing all sounds) and abjad (consonant alphabets)

100
New cards

What are the types of syllabic writing systems?

abugida (symbols represent a consonant+vowel, but vowel or consonant can be changed by changing the symbol or adding diacritics) and syllabary (separate symbol for each syllable of a language)