Chapter 16: Language and Computers
Speech synthesis- the use of a machine, usually a computer, to produce human-like speech
Canned speech- prerecorded utterances and phrases
Synthesized speech- speech created by piecing together smaller recorded units of speech into new utterances
Intelligibility- how well listeners can recognize and understand the individual sounds or words generated by the synthesis system
Naturalness- how much the synthesized speech sounds like the speech of an actual person
Articulatory synthesis- a synthesis technique that generates speech “from scratch” based on computational models of the shape of the human vocal tract and the articulation processes
Source-filter theory- the theory that the production of speech sounds has two independent parts, a source and a filter
Source- the mechanism that creates a basic sound
Filter- shapes the sound created by the source into the different sounds we recognize as speech sounds
Concatenative Synthesis- uses recorded speech by stringing together pieces of the recorded speech and then smoothing the boundaries between them
Unit selection synthesis- takes large samples of speech and builds a database of smaller units from these speech samples; the best-matching units are then selected and joined to form new utterances
Diphone synthesis- uses diphones (pairs of adjacent sounds), each running from the middle of one phone to the middle of the next, so that the transition between the two sounds is kept intact
Domain-specific synthesis- creates utterances from prerecorded words and phrases that closely match the words and phrases that will be synthesized
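The diphone idea above can be sketched in a few lines of Python. This is only an illustration: the function name `to_diphones`, the `#` silence marker, and the phone labels are assumptions made for the example, not part of any particular synthesis system.

```python
def to_diphones(phones, boundary="#"):
    """Split a phone sequence into diphone labels.

    Each label names the unit that spans from the middle of one phone
    to the middle of the next. The boundary symbol (an assumption of
    this sketch) stands for silence at the edges of the utterance.
    """
    padded = [boundary] + list(phones) + [boundary]
    # Pair every phone with its right-hand neighbor.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Hypothetical phone labels for the word "cat".
units = to_diphones(["k", "ae", "t"])
print(units)
```

A concatenative synthesizer would look up a recording for each label and smooth the joins; that step is outside the scope of this sketch.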
Text-to-speech synthesis- speech generated directly from text entered with normal orthography
Automatic Speech Recognition- the conversion of an acoustic speech waveform into text
Noisy channel model- treats speech input as if it has been passed through a communication channel that garbles the speech waveform
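One common way to make the noisy channel model concrete is to pick the word sequence that maximizes the probability of the words multiplied by the probability of the observed signal given those words. The sketch below assumes that formulation; the function name `decode`, the label `signal_1`, and every probability value are invented for illustration.

```python
def decode(observation, prior, likelihood):
    """Return the candidate word sequence with the highest
    prior[words] * likelihood[words][observation] score."""
    def score(words):
        # Candidates with no likelihood entry for this observation score 0.
        return prior[words] * likelihood[words].get(observation, 0.0)
    return max(prior, key=score)


# Toy numbers (assumptions): how likely each word sequence is in general,
# and how likely each one is to have produced the garbled signal.
PRIOR = {"recognize speech": 0.7, "wreck a nice beach": 0.3}
LIKELIHOOD = {
    "recognize speech": {"signal_1": 0.4},
    "wreck a nice beach": {"signal_1": 0.5},
}

best = decode("signal_1", PRIOR, LIKELIHOOD)
print(best)
```

Here the second candidate fits the signal slightly better, but the first is so much more probable as a word sequence that it wins overall.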
Components of an Automatic Speech Recognition System:
Signal processing- recording the speech waveform with a microphone and storing it in a manner that is suitable for further processing by a computer
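Acoustic modeling- mapping the energy values extracted during signal processing to symbols for phones (speech sounds)
Pronunciation modeling- filtering out sound sequences that are unlikely in the language
Language modeling- calculating the probability of word sequences, so that likely sequences are preferred over unlikely ones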
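The language modeling component can be illustrated with a bigram model, one simple way (assumed here, not prescribed by these notes) of estimating the probability of a word sequence from counts. The function names, the `<s>` and `</s>` boundary markers, and the two-sentence training set are all hypothetical.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count context words and adjacent word pairs in the training sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return contexts, bigrams


def sequence_probability(sentence, contexts, bigrams):
    """Multiply the relative-frequency estimates P(word | previous word)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: this unsmoothed sketch assigns zero
        probability *= bigrams[(prev, word)] / contexts[prev]
    return probability


contexts, bigrams = train_bigrams(["the cat sat", "the dog sat"])
print(sequence_probability("the cat sat", contexts, bigrams))
print(sequence_probability("cat the sat", contexts, bigrams))
```

A word order seen in training gets a nonzero probability, while a scrambled order gets zero. Practical systems smooth these estimates so that unseen sequences are merely unlikely rather than impossible.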
Parameters of Speech Recognition Systems:
Speaking mode- whether the system accepts only isolated-word input or also continuous speech input
Vocabulary size- the number of words the system can recognize; the size of the vocabulary affects the system's accuracy
Speaker Enrollment- whether or not the system needs to be trained to a specific speaker's voice before use
Interactive Text-Based Systems- dialogue carried between computer and user via text
Word spotting- a program focuses on words it knows and ignores ones it doesn’t
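Word spotting as described above amounts to scanning the input for known keywords and discarding everything else. A minimal sketch follows, assuming a hypothetical keyword table `KNOWN` and function name `spot_words`.

```python
import re

# Hypothetical keywords the program "knows", mapped to made-up action names.
KNOWN = {"weather": "report_weather", "time": "report_time"}


def spot_words(utterance, known=KNOWN):
    """Return the known keywords found in the input, in order of appearance.

    Every other word is ignored, so filler such as "um" or "please"
    has no effect on the result.
    """
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return [token for token in tokens if token in known]


print(spot_words("Um, could you tell me the weather please?"))
```

The program never parses the sentence; it reacts only to the words in its table, which is why such systems can seem to understand more than they do.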
Spoken-Language Dialogue Systems- Dialogue carried between computer and user via speech
Isolated speech- the user speaks the input clearly and without extraneous words
Continuous speech- input can be more like normal speech
Components of a Spoken-Language Dialogue System:
Automatic Speech Recognition- combining levels of linguistic knowledge in order to allow speaker-independent understanding of continuous speech
Language Processing and Understanding- the system must decipher not only individual words, but also the intention of the speaker
Dialogue Management- the system needs to understand the intentional structure of the conversation
Text Generation- the use of computers to respond to humans using natural language by creating sentences that convey the relevant information
Speech Synthesis- the words that make up the generated text must be converted into a sequence of sounds
Translation- the task of converting the contents of a text written in one language into a text in another language
Machine translation- the use of computers to carry out translation
Problems:
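Context can often be lost when words or sentences are translated in isolation
Lexical ambiguity- a word may have more than one meaning, and each meaning may call for a different translation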
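The lexical ambiguity problem can be seen in a toy word-for-word translator. The sketch below assumes a tiny hypothetical English-to-Spanish lexicon in which "bank" has two entries (financial institution versus riverside); the function name `candidate_translations` is invented, and grammatical agreement is deliberately ignored.

```python
from itertools import product

# Toy lexicon (an assumption for illustration): one source word may map
# to several target words, one per meaning.
LEXICON = {
    "bank": ["banco", "orilla"],
    "closed": ["cerrado"],
}


def candidate_translations(sentence, lexicon=LEXICON):
    """List every word-for-word rendering the lexicon allows.

    Words missing from the lexicon are passed through unchanged.
    """
    options = [lexicon.get(word, [word]) for word in sentence.lower().split()]
    return [" ".join(choice) for choice in product(*options)]


print(candidate_translations("bank closed"))
```

Without context, the program has no basis for choosing between the candidates, and the number of candidates multiplies with every ambiguous word in the sentence.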
Partial Automation- the source language text can first be pre-edited by a person so as to “prime” it for a machine translation system
Corpus- a collected body of text
Corpus linguistics- involves the design and the annotation of corpus materials that are required for specific purposes
A corpus can be composed of spoken, signed, or written language
Corpora can be classified by the genre of the source material
Balanced corpora- corpora that try to remain balanced among different genres
Reference corpus- a corpus consisting of a specified amount of text that has been collected and annotated
Monitor corpus- a corpus that keeps growing: as new texts continue to be written or spoken, more data is gathered
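Whether a corpus is balanced among genres can be checked by measuring each genre's share of the total word count. The sketch below assumes a corpus stored as hypothetical `(genre, text)` pairs and counts words by splitting on whitespace; the function name `genre_shares` and the sample documents are made up.

```python
from collections import Counter


def genre_shares(documents):
    """Return each genre's fraction of the corpus, measured in words."""
    totals = Counter()
    for genre, text in documents:
        totals[genre] += len(text.split())
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # empty corpus: nothing to report
    return {genre: count / grand_total for genre, count in totals.items()}


# Made-up miniature corpus.
documents = [
    ("news", "a b c d"),
    ("fiction", "e f"),
    ("news", "g h"),
]
print(genre_shares(documents))
```

Shares that differ widely, as in this miniature corpus, suggest that one genre dominates and that results drawn from the corpus may not generalize to the others.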