Chapter 16: Language and Computers

Speech Synthesis

  • Speech synthesis- the use of a machine, usually a computer, to produce human-like speech

  • Canned speech- prerecorded utterances and phrases

  • Synthesized speech- piecing together smaller recorded units of speech into new utterances

  • Intelligibility- how well listeners can recognize and understand the individual sounds or words generated by the synthesis system

  • Naturalness- how much the synthesized speech sounds like the speech of an actual person

  • Articulatory synthesis- a synthesis technique that generates speech “from scratch” based on computational models of the shape of the human vocal tract and the articulation processes

  • Source-filter theory- the idea that there are two independent parts to the production of speech sounds (a small numerical sketch follows this list)

    • Source- the mechanism that creates a basic sound

    • Filter- shapes the sound created by the source into the different sounds we recognize as speech sounds

  • Concatenative Synthesis- uses recorded speech, stringing together pieces of it and then smoothing the boundaries between them (a boundary-smoothing sketch follows this list)

    • Unit selection synthesis- takes large samples of speech and builds a database of smaller units from these speech samples

    • Diphone synthesis- strings together recorded pairs of adjacent sounds (diphones), each unit covering the end of one phone and the beginning of the next

    • Domain-specific synthesis- creates utterances from prerecorded words and phrases that closely match the words and phrases that will be synthesized

  • Text-to-speech synthesis- speech generated directly from text entered with normal orthography
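
The source-filter idea can be made concrete with a small numerical sketch: an impulse train stands in for the glottal source, and a few resonant filters stand in for vocal-tract formants. The sample rate, fundamental frequency, and formant values are assumptions chosen to approximate an /a/-like vowel, not values from the chapter.

```python
# Minimal source-filter sketch (all parameter values are assumptions):
# an impulse train is the glottal "source"; resonators standing in for
# vocal-tract formants are the "filter".
import numpy as np
from scipy.signal import lfilter

fs = 16000                 # sample rate (Hz)
f0 = 120                   # fundamental frequency of the source (Hz)
n = int(fs * 0.5)          # half a second of audio

# Source: a periodic impulse train, one pulse per glottal cycle.
source = np.zeros(n)
source[::fs // f0] = 1.0

# Filter: two-pole resonators at rough formant frequencies for an /a/-like vowel.
def resonator(signal, freq_hz, bandwidth_hz, fs):
    r = np.exp(-np.pi * bandwidth_hz / fs)
    theta = 2 * np.pi * freq_hz / fs
    return lfilter([1 - r], [1, -2 * r * np.cos(theta), r ** 2], signal)

speech = source
for formant, bw in [(700, 130), (1220, 70), (2600, 160)]:   # assumed F1, F2, F3
    speech = resonator(speech, formant, bw, fs)
```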

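The "smoothing the boundaries" step of concatenative synthesis can be sketched as a short crossfade between two recorded units. The unit contents and the 10 ms fade length below are invented for the illustration.

```python
# Sketch of the boundary-smoothing step in concatenative synthesis: two
# recorded units are joined with a short linear crossfade instead of an
# abrupt cut. The unit contents and 10 ms fade length are invented here.
import numpy as np

def join_units(unit_a, unit_b, fs=16000, fade_ms=10):
    fade = int(fs * fade_ms / 1000)         # samples in the overlap region
    ramp = np.linspace(0.0, 1.0, fade)      # fade unit_a out while unit_b fades in
    overlap = unit_a[-fade:] * (1 - ramp) + unit_b[:fade] * ramp
    return np.concatenate([unit_a[:-fade], overlap, unit_b[fade:]])

# Stand-ins for two recorded diphone units (random noise, for illustration only).
unit_ka = np.random.randn(4000)
unit_at = np.random.randn(4000)
utterance = join_units(unit_ka, unit_at)
```
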
Automatic Speech Recognition

  • Automatic Speech Recognition- the conversion of an acoustic speech waveform into text

  • Noisy channel model- treats speech input as if it has been passed through a communication channel that garbles the speech waveform (a toy decision-rule sketch follows this list)

  • Components of an Automatic Speech Recognition System:

    • Signal processing- recording the speech waveform with a microphone and storing it in a manner that is suitable for further processing by a computer

    • Acoustic modeling- mapping the energy values extracted during signal processing onto candidate speech sounds (phones)

    • Pronunciation modeling- used to filter out unlikely sound sequences

    • Language modeling- calculating the probability of word sequences (a minimal bigram sketch follows this list)

  • Parameters of Speech Recognition Systems

    • Speaking mode- whether the system accepts only isolated-word input or also continuous speech input

    • Vocabulary size- the size of the system’s vocabulary will impact its accuracy

    • Speaker Enrollment- the system may or may not need to be trained to a specific voice
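
The noisy channel model leads to a simple decision rule: choose the word sequence whose acoustic-model score multiplied by its language-model score is highest. The candidate transcriptions and all probabilities below are made-up numbers, used only to show how the two scores combine.

```python
# Toy noisy-channel decision rule: choose the word sequence W that maximizes
# P(A | W) * P(W), the acoustic score times the language-model score.
# The candidate transcriptions and all probabilities are made-up numbers.
candidates = {
    "recognize speech":   {"acoustic": 0.40, "language": 0.30},
    "wreck a nice beach": {"acoustic": 0.45, "language": 0.02},
}

def score(s):
    return s["acoustic"] * s["language"]    # P(A | W) * P(W)

best = max(candidates, key=lambda w: score(candidates[w]))
print(best)   # "recognize speech": the language model outweighs the acoustics
```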

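The language-modeling component can be illustrated with a minimal bigram model that estimates the probability of a word given the previous word from counts in training text; the training sentence below is an arbitrary example.

```python
# Minimal bigram language model: estimate P(next word | previous word) from
# counts in training text. The training sentence is an arbitrary example.
from collections import Counter, defaultdict

training = "the cat sat on the mat the cat slept".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(training, training[1:]):
    bigrams[prev][nxt] += 1

def prob(nxt, prev):
    counts = bigrams[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

print(prob("cat", "the"))   # 2/3: "the" is followed by "cat" twice, "mat" once
```
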
Communicating with Computers

  • Interactive Text-Based Systems- dialogue carried on between computer and user via text

    • Word spotting- the program focuses on words it knows and ignores the ones it does not recognize (a short sketch follows this list)

  • Spoken-Language Dialogue Systems- dialogue carried on between computer and user via speech

    • Isolated speech- the user speaks the input clearly and without extraneous words

    • Continuous speech- input can be more like normal speech

  • Components of a Spoken-language Dialogue System

    • Automatic Speech Recognition- combining levels of linguistic knowledge in order to allow speaker-independent understanding of continuous speech

    • Language Processing and Understanding- the system must decipher not only individual words, but also the intention of the speaker

    • Dialogue Management- the system needs to understand the intentional structure of the conversation (a slot-filling sketch follows this list)

    • Text Generation- the use of computers to respond to humans using natural language by creating sentences that convey the relevant information

    • Speech Synthesis- the words that make up the generated text must be converted into a sequence of sounds
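
Word spotting can be sketched in a few lines: the program scans the input for keywords it knows and ignores everything else. The keyword list and canned responses below are invented for the example.

```python
# Word spotting: scan the user's input for known keywords and ignore the rest.
# The keyword list and canned responses are invented for the example.
KEYWORDS = {
    "balance": "Your account balance is ...",
    "hours":   "We are open 9am-5pm.",
    "human":   "Transferring you to an agent.",
}

def respond(user_input):
    for word in user_input.lower().split():
        word = word.strip("?!.,")
        if word in KEYWORDS:                   # a known word triggers a response
            return KEYWORDS[word]
    return "Sorry, I didn't catch that."       # no known words were spotted

print(respond("Could you tell me your opening hours?"))   # -> "We are open 9am-5pm."
```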

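One common way to handle dialogue management is a frame (slot-filling) approach: the system tracks which pieces of information it still needs and asks for them in turn. The flight-booking task, slot names, and prompts below are assumptions made up for the sketch, not a system described in the chapter.

```python
# Sketch of dialogue management as simple slot filling: the system tracks which
# pieces of information are still missing and asks for them in turn. The
# flight-booking task, slot names, and prompts are assumptions for the example.
PROMPTS = {
    "origin":      "Where are you flying from?",
    "destination": "Where would you like to go?",
    "date":        "What day do you want to travel?",
}

def next_system_turn(frame):
    for slot, prompt in PROMPTS.items():
        if frame.get(slot) is None:            # ask about the first empty slot
            return prompt
    return "Booking a flight from {origin} to {destination} on {date}.".format(**frame)

frame = {"origin": "Columbus", "destination": None, "date": None}
print(next_system_turn(frame))   # -> "Where would you like to go?"
```
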
Machine Translation

  • Translation- the task of converting the contents of a text written in one language into a text in another language

  • Machine translation- the use of computers to carry out translation

  • Problems:

    • The context needed to interpret the text is often limited or missing

    • Lexical ambiguity- a single word may have several senses that translate differently (illustrated in the sketch after this list)

  • Partial Automation- the source language text can first be pre-edited by a person so as to “prime” it for a machine translation system
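
Lexical ambiguity is easy to see in a word-for-word dictionary translation: without context, the system cannot choose among the senses of an ambiguous word. The tiny English-to-Spanish glossary below is invented for the illustration.

```python
# Word-for-word dictionary translation runs into lexical ambiguity: without
# context it cannot choose among the senses of "bank". The tiny English-to-
# Spanish glossary is invented for the illustration.
glossary = {
    "the":  ["el / la"],
    "bank": ["banco (financial institution)", "orilla (riverbank)"],
}

def translate_word(word):
    options = glossary.get(word.lower(), [word])
    return options[0] if len(options) == 1 else "<ambiguous: " + " | ".join(options) + ">"

print(" ".join(translate_word(w) for w in "the bank".split()))
# -> el / la <ambiguous: banco (financial institution) | orilla (riverbank)>
```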

Corpus Linguistics

  • Corpus- a collected body of text (a word-frequency sketch follows this list)

  • Corpus linguistics- involves the design and the annotation of corpus materials that are required for specific purposes

  • A corpus can be composed of spoken, signed, or written language

  • Corpora can be classified by the genre of the source material

  • Balanced corpora- corpora that try to remain balanced among different genres

  • Reference corpus- a fixed, specified amount of text that has been collected and annotated

  • Monitor corpus- a corpus that keeps growing: as new texts continue to be written or spoken, more data is gathered
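
A small taste of corpus work: tokenize a collected body of text and count word frequencies. The two-sentence "corpus" below is just an in-memory string standing in for real annotated corpus files.

```python
# A small taste of corpus work: tokenize a collected body of text and count
# word frequencies. The two-sentence "corpus" is an in-memory stand-in for
# real corpus files.
from collections import Counter
import re

corpus = """Colorless green ideas sleep furiously.
Furiously sleep ideas green colorless."""

tokens = re.findall(r"[a-z]+", corpus.lower())   # crude tokenization
freq = Counter(tokens)
print(freq.most_common(3))   # e.g. [('colorless', 2), ('green', 2), ('ideas', 2)]
```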
