Stimulus for hearing
Movement of air molecules. The sound source vibrates, and those vibrations are carried by waves of air molecules. We hear from 20 Hz to 20000 Hz.
A simple sine wave is a pure, smooth wave that represents the most basic kind of stimulus. It vibrates at only one frequency.
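As a rough illustration of a pure tone, the sketch below samples a sine wave at a single frequency. The function name `pure_tone` and the sample rate of 8000 samples per second are illustrative choices, not something taken from these notes.

```python
import math

def pure_tone(frequency_hz, duration_s, sample_rate=8000, amplitude=1.0):
    """Return a list of samples of a sine wave at one frequency."""
    n_samples = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]

# 10 ms of a 440 Hz tone: 4.4 cycles spread over 80 samples
tone = pure_tone(440, 0.01)
print(len(tone))
```

Raising `frequency_hz` packs more cycles into the same duration; raising `amplitude` makes the wave taller without changing how many cycles there are.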
Relationship between physical properties of sound and their related perceptual qualities
Physical:
1. Frequency: refers to the number of cycles per second. Measured in hertz (Hz).
2. Amplitude: refers to the height of the waveform. Measured in decibels (dB). More energy = more decibels.
3. Complexity: Refers to the number of pure tones combined together
Psychological
1. Pitch/Tone: E.g., the notes in music
2. Loudness: E.g., a lecture voice is about 80 dB
3. Timbre: The quality of sound that allows one to distinguish it from another sound
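The decibel scale used for amplitude above is logarithmic. A minimal sketch, assuming sound pressure level (dB SPL) with the standard reference pressure of 20 micropascals; the name `spl_db` and the example pressures are chosen only for illustration.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure for dB SPL, in pascals

def spl_db(pressure_pa):
    """Convert a sound pressure in pascals to decibels SPL."""
    return 20 * math.log10(pressure_pa / P_REF_PA)

print(round(spl_db(0.2)))   # pressure that corresponds to about 80 dB
print(round(spl_db(0.02)))  # one tenth of that pressure
```

Under this formula, each tenfold increase in pressure adds 20 dB, so the roughly 80 dB lecture voice in the list above carries about 10,000 times the reference pressure.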
Other factors that affect our perception of Loudness, Pitch, and Timbre
A. Loudness - determined primarily by Amplitude
Also affected by Frequency
Equal loudness curves - we are most sensitive to middle-range frequencies (500 Hz to 5000 Hz). Each curve connects the frequency and amplitude combinations that sound equally loud to our ears.
B. Pitch - determined primarily by Frequency
Also affected by Complexity
Increases in complexity increase perceived pitch
2. In a complex waveform, the pitch perceived is that associated with the fundamental frequency
(Slowest regularly oscillating wave)
C. Timbre - depends primarily on the Complexity of the waveform
Also affected by the “attack” and “decay” portions of the waveform
Attack refers to the speed of onset (e.g., slow versus rapid attack)
Decay refers to the speed of offset (e.g., slow versus rapid decay)
Example: “bowed” versus “plucked” on a violin
A. Interaural Time Difference (ITD)
The brain uses the difference in the time at which a sound reaches each ear to determine the location of the sound.
ITD is a useful cue for frequencies below ~ 1300 Hz (e.g., a low frequency sound source)
B. Interaural Intensity Difference (IID)
IID is a useful cue for frequencies above ~ 1300 Hz, where the head casts a sound shadow that reduces the intensity at the far ear.
C. Cone of Confusion - refers to points in space such that sound sources produce identical interaural time and intensity differences
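The size of the interaural time difference can be estimated with a simplified geometric model. The sketch below assumes a distant source, a straight-line path between the ears, an ear separation of 0.18 m, and a speed of sound of 343 m/s; these values and the name `itd_seconds` are assumptions for illustration, not figures from the notes.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # in air at roughly room temperature
EAR_SEPARATION_M = 0.18     # assumed distance between the two ears

def itd_seconds(azimuth_deg):
    """Estimated ITD; 0 degrees is straight ahead, 90 is directly to one side."""
    path_difference_m = EAR_SEPARATION_M * math.sin(math.radians(azimuth_deg))
    return path_difference_m / SPEED_OF_SOUND_M_S

for angle in (0, 30, 90, 150):
    print(angle, round(itd_seconds(angle) * 1e6), "microseconds")
```

In this model a source at 30 degrees (front) and one at 150 degrees (back) produce the same ITD, which is a simple case of the cone of confusion described above.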
An octave is a way of describing a doubling or halving of frequency; it's a very natural "jump" in sound or light. Going up or down an octave takes you from one note to the same note at a higher or lower pitch (not louder, just higher or lower in pitch).
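The doubling rule can be written as a short calculation. This is only a sketch; the function names are made up for illustration.

```python
import math

def shift_octaves(frequency_hz, n_octaves):
    """Raise (positive n) or lower (negative n) a frequency by whole octaves."""
    return frequency_hz * 2 ** n_octaves

def octaves_between(low_hz, high_hz):
    """Number of octaves separating two frequencies."""
    return math.log2(high_hz / low_hz)

print(shift_octaves(440, 1))    # one octave up
print(shift_octaves(440, -1))   # one octave down
print(round(octaves_between(20, 20000), 2))
```

By this calculation, the 20 Hz to 20000 Hz range of human hearing spans just under 10 octaves.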
A harmonic is an extra vibration stacked on top of the main vibration. When you pluck a guitar string, you don't just hear one pure note. You hear the fundamental note plus a bunch of harmonics. The fading sounds are the harmonics.
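Harmonics sit at integer multiples of the fundamental frequency, so the relationship can be sketched in both directions. The example assumes whole-number frequencies in Hz, and both function names are illustrative.

```python
from functools import reduce
from math import gcd

def harmonics(fundamental_hz, count):
    """First `count` harmonics: 1x, 2x, 3x, ... the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]

def common_fundamental(component_hz):
    """Largest frequency that divides every component evenly."""
    return reduce(gcd, component_hz)

string = harmonics(110, 4)
print(string)
print(common_fundamental(string))
```

The second function recovers the slowest regularly repeating component of a complex tone, which is the frequency that sets its perceived pitch.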
Place Theory says: Different places along your inner ear (specifically, the basilar membrane in the cochlea) respond to different pitches (frequencies). High-pitched sounds (like a whistle) vibrate the beginning (base) of the cochlea. Low-pitched sounds (like a bass drum) vibrate the end (tip) of the cochlea.
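The place-to-frequency mapping can be approximated numerically. The sketch below uses the Greenwood function with commonly cited human parameter values (A = 165.4, a = 2.1, k = 0.88). This model and its constants do not come from these notes and are brought in only for illustration. Position is given as a fraction of the basilar membrane's length, measured from the tip (apex).

```python
def greenwood_hz(position_from_apex):
    """Approximate best frequency at a point on the basilar membrane.

    position_from_apex: 0.0 at the tip (apex), 1.0 at the beginning (base).
    Constants are the commonly cited human values (an assumption here).
    """
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * position_from_apex) - k)

for position in (0.0, 0.5, 1.0):
    print(position, round(greenwood_hz(position)), "Hz")
```

The output runs from roughly 20 Hz at the tip to roughly 20000 Hz at the base, in line with the low-at-the-tip, high-at-the-base layout described above.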
Temporal Theory (also called Timing Theory) says: We hear pitch based on how fast the neurons in our ear fire (the timing of their firing matches the sound's frequency). A sound at 100 Hz = neurons fire about 100 times per second. Phase locking means that neurons fire at the same point (phase) of every sound wave cycle — they "lock on" to the rhythm of the wave. Volley Principle says that groups of neurons take turns (like a team!) to fire together and keep up with higher frequency sounds.
Fusion Effect:
When two sounds are very close together in time (milliseconds apart), your brain combines them into one single sound.
Precedence Effect (also called the "Law of the First Wavefront"):
When two identical sounds come from different places, but with a tiny delay, your brain hears only the first one and ignores the later one to figure out where the sound came from.
Echolocation in the Blind:
Some blind individuals learn to use echoes (reflected sound waves) to "see" their environment, just like bats and dolphins do! They make sounds by clicking their tongue, tapping a cane, or snapping their fingers, or they simply listen to their own footsteps.
They then analyze the returning echoes to figure out:
How far away an object is
What shape it is
Whether it’s soft, hard, wide, narrow, etc.
Architectural acoustics is the science of designing buildings (like concert halls, classrooms, churches, theaters) to control how sound behaves inside them.
It’s about making sure that:
Sounds are clear (not muddy or echoey)
Sounds are loud enough (but not too loud)
Sounds reach everyone in the room evenly
Background noise is minimized
The Three Main Structures of the Ear
A. Outer Ear (collects sound waves and funnels them to the ear drum)
pinna (fleshy ‘articulated’ outer portion)
auditory canal (funnels sound to ear drum)
B. Middle Ear (magnifies vibrations from the eardrum to the cochlea)
tympanic membrane (ear drum)
ossicles – three bones (increase the pressure delivered to the inner ear)
a. malleus (hammer)
b. incus (anvil)
c. stapes (stirrup)
C. Inner Ear (converts mechanical energy to chemoelectric signals to the brain: sound!)
Cochlea (snail-shaped, pea-sized)
a. Three fluid-filled chambers
Scala vestibuli (upper chamber)
Scala media (middle chamber, also called the ‘cochlear duct’; contains Organ of Corti)
Scala tympani (bottom chamber)
b. Oval window – flexible structure on upper chamber of cochlea; stapes rests on this structure; transmits pressure wave to upper and middle chambers
c. Round window – flexible structure on bottom chamber of cochlea; pressure release
d. Helicotrema – opening at the apex of the cochlea that allows vibrations from the upper chamber to be funneled back down to the lower chamber.
D. Scala media (middle chamber; also called the cochlear duct)
1. Reissner’s membrane (separates Scala vestibuli from Scala media)
2. Basilar membrane (separates Scala media from Scala tympani; vibrations transmitted to hair cells)
3. Contains Organ of Corti
Inner hair cells (frequency perception)
Outer hair cells (loudness perception)
Tectorial membrane (outer hair cells embedded)
A. Timing/Frequency Theory – pitch coded with respect to the rate of neural firing.
1. The rate at which neurons fire depends upon the rate of hair/cilia bending, which in turn depends upon frequency.
For instance, a 500 Hz tone causes hair/cilia to bend 500 times per second, which causes neurons to fire 500 times per second
2. Problem: this places the upper limit of pitch perception at ~ 1000 Hz (fastest rate a neuron can fire)
3. Solution: Brain can code for frequencies above 1000 Hz by virtue of the Volley Principle; neurons fire sequentially
i. Several neurons firing sequentially can code for frequencies above 1000 Hz as long as the neurons are phase locked.
ii. Neurons are phase locked as long as they fire at the same point in the waveform.
4. Next Problem: the upper limit of phase locking is ~ 5000 Hz!
So, how does the brain code for frequencies above 5000 Hz?
B. Place Theory – pitch coded with respect to the place on the basilar membrane that vibrates most vigorously in response to a particular frequency
C. Timing/Frequency and Place Information also work together: Note that the Timing/Frequency and Place mechanisms overlap between 1000 Hz and 5000 Hz. This happens to be the frequency range associated with human speech.
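The Volley Principle described above can be sketched as a round-robin schedule. This is a toy model, assuming each neuron can fire at most 1000 times per second (the ceiling quoted in the notes) and that neurons take strict turns, one wave cycle each; the function names are illustrative.

```python
import math

MAX_RATE_HZ = 1000  # per-neuron firing ceiling quoted in the notes

def neurons_needed(tone_hz, max_rate_hz=MAX_RATE_HZ):
    """Fewest neurons that can cover every cycle without exceeding the ceiling."""
    return math.ceil(tone_hz / max_rate_hz)

def volley_schedule(tone_hz, n_cycles, max_rate_hz=MAX_RATE_HZ):
    """Index of the neuron that fires on each successive cycle of the tone."""
    group_size = neurons_needed(tone_hz, max_rate_hz)
    return [cycle % group_size for cycle in range(n_cycles)]

print(neurons_needed(500))        # a single neuron can keep up
print(neurons_needed(4000))       # a group must share the work
print(volley_schedule(3000, 6))   # neurons 0, 1, 2 take turns
```

Each neuron in the group fires on every third cycle of the 3000 Hz tone, so it stays at 1000 firings per second while the group as a whole marks every cycle.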
The capacity to hear separate words is an example of segmentation and parsing.
Segmentation and parsing are interchangeable terms.
Segmentation/Parsing:
Refers to our ability to hear separate words in an otherwise continuous vocal stream; there are no physical spaces between words yet we hear them as separate.
Fundamental Units of Speech
Phonemes: fundamental unit of speech such that changing a single phoneme in a word changes the meaning of that word. The upper three rows represent phonemes associated with vowels; the bottom three rows are phonemes associated with consonants.
Vowels: created by manipulating a relatively open vocal tract.
Consonants: created by manipulating a relatively closed vocal tract
Place of Articulation: refers to where in the vocal tract the obstruction of the air stream occurs (e.g., “p” in put = front of mouth; “c” in couch = back of mouth)
Manner of Articulation: refers to how the air stream is obstructed (e.g., “p”, “b”, “f” all obstruct the air stream in the front of the mouth)
Voicing/Voice Onset Time: refers to the timing and degree to which the vocal cords vibrate (e.g., “da” vibrates earlier than “ta”)
Evidence that speech is special
Infant babbling includes phonemes from all languages; universal stages of babbling
Invariant characteristics of the speech signal (e.g., Formants and Formant transitions for “shoo cat” are the same irrespective of the speaker)
Formants: the narrow bands of sound frequency energy associated with vowels. Depends on the fundamental frequency
Formant transition: the transition from a broad energy spectrum to the narrow Formant spectrum. The transition per se appears to code for consonants.
3. Categorical perception: a phenomenon where a phoneme is perceived to be invariant within some specific range.
B. Limitations of evidence for “specialness” of speech
1. Invariant characteristics: perception of speech is robust even with significant deterioration of formant components of the waveform.
2. Categorical perception: non-speech signals have also been shown to be processed categorically
• key: sound is given a name (e.g., “plucked” vs. “bowed”)
C. Infant babbling, formant characteristics, and categorical processing are involved in speech perception but are only part of the story.
1. Pollack & Pickett (1963) “sliced conversation” study demonstrated the importance of context in speech perception.
The McGurk Effect is where your visual perception of someone's mouth movements can actually change what you think you are hearing.
Damage to Broca’s Area → Broca’s Aphasia:
People understand language pretty well. But they struggle to speak — speech is slow, broken, and effortful.
Damage to Wernicke’s Area → Wernicke’s Aphasia:
People can speak fluently, but what they say often doesn't make sense (nonsense words, jumbled ideas). They can't understand what others are saying well either.
Extra credit:
Crazy Little Thing Called Love - Queen
Sugar Sugar - The Archies
Love Her Madly - The Doors
Georgia Satellites
Bad Things - Jace Everett
Dream On - Aerosmith
The Trees - Rush
Bahamian Rap City -
A. Proximity - notes that are played together close in time are perceived as belonging together.
Proximity forms the basis of melody perception.
B. Similarity - notes that are similar in pitch are perceived as belonging together.
C. Good continuation - A musical principle where melody progresses logically, allowing listeners to predict the path of the notes.
1. Basic Components of Music
A. Consonance / Dissonance - notes that sound pleasing when played together are referred to as consonant. Notes that sound unpleasant when played together are referred to as dissonant.
B. Rhythm - refers to the temporal relationship among sounds. Rhythm refers specifically to the duration of each note or chord. Polyrhythm refers to the simultaneous playing of two independent rhythms.
C. Note - the listener’s perception of the frequency of the sound waves. Each musical sound of a distinct frequency is known as a note (e.g., the note A is 440 Hz; the note C is 523.3 Hz). Pitch indicates how high or low a note sounds. Harmonics refers to integer multiples of the note in question (e.g., the fundamental frequency); the harmonics of a note sound consonant.
D. Chord - multiple notes (usually 3 or 4) occurring simultaneously in time. Major chords (happy) and Minor chords (sad); middle note of minor chord a little lower in pitch than that of the major chord.
E. Melody - refers to the combined effect of pitch and rhythm.
F. Melody schema – refers to the representation of a familiar melody stored in memory.
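The note frequencies quoted above (A at 440 Hz, C at 523.3 Hz) and the major/minor chord difference can be checked with a short calculation. The sketch assumes 12-tone equal temperament, where each semitone multiplies frequency by the twelfth root of 2; the notes do not name a tuning system, so this is an assumption, and the function names are illustrative.

```python
A4_HZ = 440.0  # the note A, as given in the notes

def note_frequency(semitones_from_a):
    """Frequency of the note a given number of semitones above (or below) A 440."""
    return A4_HZ * 2 ** (semitones_from_a / 12)

def triad(root_semitones, kind="major"):
    """Three-note chord: root, third, fifth. The minor third is one semitone lower."""
    third = 4 if kind == "major" else 3
    return [note_frequency(root_semitones + step) for step in (0, third, 7)]

print(round(note_frequency(3), 1))                    # C, three semitones above A
print([round(f, 1) for f in triad(0, "major")])
print([round(f, 1) for f in triad(0, "minor")])
```

Only the middle frequency differs between the two chords, and it is slightly lower in the minor chord, which matches the description of major and minor chords above.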
2. Musician Stylistic Contributions to Music
A. Playing out of synchronization - Chords tend to sound more musical when each note is played slightly out of synchronization; playing all the notes together tends to sound less musical.
B. Staccato vs. Legato - “choppy” playing (staccato) tends to not sound as musical as a smoother transition (legato) between notes/chords.
C. Rubato – refers to “small expressive changes of pace”; changing the pace of play tends to sound more musical than to play at the same rate throughout the piece.