Lecture 3 - Perception and Recognition - SLIDES Notes
Part I: Visual Perception
Overview & Objectives
- Perception feels instantaneous & effortless, yet relies on many sequential and parallel processes.
- Course objectives for the visual-perception portion:
• The Visual System
• Visual Coding
• Form Perception
• Constancy
• Perception of Depth
- Pathology reminder: Patient L.M. with akinetopsia (motion blindness) cannot perceive continuous motion (e.g., pouring coffee, crossing streets), illustrating how a single component failure devastates perception.
1. The Visual System
• Vision is the dominant human sense; it occupies more cortical territory than any other modality.
• In multisensory conflict, vision usually “wins” (e.g., the ventriloquism illusion: the voice is localized to the dummy's moving mouth).
Phototransduction sequence
- Light → cornea → lens (both focus) → retina (photoreceptors).
- Two photoreceptor classes: rods & cones.
Rods vs Cones (three-column comparison)
- Sensitivity:
• Rods: sensitive to dim light (night vision).
• Cones: require bright light; non-functional in the dark.
- Acuity:
• Rods: low spatial acuity.
• Cones: high acuity.
- Color:
• Rods: color-blind.
• Cones: color-sensitive (3 cone types).
- Retinal distribution:
• Rods: none in fovea, dense in periphery.
• Cones: concentrated in & near fovea; sparse toward the periphery.
Additional facts
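- Fovea = highest cone density, smallest receptive fields → sharpest vision. But it also causes the “star disappears” phenomenon: on a dark night, looking directly at a faint star puts its light on the rod-free fovea, where cones are too insensitive to respond, so the star vanishes; looking slightly to one side brings it back.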
- Blind spot: region where optic nerve exits; no photoreceptors.
Electromagnetic spectrum
- Visible light ≈ 400–700 nm.
Retino-cortical Pathway
- Retina: photoreceptors → bipolar cells → ganglion cells.
- Axons of ganglion cells form optic nerve → optic chiasm (partial decussation) → optic tract.
- Lateral Geniculate Nucleus (LGN) of thalamus.
- Primary Visual Cortex (V1) in occipital lobe.
Early Neural Computations—Lateral Inhibition
- Neighboring stimulated cells inhibit each other; this yields edge enhancement.
- Example numbers: central bright cell sends 1000 spikes/s to the brain, neighboring dim cell 80; the percept exaggerates the brightness contrast at the edge beyond the actual luminance difference.
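The edge-enhancement idea can be sketched numerically. This is a minimal sketch under stated assumptions, not the slide's model: a hypothetical 1-D row of cells in which each cell's output is its own input minus 10% of each immediate neighbor's input. The 10% value and the 100/20 input levels are illustrative and do not reproduce the slide's 1000/80 figures.

```python
def lateral_inhibition(inputs, k=0.1):
    """Return each cell's output after lateral inhibition from its two neighbors.

    Hypothetical model: output = own input - k * (left input + right input).
    Cells at the ends of the row reuse their own input for the missing
    neighbor, so the row behaves as if it continued uniformly.
    """
    n = len(inputs)
    outputs = []
    for i, x in enumerate(inputs):
        left = inputs[max(i - 1, 0)]
        right = inputs[min(i + 1, n - 1)]
        outputs.append(x - k * (left + right))
    return outputs


# A step edge: four brightly lit cells next to four dimly lit cells.
row = [100, 100, 100, 100, 20, 20, 20, 20]
print([round(v, 1) for v in lateral_inhibition(row)])
# -> [80.0, 80.0, 80.0, 88.0, 8.0, 16.0, 16.0, 16.0]
```

The bright cell at the border (88) ends up more active than its bright neighbors (80), and the dim cell at the border (8) less active than its dim neighbors (16), so the difference across the edge is exaggerated.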
2. Visual Coding & Single-Cell Properties
• Coding = relationship between stimulus characteristics & neuronal activity (encoding / decoding).
Single-cell recording findings
- Each neuron has a receptive field (RF): area where stimulus influences firing.
- Center–surround (dot) detectors: excited by light on a central spot, inhibited by light on the surrounding ring; when the whole field is lit, excitation and inhibition largely cancel and the cell responds weakly.
- Additional RF selectivities:
• Orientation / “edge” detectors.
• Angle detectors.
• Direction & motion detectors.
• Corner detectors.
- Hubel & Wiesel's single-cell work (Nobel Prize, 1981).
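A center–surround detector can be approximated as a weighted sum over a small patch of receptors. The 3×3 layout, the +1 center weight, and the −1/8 surround weights below are hypothetical, chosen only so that excitation and inhibition balance exactly; the output is floored at zero because a firing rate cannot be negative.

```python
# Hypothetical 3x3 on-center/off-surround receptive field.
# The eight surround weights sum to -1, exactly balancing the +1 center.
CENTER_SURROUND = [
    [-1 / 8, -1 / 8, -1 / 8],
    [-1 / 8, 1.0, -1 / 8],
    [-1 / 8, -1 / 8, -1 / 8],
]


def rf_response(patch, weights=CENTER_SURROUND):
    """Weighted sum of light levels in a 3x3 patch, floored at 0 (no negative firing)."""
    total = sum(weights[r][c] * patch[r][c] for r in range(3) for c in range(3))
    return max(total, 0.0)


spot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]     # light on the center only
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # whole field lit
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]     # surround lit, center dark
print(rf_response(spot), rf_response(uniform), rf_response(ring))
# -> 1.0 0.0 0.0
```

Orientation ("edge") detectors can be sketched the same way by swapping in an elongated weight pattern, e.g. a column of positive weights flanked by negative ones.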
Parallel Processing Streams
- Simultaneous analysis of different attributes → faster & mutually supportive.
- Ventral ("what") pathway: occipital → inferotemporal cortex; object identity & form.
• Damage ⇒ visual agnosia (cannot recognize objects).
- Dorsal ("where/how") pathway: occipital → posterior parietal cortex; spatial location & action guidance.
• Damage ⇒ visuomotor deficits (can see object but mis-reach).
Benefit highlights
- Speed/efficiency.
- Mutual influence (e.g., motion perception helps refine form, and vice versa).
3. The Binding Problem
• Multiple brain areas analyze color, motion, form, location separately—yet we perceive coherent objects.
Proposed solutions
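- Spatial position: the separate feature maps share a common spatial layout, so attributes registered at the same location are assigned to the same object.
- Neural synchrony: neurons coding attributes of the same object fire in synchrony.
- Attention (top-down): directs binding; when attention is overloaded, conjunction errors occur (e.g., a blue H and a red T reported as a blue T and a red H).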
4. Form Perception & Gestalt Principles
Core claim: Perception “goes beyond the information given.” (phrase from Bruner; central to the Gestalt view)
Reversible / ambiguous figures
- Necker cube, face–vase, etc.—illustrate active interpretation & top-down influence.
Gestalt organizational principles (impose structure)
- Similarity, proximity, good continuation, closure, common fate, symmetry, figure/ground segregation.
- Demonstrations show expectation-driven recognition (e.g., scrambled letters are readily read as “LIFT” once the context is hinted).
Parallel interplay
- Bottom-up (feature gathering) & top-down (interpretation) operate simultaneously—neither strictly first.
5. Perceptual Constancy
Definition: Perceiving stable properties despite changing retinal input—via unconscious inference (Helmholtz).
Types
- Brightness constancy: perceived reflectance stays constant across illumination changes (checker-shadow illusion).
- Shape constancy: door remains rectangular while swinging.
- Size constancy: retinal size varies inversely with distance; the brain factors in distance cues → stable perceived size.
• If distance is misestimated → illusions (e.g., “monsters in tunnel,” Ponzo).
- Mechanisms:
• Stable ratios among parts of the retinal image.
• Distance cues (see below).
• Top-down knowledge of typical object sizes.
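The size–distance trade-off can be made concrete with the standard visual-angle formula, angle = 2·atan(size / (2·distance)). The second function is a minimal sketch of size constancy, assuming (hypothetically) that the brain recovers size by inverting that formula with its *perceived* distance; the function names and example values are illustrative only.

```python
import math


def visual_angle_deg(size_m, distance_m):
    """Visual angle (degrees) subtended by an object of size_m viewed at distance_m."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


def inferred_size_m(angle_deg, perceived_distance_m):
    """Recover object size from the retinal angle plus a distance estimate.

    Hypothetical model of size constancy: invert the visual-angle formula using
    the distance the observer *perceives*, which may differ from the true one.
    """
    return 2 * perceived_distance_m * math.tan(math.radians(angle_deg) / 2)


# A 1.8 m person seen at 10 m and at 20 m.
near = visual_angle_deg(1.8, 10)
far = visual_angle_deg(1.8, 20)
print(round(far / near, 3))                # retinal size roughly halves
print(round(inferred_size_m(far, 20), 2))  # distance judged correctly: 1.8 m
print(round(inferred_size_m(far, 30), 2))  # distance overestimated by 1.5x: 2.7 m
```

The last line mirrors the Ponzo / "monsters in tunnel" case: the same retinal angle combined with an inflated distance estimate yields an inflated size.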
6. Depth Perception
Need distance estimation for interaction & size judgments.
Distance/Depth cues
- Binocular disparity (stereopsis)
• The two eyes' images are slightly offset ⇒ depth even in a static scene. Works best for near objects.
- Monocular cues (one eye sufficient):
• Accommodation (lens adjustment).
• Pictorial cues in 2-D images: interposition/occlusion, linear perspective, texture gradient, relative size, shading, aerial perspective, height in plane.
- Motion cues:
• Motion parallax – nearer objects sweep faster across retina than distant ones.
• Optic flow – global pattern of expansion/contraction as observer moves; provides heading & distance information.
Cue weighting is adaptive: e.g., binocular disparity is negligible for far mountains, so pictorial & motion cues dominate.
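Why disparity fades with distance while motion parallax scales as 1/distance can be checked with simple geometry. This sketch assumes an eye separation of about 6.3 cm (a typical adult value, used here as an assumption), points straight ahead for disparity, and an object directly abeam of the observer for parallax; all names are hypothetical.

```python
import math

INTEROCULAR_M = 0.063  # assumed typical adult eye separation (about 6.3 cm)


def vergence_rad(distance_m, b=INTEROCULAR_M):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return 2 * math.atan(b / (2 * distance_m))


def disparity_arcmin(d_near_m, d_far_m, b=INTEROCULAR_M):
    """Binocular disparity (arcminutes) between two points straight ahead at different depths."""
    return math.degrees(vergence_rad(d_near_m, b) - vergence_rad(d_far_m, b)) * 60


def parallax_deg_per_s(observer_speed_m_s, distance_m):
    """Retinal angular speed of a stationary object directly abeam of a moving observer."""
    return math.degrees(observer_speed_m_s / distance_m)


# The same 1 m depth step, close by versus at mountain distances.
print(disparity_arcmin(1, 2))        # about 108 arcmin: easily detectable
print(disparity_arcmin(1000, 1001))  # a tiny fraction of an arcmin: negligible
# Walking at 1.5 m/s: a post 2 m away sweeps 10x faster than a tree 20 m away.
print(parallax_deg_per_s(1.5, 2) / parallax_deg_per_s(1.5, 20))
```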
Part II: Recognition
7. Perception vs Recognition—Agnosias
• Apperceptive agnosia (Patient D.F.): cannot assemble features into a coherent, holistic percept, yet can still draw objects from memory.
• Associative agnosia (Dr. P., Sacks 1985): intact perception but cannot link percept to stored knowledge—attempted to "wear" his wife as a hat.
8. Recognition Processes—General Properties
- Must tolerate variation in stimulus (size, angle, exemplar).
- Both bottom-up (data-driven) & top-down (knowledge/expectation) influences.
- Recognition starts with features (lines, curves) detected early, which are then assembled into larger units.
9. Feature Importance & Visual Search Experiments
Visual search tasks
- Target defined by single feature (“vertical line”) → pop-out, reaction time independent of set size.
- Target defined by conjunction of features (“vertical & red”) → conjunction search, RT increases linearly with set size.
Implications: feature analysis precedes & is separate from feature combination.
Graph: RT vs set size shows shallow slope for pop-out, steeper for conjunction.
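The two curves can be summarized by a linear model, RT = intercept + slope × set size, and the slope recovered by least squares. The 450 ms intercept and the 0 and 25 ms/item slopes are hypothetical, noise-free values chosen for illustration, not data from the lecture.

```python
def predicted_rt_ms(set_size, intercept_ms, slope_ms_per_item):
    """Linear search model: a fixed base time plus a constant cost per display item."""
    return intercept_ms + slope_ms_per_item * set_size


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


set_sizes = [4, 8, 16, 32]
# Hypothetical parameters: 0 ms/item for pop-out, 25 ms/item for conjunction search.
feature_rts = [predicted_rt_ms(n, 450, 0) for n in set_sizes]
conjunction_rts = [predicted_rt_ms(n, 450, 25) for n in set_sizes]
print(fit_slope(set_sizes, feature_rts), fit_slope(set_sizes, conjunction_rts))
# -> 0.0 25.0
```

With real, noisy data the same fit gives a near-zero slope for pop-out and a clearly positive slope for conjunction search.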
10. Word Recognition Phenomena
Methods
- Tachistoscopic presentation (≈ 20–30 ms) followed by a mask to stop further processing.
Factors enhancing recognition
- Word frequency: common words ("home") are recognized more readily than rare ones.
- Repetition priming: recently seen words processed faster.
- Word-superiority effect (WSE): letters recognized more accurately when embedded in a word than in isolation.
- Well-formedness: pronounceable nonwords like FIKE are recognized more easily than HZYQ; the same bias toward well-formed patterns also produces errors (DPUM misread as DRUM).
Quiz example: seeing “platypus” earlier speeds a later real-word/nonsense judgment on it ⇒ repetition priming.
11. Feature Nets—A Computational Model
Architecture (Selfridge “Pandemonium,” McClelland & Rumelhart):
- Layers of detectors: feature → letter → bigram → word.
- Each detector's activation level rises with input; the detector fires when its threshold is reached.
- Recency (warm-up) & frequency (exercise) adjust baseline activation, explaining priming & high-frequency advantages.
- Explains resolution of ambiguous input: when "THE CAT" is written with the same ambiguous character in the middle of both words, word context makes it read as H in THE and as A in CAT.
- Bias toward familiar patterns sometimes causes errors (CQRN → “CORN”) but overall boosts efficiency.
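The word level of a feature net can be sketched in a few lines. This is a toy under stated assumptions, not Selfridge's or McClelland & Rumelhart's actual model: the feature and bigram layers are skipped (letters are given directly), and the three word detectors, their baselines, the threshold, and the priming boost are all hypothetical values.

```python
# Hypothetical word detectors and baseline activations. Higher baselines stand in
# for frequent or recently seen words (the "exercise" and "warm-up" effects).
BASELINES = {"CORN": 0.5, "CORD": 0.2, "HORN": 0.3}
THRESHOLD = 3.5  # hypothetical firing threshold for a word detector


def word_activation(word, letters, baseline):
    """Baseline plus one unit of input per letter detector that matches in position."""
    matches = sum(1 for expected, seen in zip(word, letters) if expected == seen)
    return baseline + matches


def recognize(letters, baselines=BASELINES, threshold=THRESHOLD):
    """Return the most active word detector if it reaches threshold, else None."""
    scores = {w: word_activation(w, letters, b) for w, b in baselines.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def prime(word, baselines, boost=0.4):
    """Copy of the baselines with one word's baseline raised, as after recent exposure."""
    primed = dict(baselines)
    primed[word] += boost
    return primed


print(recognize("CQRN"))                            # CORN: 3 letters + high baseline
print(recognize("CQRD"))                            # None: CORD stays below threshold
print(recognize("CQRD", prime("CORD", BASELINES)))  # CORD after priming
```

The first line reproduces the CQRN → "CORN" error; the last two show how a recency boost lets a lower-frequency word reach threshold on degraded input.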
Extensions
- Interactive Activation & Competition (IAC) model: adds bidirectional connections (including top-down feedback from word to letter detectors) and lateral inhibition within a level; explains WSE via that top-down feedback.
- Recognition-by-Components (RBC, Biederman) for 3-D objects: intermediate units are geons, a small set of simple volumetric primitives (Biederman proposed about 36).
• Geons are viewpoint-invariant; a few suffice per object; recognition tolerates occlusion as long as enough of each geon's edges remain visible.
Limitations / need for top-down:
- Configuration (holistic) effects.
- Sentence-context priming surpasses isolated word priming.
- Non-visual knowledge modulates recognition.
12. Object & Face Recognition in the Brain
Neurophysiology
- Inferotemporal (IT) cortex houses cells selective for complex objects. Related “concept cells” in the human medial temporal lobe (e.g., the “Halle Berry” neuron) respond to one specific person across formats: face, written name, silhouette.
- Series of face patches culminating in fusiform face area (FFA):
• Specialization for individuating faces.
• Prosopagnosia – FFA lesions → face-identity blindness.
• Super-recognizers: exceptional face-processing skill.
- Faces exhibit a strong inversion effect: turning them upside down impairs recognition far more than for other object categories, suggesting configural (holistic) processing.
- Expertise hypothesis: the FFA may be recruited for other domains of expertise (e.g., a bird-watcher with prosopagnosia was also impaired at identifying warblers).
13. Ethical, Philosophical, & Practical Implications
• Understanding visual dominance helps design multisensory displays & mitigate illusions (e.g., in aviation).
• Insights into agnosias guide neuro-rehabilitation strategies.
• Knowledge of face-specific processing informs eyewitness-memory reliability debates & AI face-recognition ethics.
• Recognizing top-down biases warns of perception’s susceptibility to expectation & stereotypes.
14. Key Numerical & Experimental Facts (Quick Reference)
- Visible spectrum: 400–700 nm.
- Star disappearance: fovea has 0 rods.
- Edge-enhancement example firing: center cell 1000 spikes/s vs neighbor 80.
- Tachistoscope exposure: ≈ 25 ms typical.
- Motion parallax relation: retinal (angular) speed ∝ 1/distance.
- Size–distance relation: doubling distance halves retinal image size (retinal size ∝ 1/distance).
15. Common Exam Pitfalls & Tips
- Don’t confuse “what” vs “where” damages (agnosia vs visuomotor).
- Word-superiority is about letters in words, not words in sentences.
- Binding problem solutions involve spatial maps + synchrony + attention—be able to describe each.
- Be ready to name & illustrate at least three monocular pictorial cues.
- Remember that constancy relies on relationships in the retinal image and distance cues.