Lecture 3 - Perception and Recognition - SLIDES Notes

Part I: Visual Perception

Overview & Objectives
  • Perception feels instantaneous & effortless, yet relies on many sequential and parallel processes.
  • Course objectives for the visual-perception portion:
    • The Visual System
    • Visual Coding
    • Form Perception
    • Constancy
    • Perception of Depth
  • Pathology reminder: Patient L.M. with akinetopsia (motion blindness) cannot perceive continuous motion (e.g., pouring coffee, crossing streets)—illustrates how a single component failure devastates perception.

1. The Visual System

• Vision is the dominant human sense—occupies more cortical territory than any other modality.
• In multisensory conflict, vision usually “wins” (e.g., ventriloquism illusion—sound localized where the visual dummy mouth moves).

Phototransduction sequence

  • Light → cornea → lens (both focus the light) → retina (photoreceptors).
  • Two photoreceptor classes: rods & cones.

Rods vs Cones (three-column comparison)

  • Sensitivity:
    • Rods: sensitive to dim light (night vision).
    • Cones: require bright light; non-functional in the dark.
  • Acuity:
    • Rods: low spatial acuity.
    • Cones: high acuity.
  • Color:
    • Rods: color-blind.
    • Cones: color-sensitive (3 cone types).
  • Retinal distribution:
    • Rods: none in fovea, dense in periphery.
    • Cones: mostly in & near fovea; absent in extreme periphery.

Additional facts

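  • Fovea = highest cone density, smallest receptive fields → sharpest vision. Cost: the “star disappears” phenomenon. On a dark night, looking straight at a dim star puts its light on the rod-free fovea, where the cones are too insensitive to respond, so the star vanishes; looking slightly to one side shifts the light onto rods and the star reappears.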
  • Blind spot: region where optic nerve exits; no photoreceptors.

Electromagnetic spectrum

  • Visible light ≈ 400–700 nm.
Retino-cortical Pathway
  1. Retina: photoreceptors → bipolar cells → ganglion cells.
  2. Axons of ganglion cells form optic nerve → optic chiasm (partial decussation) → optic tract.
  3. Lateral Geniculate Nucleus (LGN) of thalamus.
  4. Primary Visual Cortex (V1) in occipital lobe.
Early Neural Computations—Lateral Inhibition
  • Neighboring stimulated cells inhibit each other; yields edge enhancement.
  • Example numbers: the brightly lit central cell sends 1000 spikes/s to the brain, its dimly lit neighbor only 80 spikes/s; the percept exaggerates brightness contrast beyond the actual luminance gradient.
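
Worked sketch of lateral inhibition: a minimal, assumed model in which each cell's output is its own input minus a fixed fraction of its two immediate neighbors' inputs. The function name, the inhibition weight `k`, and the luminance values are illustrative choices, not figures from the lecture.

```python
def lateral_inhibition(inputs, k=0.125):
    """Toy 1-D model: output = own input - k * (left input + right input).

    k is an assumed inhibition weight. Cells at the ends of the row treat the
    missing neighbor as having the same input as themselves.
    """
    n = len(inputs)
    outputs = []
    for i, x in enumerate(inputs):
        left = inputs[i - 1] if i > 0 else x
        right = inputs[i + 1] if i < n - 1 else x
        # Firing rates cannot be negative, so clamp at zero.
        outputs.append(max(0.0, x - k * (left + right)))
    return outputs


# A dark region (20) next to a bright region (80): one luminance edge.
luminance = [20, 20, 20, 80, 80, 80]
response = lateral_inhibition(luminance)
print(response)
```

In this sketch the dark cell next to the edge responds less than other dark cells (it is inhibited by a bright neighbor), and the bright cell next to the edge responds more than other bright cells (it is inhibited by a dim neighbor). The two sides of the border are pushed apart, which is the edge-enhancement effect.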

2. Visual Coding & Single-Cell Properties

Coding = relationship between stimulus characteristics & neuronal activity (encoding / decoding).

Single-cell recording findings

  • Each neuron has a receptive field (RF): area where stimulus influences firing.
  • Center–surround (dot) detectors: excited by central spot, inhibited by surround; whole-field stimulation cancels out.
  • Additional RF selectivities:
    • Orientation / “edge” detectors.
    • Angle detectors.
    • Direction & motion detectors.
    • Corner detectors.
  • Nobel-winning work: Hubel & Wiesel (1981).
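
Worked sketch of a center–surround (dot) detector, under simplifying assumptions: an on-center cell whose response is a baseline rate plus center luminance minus mean surround luminance. The function name, the baseline value, and the equal weighting are hypothetical, chosen only to show why whole-field stimulation cancels out.

```python
def center_surround_response(center, surround, baseline=10.0):
    """Toy on-center cell: excited by the center, inhibited by the surround.

    center   -- luminance falling on the RF center
    surround -- list of luminances falling on the RF surround
    baseline -- assumed spontaneous firing rate (arbitrary units)
    """
    excitation = center
    inhibition = sum(surround) / len(surround)
    # Firing rates cannot drop below zero.
    return max(0.0, baseline + excitation - inhibition)


spot_only = center_surround_response(50, [0, 0, 0, 0])          # light on center only
whole_field = center_surround_response(50, [50, 50, 50, 50])    # light everywhere
surround_only = center_surround_response(0, [50, 50, 50, 50])   # light on surround only
print(spot_only, whole_field, surround_only)
```

With these assumed weights, a central spot drives the cell well above baseline, whole-field light leaves it at baseline (excitation and inhibition cancel), and surround-only light suppresses it.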
Parallel Processing Streams
  • Simultaneous analysis of different attributes → faster & mutually supportive.
  • Ventral ("what") pathway: occipital → inferotemporal cortex; object identity & form.
    • Damage ⇒ visual agnosia (cannot recognize objects).
  • Dorsal ("where/how") pathway: occipital → posterior parietal cortex; spatial location & action guidance.
    • Damage ⇒ visuomotor deficits (can see object but mis-reach).

Benefit highlights

  1. Speed/efficiency.
  2. Mutual influence (e.g., motion perception helps refine form, and vice versa).

3. The Binding Problem

• Multiple brain areas analyze color, motion, form, location separately—yet we perceive coherent objects.

Proposed solutions

  1. Spatial position correspondence—maps overlay.
  2. Neural synchrony—attributes belonging to same object fire in synchrony.
  3. Attention (top-down): directs neural binding; overload → conjunction errors (e.g., blue H + red T reported as blue T & red H).

4. Form Perception & Gestalt Principles

Core claim: Perception “goes beyond the information given.” (Bruner / Gestalt)

Reversible / ambiguous figures

  • Necker cube, face–vase, etc.—illustrate active interpretation & top-down influence.

Gestalt organizational principles (impose structure)

  • Similarity, proximity, good continuation, closure, common fate, symmetry, figure/ground segregation.
  • Demonstrations show expectation-driven recognition (e.g., scrambled letters are still read as “LIFT” once the context hints at the word).

Parallel interplay

  • Bottom-up (feature gathering) & top-down (interpretation) operate simultaneously—neither strictly first.

5. Perceptual Constancy

Definition: Perceiving stable properties despite changing retinal input—via unconscious inference (Helmholtz).

Types

  • Brightness constancy: perceived reflectance stays constant across illumination changes (checker-shadow illusion).
  • Shape constancy: door remains rectangular while swinging.
  • Size constancy: retinal size varies inversely with distance; brain factors distance cues → stable size.
    • If distance mis-estimated → illusions (e.g., “monsters in tunnel,” Ponzo).
  • Mechanisms:
    1. Stable ratios among parts of retinal image.
    2. Distance cues (see below).
    3. Top-down knowledge of typical object sizes.
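
Worked sketch of the size–distance relation behind size constancy. This is plain viewing geometry, not a claim about how the brain actually computes size; the function names and the 1.8 m / 5 m / 10 m / 20 m values are illustrative assumptions.

```python
import math


def visual_angle_deg(object_size_m, distance_m):
    """Visual angle (degrees) subtended by an object viewed head-on."""
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))


def inferred_size_m(angle_deg, assumed_distance_m):
    """Object size implied by a visual angle plus a distance estimate."""
    return 2 * assumed_distance_m * math.tan(math.radians(angle_deg) / 2)


person = 1.8                                  # metres tall (assumed)
angle_near = visual_angle_deg(person, 5.0)    # retinal extent at 5 m
angle_far = visual_angle_deg(person, 10.0)    # roughly half that at 10 m

# Correct distance estimate: the inferred size stays constant.
size_ok = inferred_size_m(angle_far, 10.0)
# Distance over-estimated (20 m instead of 10 m): object seems twice as large.
size_illusion = inferred_size_m(angle_far, 20.0)
print(angle_near, angle_far, size_ok, size_illusion)
```

The same retinal angle paired with a larger assumed distance yields a larger inferred size, which is one way to read illusions such as the “monsters in tunnel” figure.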

6. Depth Perception

Need distance estimation for interaction & size judgments.

Distance/Depth cues

  1. Binocular disparity (stereopsis)
    • The two eyes’ images are slightly offset ⇒ depth information even in a static scene. Works best for near objects.
  2. Monocular cues (one-eye sufficient):
    • Accommodation (lens adjustment).
    • Pictorial cues in 2-D images: interposition/occlusion, linear perspective, texture gradient, relative size, shading, aerial perspective, height in plane.
  3. Motion cues:
    • Motion parallax – nearer objects sweep faster across retina than distant ones.
    • Optic flow – global pattern of expansion/contraction as observer moves; provides heading & distance information.

Cue weighting is adaptive—e.g., binocular disparity negligible for far mountains; pictorial & motion cues dominate.
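
Worked sketch of why disparity fades with distance and why parallax speed scales with 1/distance. This is geometry only; the interocular distance of 0.063 m is an assumed typical value, and all function names and example distances are hypothetical.

```python
import math


def vergence_angle_deg(distance_m, interocular_m=0.063):
    """Angle between the two eyes' lines of sight to a point straight ahead."""
    return math.degrees(2 * math.atan(interocular_m / (2 * distance_m)))


def disparity_deg(near_m, far_m, interocular_m=0.063):
    """Binocular disparity between two points: difference in vergence angles."""
    return (vergence_angle_deg(near_m, interocular_m)
            - vergence_angle_deg(far_m, interocular_m))


def parallax_speed_deg_per_s(observer_speed_m_s, distance_m):
    """Angular speed of a stationary object directly to the side of a
    moving observer (instantaneous value: speed / distance, in radians)."""
    return math.degrees(observer_speed_m_s / distance_m)


near_disparity = disparity_deg(0.5, 0.6)          # 10 cm depth step at arm's length
far_disparity = disparity_deg(1000.0, 1100.0)     # 100 m depth step at 1 km
walk_near = parallax_speed_deg_per_s(1.5, 2.0)    # object 2 m away
walk_far = parallax_speed_deg_per_s(1.5, 4.0)     # object 4 m away
print(near_disparity, far_disparity, walk_near, walk_far)
```

Under these assumptions the near pair produces a disparity of over one degree, while the far pair produces well under a thousandth of a degree, consistent with disparity being negligible for distant mountains. Doubling an object's distance halves its angular speed across the retina.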


Part II: Recognition

7. Perception vs Recognition—Agnosias

Apperceptive agnosia (Patient D.F.): cannot bind features into holistic percept; drawing from memory OK.
Associative agnosia (Dr. P., Sacks 1985): intact perception but cannot link percept to stored knowledge—attempted to "wear" his wife as a hat.

8. Recognition Processes—General Properties
  • Must tolerate variation in stimulus (size, angle, exemplar).
  • Both bottom-up (data-driven) & top-down (knowledge/expectation) influences.
  • Start with features (lines, curves) detected early; assemble into larger units.

9. Feature Importance & Visual Search Experiments

Visual search tasks

  • Target defined by single feature (“vertical line”) → pop-out, reaction time independent of set size.
  • Target defined by conjunction of features (“vertical & red”) → conjunction search, RT increases linearly with set size.
  • Implication: feature analysis precedes & is separate from feature combination.
  • Graph: RT vs set size is flat or nearly flat for pop-out, and rises with a clearly steeper slope for conjunction search.
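
Worked sketch of the two search patterns as a simple linear model. The base time and per-item cost are made-up illustrative numbers, not data from the lecture, and `predicted_rt_ms` is a hypothetical helper.

```python
def predicted_rt_ms(set_size, search_type, base_ms=450.0, ms_per_item=25.0):
    """Toy reaction-time model for visual search.

    feature     -- target pops out: RT does not depend on set size
    conjunction -- RT grows linearly with the number of items
    base_ms and ms_per_item are assumed values for illustration only.
    """
    if search_type == "feature":
        return base_ms
    if search_type == "conjunction":
        return base_ms + ms_per_item * set_size
    raise ValueError(f"unknown search type: {search_type!r}")


for n in (4, 16, 32):
    print(n, predicted_rt_ms(n, "feature"), predicted_rt_ms(n, "conjunction"))
```

Plotting these values against set size gives the flat line for feature search and the rising line for conjunction search described above.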

10. Word Recognition Phenomena

Methods

  • Tachistoscopic presentation (≈ 20–30 ms) + mask to stop processing.

Factors enhancing recognition

  1. Word frequency: common words ("home") recognized more often.
  2. Repetition priming: recently seen words processed faster.
  3. Word-superiority effect (WSE): letters recognized more accurately when embedded in a word than in isolation.
  4. Well-formedness: pronounceable nonwords like FIKE are easier to recognize than unpronounceable strings like HZYQ; spelling context also shapes errors, which tend to regularize the input (DPUM read as DRUM).

Quiz example: seeing “platypus” earlier speeds later real/nonsense judgment ⇒ repetition priming.

11. Feature Nets—A Computational Model

Architecture (Selfridge “Pandemonium,” McClelland & Rumelhart):

  • Layers of detectors: feature → letter → bigram → word.
  • Activation level rises with input; fires when threshold reached.
  • Recency (warm-up) & frequency (exercise) adjust baseline activation, explaining priming & high-frequency advantages.
  • Explains ambiguous input resolution ("THE CAT" vs "TAE CAT" where middle character interpreted by word context).
  • Bias toward familiar patterns sometimes causes errors (CQRN → “CORN”) but overall boosts efficiency.
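
Worked sketch of a single feature-net detector, assuming the simplest possible rule: the detector fires when baseline plus input reaches a threshold, and firing raises the baseline (warm-up). The class, its parameter values, and the example words are hypothetical and only illustrate the frequency and recency effects listed above.

```python
class Detector:
    """Toy detector: fires when baseline + input reaches threshold."""

    def __init__(self, label, baseline=0.0, threshold=10.0, warmup=3.0):
        self.label = label
        self.baseline = baseline    # higher for frequently seen words (assumed)
        self.threshold = threshold
        self.warmup = warmup        # assumed baseline boost after each firing

    def present(self, input_strength):
        fired = self.baseline + input_strength >= self.threshold
        if fired:
            # Recency: a detector that just fired starts closer to threshold.
            self.baseline += self.warmup
        return fired


weak = 7.0      # brief, masked presentation (assumed input strength)
clear = 10.0    # long, unmasked presentation

frequent = Detector("HOME", baseline=4.0)   # high-frequency word
rare = Detector("HYMN", baseline=0.0)       # low-frequency word

frequent_seen = frequent.present(weak)   # frequency advantage: fires
rare_seen_first = rare.present(weak)     # weak input is not enough
rare.present(clear)                      # clear exposure warms the detector up
rare_seen_primed = rare.present(weak)    # repetition priming: now fires
print(frequent_seen, rare_seen_first, rare_seen_primed)
```

The same weak input succeeds or fails depending only on where the detector starts, which is how the model accounts for frequency and priming effects without any change in the stimulus.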

Extensions

  1. Interactive Activation & Competition (IAC) model: adds bidirectional connections & lateral inhibition between detectors; explains WSE via top-down feedback from word level to letter level.
  2. Recognition-by-Components (RBC) for 3-D objects: intermediate units are geons (simple volumetric primitives; Biederman proposed about 36).
    • Geons viewpoint-invariant; few suffice per object; occlusion tolerant (if geon edges visible).

Limitations / need for top-down:

  • Configuration (holistic) effects; sentence context priming surpasses isolated word priming; non-visual knowledge modulates recognition.

12. Object & Face Recognition in the Brain

Neurophysiology

  • Inferotemporal (IT) cortex houses cells selective for complex objects; further along, medial temporal lobe cells can be selective for one individual (e.g., “Halle Berry” neuron: responds to face, written name, silhouette).
  • Series of face patches culminating in fusiform face area (FFA):
    • Specialization for individuating faces.
    • Prosopagnosia – FFA lesions → face-identity blindness.
    • Super-recognizers: exceptional FFA/face processing skill.
  • Faces exhibit strong inversion effect: turning upside-down impairs recognition far more than other object categories—suggests configural (holistic) processing.
  • Expertise hypothesis: FFA may recruit for other domains of expertise (e.g., bird-watcher with prosopagnosia also impaired for warbler identification).

13. Ethical, Philosophical, & Practical Implications

• Understanding visual dominance helps design multisensory displays & mitigate illusions (e.g., in aviation).
• Insights into agnosias guide neuro-rehabilitation strategies.
• Knowledge of face-specific processing informs eyewitness-memory reliability debates & AI face-recognition ethics.
• Recognizing top-down biases warns of perception’s susceptibility to expectation & stereotypes.


14. Key Numerical & Experimental Facts (Quick Reference)

  • Visible spectrum: 400–700 nm.
  • Star disappearance: fovea has 0 rods.
  • Edge-enhancement example firing: center cell 1000 spikes/s vs neighbor 80 spikes/s.
  • Tachistoscope exposure: ≈ 25 ms typical.
  • Motion parallax relation: retinal speed ∝ 1/distance.
  • Size–distance relation: doubling distance halves retinal image size (size ∝ 1/distance).

15. Common Exam Pitfalls & Tips

  • Don’t confuse “what” vs “where” damages (agnosia vs visuomotor).
  • Word-superiority is about letters in words, not words in sentences.
  • Binding problem solutions involve spatial maps + synchrony + attention—be able to describe each.
  • Be ready to name & illustrate at least three monocular pictorial cues.
  • Remember that constancy relies on relationships in the retinal image and distance cues.