Perception: Bottom-Up & Top-Down Processes, Structuralism, and Gestalt Principles
- Sensation = purely neural/physiological registration of energy at receptors (bottom-up).
- Perception = cognitive, meaning-oriented interpretation of that registration (top-down).
- Dalmatian video demo: identical sensory input across repeated viewings, but once higher-level representation forms, the dog “pops out” instantly. Activation of identical sensory neurons → different percept because of top-down influence.
“Smart” Percepts and Constancy Examples
- The perceptual system evolved to deliver action-relevant, environmentally stable percepts, not raw receptor readouts.
- Lightness/color constancy: surfaces in shadow are perceptually lightened – the visual system discounts the illuminant.
- Olfaction constancy: force of a sniff changes air volume, not molecule concentration, so perceived intensity remains stable.
- General property: percepts are resistant to “nuisance” variables (illumination, sniff strength, distance, etc.). This resistance keeps perception of the world stable and consistent, so we can interact accurately with objects despite constantly changing sensory input.
Bottom-Up vs. Top-Down Processing
- Bottom-up: data-driven activation from receptors through successive feature levels.
- Top-down: concept-driven feedback that reshapes early processing, making perception quicker and more accurate by exploiting prediction.
Contextual Facilitation (Palmer, 1975)
- Prime picture of a kitchen → activates “kitchen” schema.
- Recognition of a congruent object (loaf of bread) is ≈30% more accurate/faster than incongruent (drum) or deceptive-shape (mailbox) objects.
- Demonstrates strength of conceptual/top-down activation.
Pattern-Recognition Problem & Neural Networks
- Infinite variability of patterns cannot be handled by fixed templates.
- Modern AI succeeds by massively parallel networks using activation/inhibition across billions of weighted connections – conceptually similar to brain circuits, but scientific understanding of their internal principles remains thin.
- These "deep learning" networks learn hierarchical feature representations directly from data, enabling robust recognition even with significant variations (e.g., different perspectives, lighting, occlusions). While highly effective, fully understanding how these complex networks arrive at their decisions (their "interpretability") is an ongoing research challenge.
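The core computational idea – activation and inhibition flowing across weighted connections – can be sketched in a few lines. This is an illustrative toy, not any specific published model; the feature names and weight values are assumptions chosen for the example.

```python
# Minimal sketch of a single network unit: activation is a weighted sum
# of inputs (positive weights excite, negative weights inhibit) passed
# through a nonlinearity. Weights here are illustrative, not from any
# trained model.

def unit_activation(inputs, weights, bias=0.0):
    """Weighted sum of inputs followed by a ReLU nonlinearity."""
    net = sum(i * w for i, w in zip(inputs, weights)) + bias
    return max(0.0, net)  # ReLU: activation cannot go negative

# Two feature detectors feeding one hypothetical letter unit:
# a vertical-bar feature excites it (+0.9), a curve feature inhibits it (-0.6).
features = [1.0, 0.2]          # strong vertical evidence, weak curve evidence
weights = [0.9, -0.6]
print(unit_activation(features, weights))  # 0.9*1.0 - 0.6*0.2 = 0.78
```

Deep networks stack millions of such units in layers, learning the weights from data rather than hand-setting them as here.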
Interactive Activation Model (IAM) for Word Recognition
Architecture
- Feature level – oriented line detectors (vertical, diagonal, horizontal, curves…).
- Letter level – units representing A…Z in specific positions.
- Word level – lexical units.
Dynamics
- Bottom-up excitation from features → letters → words.
- Lateral inhibition among competing units within a level (e.g., TRAP inhibits TRIP).
- Top-down feedback: highly active word boosts its constituent letters, which in turn boost compatible features – full interactivity.
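The three dynamics above can be sketched as an iterative simulation. This is a hedged toy version of IAM for just two competing words (TRAP vs. TRIP); the parameter values and letter-evidence numbers are illustrative assumptions, not McClelland & Rumelhart's published ones.

```python
# Toy IAM-style dynamics: bottom-up excitation, lateral inhibition among
# words, and top-down feedback from words to letters. All parameters are
# illustrative, not the published model's values.

LETTER_EVIDENCE = {"T": 1.0, "R": 1.0, "A": 0.7, "I": 0.3, "P": 1.0}
WORDS = {"TRAP": ["T", "R", "A", "P"], "TRIP": ["T", "R", "I", "P"]}

EXCITE, INHIBIT, FEEDBACK, DECAY = 0.1, 0.4, 0.05, 0.1

letters = dict(LETTER_EVIDENCE)       # letter-level activations
words = {w: 0.0 for w in WORDS}       # word-level activations

for _ in range(50):
    new_words = {}
    for w, ls in WORDS.items():
        # Bottom-up excitation: a word's letters drive it.
        bottom_up = EXCITE * sum(letters[l] for l in ls)
        # Lateral inhibition: competing words suppress each other.
        competition = INHIBIT * sum(words[v] for v in WORDS if v != w)
        new_words[w] = max(0.0, min(1.0, words[w] + bottom_up
                                    - competition - DECAY * words[w]))
    words = new_words
    # Top-down feedback: an active word boosts its constituent letters.
    for w, ls in WORDS.items():
        for l in ls:
            letters[l] = min(1.0, letters[l] + FEEDBACK * words[w])

print(max(words, key=words.get))  # TRAP: stronger "A" evidence wins
```

Because TRAP receives slightly more bottom-up support (A = 0.7 vs. I = 0.3), lateral inhibition amplifies its lead until it dominates, while its feedback boosts the ambiguous letter "A" – the same loop that produces the word superiority effect.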
Empirical Support: Word Superiority Effect (Reicher–Wheeler)
- Task: after a 50–60 ms display followed by a mask, report which letter appeared at a probed position (forced choice).
- RT/accuracy order: real word (FORK) > pronounceable non-word (e.g., MORK) > single letter (K) > unpronounceable non-word (e.g., ZQKP).
- Explanation: When a real word like FORK is presented, its activation at the word level feeds back down to boost the activation of its constituent letters (F, O, R, K). This top-down support makes it easier to identify individual letters within a word context compared to when the letter appears alone or in a context that does not provide such facilitative feedback.
Feature Integration Theory (FIT) – Treisman
Two-Stage Model
- Preattentive stage: independent feature maps (color, orientation, size, curvature, etc.).
- Focused-attention stage: spatial attention “glues” features into object files.
Evidence
- Illusory conjunctions: When stimuli are flashed briefly (e.g., 200 ms displays) and attention is diverted (e.g., by asking for digit-report first), participants often make ~20% mix-and-match errors (e.g., reporting a small green triangle when a large green circle and a small red triangle were actually present). This suggests that features (color, shape, size) are initially perceived independently in the preattentive stage and only later "glued" together by focused attention.
- Balint’s syndrome (RM): parietal damage → chronic illusory conjunctions even with long exposure, demonstrating the critical role of the damaged attentional binding mechanism.
- Top-down labelling (“carrot, lake, tire”) sharply reduces conjunction errors – concept guides binding.
Recognition-by-Components (RBC) – Biederman
Geons
- ~36 primitive 3-D “geometric ions” (cylinder, cone, block, curved cylinder, truncated pyramid…).
- Flexible (stretch, taper) but categorical.
Properties
- Discriminability – geons are visually distinct.
- View invariance – non-accidental properties (parallelism, symmetry, collinearity) survive most viewpoints.
- Robustness to noise – object recognizable with partial geon information (flashlight demo). Removal of critical edges wipes recognition.
Limitations
- Mental-rotation cost: same–different judgements slower as relative rotation increases. While RBC predicts view invariance for object identification based on non-accidental properties, the existence of mental rotation costs for precise same-different judgments suggests that perception often involves viewpoint-dependent processing for fine distinctions or when 3D rotation is required to match two stimuli. True invariance is imperfect.
Problems for Purely Bottom-Up (Structuralist) Accounts
- Inverse-projection ambiguity: same retinal size from small/near vs. large/far objects. This requires top-down knowledge (e.g., an object's typical size) to resolve.
- Occlusion: overlapping objects create fragmented images – camera vs. mind.
- Speech segmentation: silence occurs inside words, not between; yet humans parse effortlessly.
- Illumination/shadow ambiguities (A–B–C tile demo) – the mind factors in physical knowledge of light and shadow.
Perceptual Heuristics & Unconscious Inference (Helmholtz)
- Occlusion heuristic – assume surfaces continue smoothly behind occluders (blue-B demo).
- Light-from-above assumption – determines concave vs. convex interpretation (sand-bump rotation).
- These fast rules comprise “perceptual intelligence” absent in naive machines.
Structuralism vs. Gestalt Approaches
Structuralism (Wundt, Titchener)
- Analogy to chemistry: identify elementary sensations, then build up via association.
- Weak at explaining grouping principles for novel, non-alphabetic stimuli.
Gestalt Psychology (Wertheimer, Köhler, Koffka)
- Analogy to physics/field theory: global “forces” instantly organize parts into wholes.
- Emphasis on top-down, holistic laws.
- Percept chooses the most regular, symmetrical, simple, meaningful organization compatible with stimulus.
- Illustrated by Olympic rings vs. “broken arcs” interpretation.
Specific Gestalt Grouping Principles
| Law | Core Rule | Classic Demonstration |
|---|---|---|
| Similarity | Like items cluster | Red/green columns & rows, S/5 dot pictures |
| Good Continuation | Elements forming smooth lines seen as unit | X-shaped crossing vs. zig-zag angles; “apple cores” morph into two moons when touching |
| Proximity | Near items cluster | Evenly spaced vs. horizontally compressed dot rows |
| Common Fate | Elements moving together group | Coherent ellipse in random-dot kinematogram; prey/predator detection |
| Familiarity/Meaning | Configurations resembling known objects pop out | Forest-faces painting (12 hidden faces); frog camouflaged in leaves |
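The proximity law, in particular, has a natural computational reading as distance-threshold clustering: dots whose gaps fall below some threshold fuse into one perceptual unit. The sketch below is an illustrative analogy with assumed coordinates and threshold, not a model of the visual system.

```python
# Proximity grouping as distance-threshold clustering: points connected
# by chains of sub-threshold gaps form one group. Layout and threshold
# are illustrative assumptions.

def group_by_proximity(points, threshold):
    """Cluster points reachable from each other via steps <= threshold."""
    clusters = []
    unassigned = list(points)
    while unassigned:
        seed = unassigned.pop()
        cluster = [seed]
        frontier = [seed]
        while frontier:
            p = frontier.pop()
            near = [q for q in unassigned
                    if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= threshold ** 2]
            for q in near:
                unassigned.remove(q)
                cluster.append(q)
                frontier.append(q)
        clusters.append(cluster)
    return clusters

# A row of dots with one enlarged gap splits into two perceived pairs,
# mirroring the compressed-dot-row demonstration.
row = [(0, 0), (1, 0), (3, 0), (4, 0)]   # gap between x=1 and x=3 is larger
print(len(group_by_proximity(row, 1.5)))  # 2 groups: {(0,0),(1,0)} and {(3,0),(4,0)}
```

Raising the threshold (e.g., to 2.5) merges everything into one group, just as evenly spaced dots are seen as a single row.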
Connections, Implications & Real-World Relevance
- Modern deep networks loosely mirror interactive activation but lack explicit human-style heuristics; performance leaps due to data/scale, not principle comprehension.
- Visual interface design, camouflage, military decoys, and UI icons exploit Gestalt laws.
- Clinical relevance: Balint’s syndrome & visual-form agnosia.