Perception: Bottom-Up & Top-Down Processes, Structuralism, and Gestalt Principles
- Sensation = purely neural/physiological registration of energy at receptors (bottom-up).
- Perception = cognitive, meaning-oriented interpretation of that registration (top-down).
- Dalmatian video demo: identical sensory input across repeated viewings, but once higher-level representation forms, the dog “pops out” instantly. Activation of identical sensory neurons → different percept because of top-down influence.
“Smart” Percepts and Constancy Examples
- The perceptual system evolved to deliver action-relevant, environmentally stable percepts, not raw receptor readouts.
- Lightness/color constancy: surfaces in shadow are perceptually lightened – the visual system discounts the illuminant.
- Olfaction constancy: force of a sniff changes air volume, not molecule concentration, so perceived intensity remains stable.
- General property: percepts are resistant to “nuisance” variables (illumination, sniff strength, distance, etc.). This resistance keeps perception of the world stable and consistent, so we can interact accurately with objects despite constantly changing sensory input.
Bottom-Up vs. Top-Down Processing
- Bottom-up: data-driven activation from receptors through successive feature levels.
- Top-down: concept-driven feedback that reshapes early processing, making perception quicker and more accurate by exploiting prediction.
Contextual Facilitation (Palmer, 1975)
- Prime picture of a kitchen → activates “kitchen” schema.
- Recognition of a congruent object (loaf of bread) is ≈30% more accurate/faster than incongruent (drum) or deceptive-shape (mailbox) objects.
- Demonstrates strength of conceptual/top-down activation.
Pattern-Recognition Problem & Neural Networks
- Infinite variability of patterns cannot be handled by fixed templates.
- Modern AI succeeds by massively parallel networks using activation/inhibition across billions of weighted connections – conceptually similar to brain circuits, but scientific understanding of their internal principles remains thin.
- These "deep learning" networks learn hierarchical feature representations directly from data, enabling robust recognition even with significant variations (e.g., different perspectives, lighting, occlusions). While highly effective, fully understanding how these complex networks arrive at their decisions (their "interpretability") is an ongoing research challenge.
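The core computational idea – activation and inhibition flowing across weighted connections – can be sketched in a few lines. This is an illustrative toy, not any specific published model; the feature names and weight values are assumptions chosen for the example.

```python
# Minimal sketch of a single network unit: activation is a weighted sum
# of inputs (positive weights excite, negative weights inhibit) passed
# through a nonlinearity. Weights here are illustrative, not from any
# trained model.

def unit_activation(inputs, weights, bias=0.0):
    """Weighted sum of inputs followed by a ReLU nonlinearity."""
    net = sum(i * w for i, w in zip(inputs, weights)) + bias
    return max(0.0, net)  # ReLU: activation cannot go negative

# Two feature detectors feeding one hypothetical letter unit:
# a vertical-bar feature excites it (+0.9), a curve feature inhibits it (-0.6).
features = [1.0, 0.2]          # strong vertical evidence, weak curve evidence
weights = [0.9, -0.6]
print(unit_activation(features, weights))  # 0.9*1.0 - 0.6*0.2 = 0.78
```

Deep networks stack millions of such units in layers, learning the weights from data rather than hand-setting them as here.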
Interactive Activation Model (IAM) for Word Recognition
Architecture
- Feature level – oriented line detectors (vertical, diagonal, horizontal, curves…).
- Letter level – units representing A…Z in specific positions.
- Word level – lexical units.
Dynamics
- Bottom-up excitation from features → letters → words.
- Lateral inhibition among competing units within a level (e.g., TRAP inhibits TRIP).
- Top-down feedback: highly active word boosts its constituent letters, which in turn boost compatible features – full interactivity.
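The three dynamics above can be sketched as an iterative simulation. This is a hedged toy version of IAM for just two competing words (TRAP vs. TRIP); the parameter values and letter-evidence numbers are illustrative assumptions, not McClelland & Rumelhart's published ones.

```python
# Toy IAM-style dynamics: bottom-up excitation, lateral inhibition among
# words, and top-down feedback from words to letters. All parameters are
# illustrative, not the published model's values.

LETTER_EVIDENCE = {"T": 1.0, "R": 1.0, "A": 0.7, "I": 0.3, "P": 1.0}
WORDS = {"TRAP": ["T", "R", "A", "P"], "TRIP": ["T", "R", "I", "P"]}

EXCITE, INHIBIT, FEEDBACK, DECAY = 0.1, 0.4, 0.05, 0.1

letters = dict(LETTER_EVIDENCE)       # letter-level activations
words = {w: 0.0 for w in WORDS}       # word-level activations

for _ in range(50):
    new_words = {}
    for w, ls in WORDS.items():
        # Bottom-up excitation: a word's letters drive it.
        bottom_up = EXCITE * sum(letters[l] for l in ls)
        # Lateral inhibition: competing words suppress each other.
        competition = INHIBIT * sum(words[v] for v in WORDS if v != w)
        new_words[w] = max(0.0, min(1.0, words[w] + bottom_up
                                    - competition - DECAY * words[w]))
    words = new_words
    # Top-down feedback: an active word boosts its constituent letters.
    for w, ls in WORDS.items():
        for l in ls:
            letters[l] = min(1.0, letters[l] + FEEDBACK * words[w])

print(max(words, key=words.get))  # TRAP: stronger "A" evidence wins
```

Because TRAP receives slightly more bottom-up support (A = 0.7 vs. I = 0.3), lateral inhibition amplifies its lead until it dominates, while its feedback boosts the ambiguous letter "A" – the same loop that produces the word superiority effect.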
Empirical Support: Word Superiority Effect (Reicher–Wheeler)
- Task: after a 50–60 ms display followed by a mask, report which letter appeared at a probed position (forced choice).
- RT/accuracy order: real word (FORK) > pronounceable non-word (e.g., MORK) > single letter (K) > unpronounceable non-word (e.g., ZQKP).
- Explanation: When a real word like FORK is presented, its activation at the word level feeds back down to boost the activation of its constituent letters (F, O, R, K). This top-down support makes it easier to identify individual letters within a word context compared to when the letter appears alone or in a context that does not provide such facilitative feedback.
Feature Integration Theory (FIT) – Treisman
Two-Stage Model
- Preattentive stage: independent feature maps (color, orientation, size, curvature, etc.).
- Focused-attention stage: spatial attention “glues” features into object files.
Evidence
- Illusory conjunctions: When stimuli are flashed briefly (e.g., 200 ms displays) and attention is diverted (e.g., by asking for digit-report first), participants often make ~20% mix-and-match errors (e.g., reporting a small green triangle when a large green circle and a small red triangle were actually present). This suggests that features (color, shape, size) are initially perceived independently in the preattentive stage and only later "glued" together by focused attention.
- Balint’s syndrome (RM): parietal damage → chronic illusory conjunctions even with long exposure, demonstrating the critical role of the damaged attentional binding mechanism.
- Top-down labelling (“carrot, lake, tire”) sharply reduces conjunction errors – concept guides binding.
Recognition-by-Components (RBC) – Biederman
Geons
- ~36 primitive 3-D “geometric ions” (cylinder, cone, block, curved cylinder, truncated pyramid…).
- Flexible (stretch, taper) but categorical.
Properties
- Discriminability – geons are visually distinct.
- View invariance – non-accidental properties (parallelism, symmetry, collinearity) survive most viewpoints.
- Robustness to noise – object recognizable with partial geon information (flashlight demo). Removal of critical edges wipes recognition.
Limitations
- Mental-rotation cost: same–different judgements slower as relative rotation increases. While RBC predicts view invariance for object identification based on non-accidental properties, the existence of mental rotation costs for precise same-different judgments suggests that perception often involves viewpoint-dependent processing for fine distinctions or when 3D rotation is required to match two stimuli. True invariance is imperfect.
Problems for Purely Bottom-Up (Structuralist) Accounts
- Inverse-projection ambiguity: same retinal size from small/near vs. large/far objects. This requires top-down knowledge (e.g., an object's typical size) to resolve.
- Occlusion: overlapping objects create fragmented images – camera vs. mind.
- Speech segmentation: silence occurs inside words, not between; yet humans parse effortlessly.
- Illumination/shadow ambiguities (A–B–C tile demo) – the mind factors in physical knowledge of light and shadow.
Perceptual Heuristics & Unconscious Inference (Helmholtz)
- Occlusion heuristic – assume surfaces continue smoothly behind occluders (blue-B demo).
- Light-from-above assumption – determines concave vs. convex interpretation (sand-bump rotation).
- These fast rules comprise “perceptual intelligence” absent in naive machines.
Structuralism vs. Gestalt Approaches
Structuralism (Wundt, Titchener)
- Analogy to chemistry: identify elementary sensations, then build up via association.
- Weak at explaining grouping principles for novel, non-alphabetic stimuli.
Gestalt Psychology (Wertheimer, Köhler, Koffka)
- Analogy to physics/field theory: global “forces” instantly organize parts into wholes.
- Emphasis on top-down, holistic laws.
- Percept chooses the most regular, symmetrical, simple, meaningful organization compatible with stimulus.
- Illustrated by Olympic rings vs. “broken arcs” interpretation.
Specific Gestalt Grouping Principles
| Law | Core Rule | Classic Demonstration |
|---|---|---|
| Similarity | Like items cluster | Red/green columns & rows, S/5 dot pictures |
| Good Continuation | Elements forming smooth lines seen as unit | X-shaped crossing vs. zig-zag angles; “apple cores” morph into two moons when touching |
| Proximity | Near items cluster | Evenly spaced vs. horizontally compressed dot rows |
| Common Fate | Elements moving together group | Coherent ellipse in random-dot kinematogram; prey/predator detection |
| Familiarity/Meaning | Configurations resembling known objects pop out | Forest-faces painting (12 hidden faces); frog camouflaged in leaves |
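The proximity law, in particular, has a natural computational reading as distance-threshold clustering: dots whose gaps fall below some threshold fuse into one perceptual unit. The sketch below is an illustrative analogy with assumed coordinates and threshold, not a model of the visual system.

```python
# Proximity grouping as distance-threshold clustering: points connected
# by chains of sub-threshold gaps form one group. Layout and threshold
# are illustrative assumptions.

def group_by_proximity(points, threshold):
    """Cluster points reachable from each other via steps <= threshold."""
    clusters = []
    unassigned = list(points)
    while unassigned:
        seed = unassigned.pop()
        cluster = [seed]
        frontier = [seed]
        while frontier:
            p = frontier.pop()
            near = [q for q in unassigned
                    if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= threshold ** 2]
            for q in near:
                unassigned.remove(q)
                cluster.append(q)
                frontier.append(q)
        clusters.append(cluster)
    return clusters

# A row of dots with one enlarged gap splits into two perceived pairs,
# mirroring the compressed-dot-row demonstration.
row = [(0, 0), (1, 0), (3, 0), (4, 0)]   # gap between x=1 and x=3 is larger
print(len(group_by_proximity(row, 1.5)))  # 2 groups: {(0,0),(1,0)} and {(3,0),(4,0)}
```

Raising the threshold (e.g., to 2.5) merges everything into one group, just as evenly spaced dots are seen as a single row.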
Connections, Implications & Real-World Relevance
- Modern deep networks loosely mirror interactive activation but lack explicit human-style heuristics; performance leaps due to data/scale, not principle comprehension.
- Visual interface design, camouflage, military decoys, and UI icons exploit Gestalt laws.
- Clinical relevance: Balint’s syndrome & visual-form agnosia.