Perception: Template Matching, Feature Analysis, Structural Models (Geons) and Neural Mechanisms
Template Matching model
- Core idea: after input is taken in, the brain creates a mental image/template and compares the input to a set of templates
- Recognition occurs if a template fits well enough; if not, recognition fails
- Limitations mentioned: not robust to variation in size, shape, and appearance; handling the variety of everyday experience would require far more than a single stored template per object
- Class discussion prompt: what would need to be added or changed to make the model more valid for human perception? (teacher asks for proposals)
- Student response (summarized from transcript): breaking down templates into component parts is useful; this leads toward feature-based approaches
- Formal intuition (in notes): comparison/fit between input and templates can be modeled as a similarity score S(X, T_i) and a decision threshold
- Possible formal representation: \max_i S(X, T_i) > \tau, where \tau is a decision threshold
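A minimal sketch of this decision rule, assuming inputs and templates are stored as same-sized binary arrays; the pixel-overlap similarity and the threshold value are illustrative choices, not taken from the lecture.

```python
import numpy as np

def template_match(x, templates, tau=0.8):
    """Toy template matcher: return the index of the best-fitting template
    if its similarity exceeds the threshold tau, otherwise None (no recognition)."""
    # Illustrative similarity S(X, T_i): proportion of cells that agree
    scores = [np.mean(x == t) for t in templates]
    best = int(np.argmax(scores))
    # Decision rule from the notes: max_i S(X, T_i) > tau
    return best if scores[best] > tau else None

# Example: a 3x3 "L" pattern matched against two stored templates
L_pattern = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]])
T_pattern = np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]])
print(template_match(L_pattern, [L_pattern, T_pattern]))  # -> 0 (first template fits)
```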
Feature Analysis model
- Core idea: the stored memory representation consists of features (parts) of objects rather than whole objects
- Features can be abstract or concrete; examples include Garfield’s hands, feet, belly, and eyes, and even a conceptual feature like "lasagna" associated with Garfield
- When a stimulus is presented, recognition is a comparison between the incoming feature breakdown and stored memory features
- Features can be shared across different stimuli, leading to redundancy but also a democratic process of feature-based recognition
- Example described: Garfield is incognito in a given image; the feature breakdown (hands, feet, belly, eyes, lasagna, etc.) activates memory features that resemble Garfield enough to trigger recognition
- Empirical implication: feature similarity/differences influence reaction times; more shared features between two stimuli lead to longer processing when distinguishing them
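A minimal sketch of the feature-comparison idea above, assuming stored objects are just sets of named features; the feature names and the "count of shared features" score are illustrative, not from the lecture.

```python
# Stored memory representations as feature sets (illustrative names only)
memory = {
    "Garfield": {"paws", "round belly", "half-lidded eyes", "stripes", "lasagna"},
    "Odie":     {"paws", "round belly", "long tongue", "floppy ears"},
}

def recognize(stimulus_features):
    """Score each stored object by how many features it shares with the input
    (each shared feature 'votes' for its object), then pick the top scorer."""
    scores = {name: len(stimulus_features & feats) for name, feats in memory.items()}
    return max(scores, key=scores.get), scores

# An incognito Garfield still activates enough stored features to be recognized
print(recognize({"paws", "round belly", "stripes", "lasagna", "sunglasses"}))
# -> ('Garfield', {'Garfield': 4, 'Odie': 2})
```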
Empirical evidence on feature-based processing
- Letters example: pairs of letters (same vs. different) tested; some pairs (e.g., g vs w) share few features and are easier to distinguish than others (e.g., p vs r) that share many features (illustrated in the sketch after this list)
- Reason: some features are shared across stimuli, creating competition among feature representations
- Reaction time predictions: greater feature similarity => longer RT; fewer shared features => shorter RT when distinguishing
- Visual search task with “roundness” as a salient feature: locating the one rounded letter (e.g., a 'Q') among 20 letters becomes faster when you search for roundness first; changing the task to emphasize different distinctive traits alters the search strategy
- Children’s reading/letter learning: emphasizing distinctive features (e.g., the presence of a bar in 'r' vs 'p') speeds feature-based discrimination during early learning
- Caricatures vs. normal faces: caricatures can be identified faster due to exaggerated signature features; suggests feature salience and weighting in recognition
- Semantic mapping: there is a notion of a hierarchy of features and a semantic map linking features to concepts
- Limitations: feature-based models can overfit to features that do not uniquely identify an object; when features are presented with distortions or misaligned relations, misrecognitions can occur
- Key takeaway: feature processing is rapid and informative, but recognition cannot rely on features alone; the relationship among features must also be considered
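Sketch of the shared-feature prediction for the letter pairs mentioned above, using hypothetical feature descriptions (the actual feature sets used in the study are not given in the notes).

```python
# Hypothetical feature descriptions for a few letters (illustration only)
letter_features = {
    "g": {"closed loop", "descender", "curve"},
    "w": {"diagonal strokes", "pointed joins"},
    "p": {"vertical bar", "curve on right", "descender"},
    "r": {"vertical bar", "curve on right"},
}

def shared_features(a, b):
    """Count features two letters share; more sharing predicts a longer
    reaction time when telling the pair apart."""
    return len(letter_features[a] & letter_features[b])

print(shared_features("g", "w"))  # -> 0: few shared features, fast to distinguish
print(shared_features("p", "r"))  # -> 2: more overlap, predicted slower
```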
Limitations and problems with pure feature analysis
- Ambiguity: same set of features can be configured to yield different interpretations
- Abstract rules for features can be ad hoc; different feature descriptions can yield different predicted identifications
- When features are presented in isolation, some stimuli can be misinterpreted (e.g., inverted images, ambiguous arrangements)
- The upside-down Thatcher illusion illustrates that orientation and relational organization matter beyond mere feature presence
- Some stimuli may appear identical at the primitive feature level, but differ in the way features relate to one another
- Faces pose a particular challenge: although feature detection occurs (eyes, nose, mouth), humans show holistic processing and rapid recognition that sometimes transcends enumerating features alone
Structural (geon) analysis: Biederman’s component model
- Core idea: objects are represented by a small set of basic shape primitives called geons and the relationships among them
- Geons: stable to noise, color changes, and other distortions; building blocks like basic 3D shapes
- Common geons include shapes such as cylinder, cone, pyramid, cup-like shapes, football (sphere-like), and horn
- A basic claim: about 36 geons exist; objects can be represented by combinations and arrangements of these geons
- The combinatorics: with as few as two geons, there can be up to roughly 72,000 distinct configurations (an approximate upper bound described in class)
- The key concept: it is not just the presence of geons (features) but the spatial relations among geons that drive recognition
- Example: “cup” vs “pail” can be distinguished by the particular arrangement of geons, not just the geons themselves
- Further exemplars: a quadruped (e.g., a horse or dog) can be viewed as one stack/arrangement of geons, a biped (human) can be formed by stacking similar geons differently, and a cup vs a pail differ in the arrangement rather than in the geon set itself
- Two main takeaways from geon theory:
- Geons are robust to low-level changes and noise
- The relations among geons (how they connect and are positioned) are essential for object identity
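A minimal sketch of a structural description in the spirit of geon theory, assuming a made-up two-geon vocabulary and relation labels; the cup vs pail contrast follows the notes, but the encoding itself is illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Structure:
    """Illustrative structural description: which geons are present,
    plus how they are related/attached to one another."""
    geons: frozenset
    relation: str

cup  = Structure(frozenset({"cylinder", "curved handle"}), "handle attached to side")
pail = Structure(frozenset({"cylinder", "curved handle"}), "handle arched above opening")

print(cup.geons == pail.geons)  # True: identical geon set
print(cup == pail)              # False: the relation between geons distinguishes them
```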
Limitations of geon-based (structural) models
- At a primitive level, different objects can share the same geons and still be ambiguous without relational information
- Some objects lose discriminability if there are only a few geons or if the geon relations are not informative enough
- Not all aspects of perception are captured by relations among simple 3D geons; orientation issues and certain visual illusions challenge pure geon-based accounts
- The Thatcher/face-inversion findings and face-specific processing suggest specialized processing beyond simple geon combinatorics for some stimuli
Peter Mitt’s component model (relation-focused recognition)
- Design: an experiment where stimuli are flashed for a brief duration (about 100 ms)
- Two conditions tested separately and contrasted:
- Features-only presentation (only local features are visible)
- Relations-only presentation (only the spatial relations among features are visible)
- Findings summarized:
- When only relations are present, participants are faster and more accurate than when only features are present
- When only features are present, accuracy is around chance (roughly 50%)
- When both features and relations are available, recognition is best
- The presence of relational information yields significantly higher accuracy (around 70% correct in relation-only trials) than feature-only trials (about 50%)
- Conclusion: the relations among features may be at least as important as the features themselves for recognition; this dependence on relationships supports a shift away from purely template- or feature-based models toward relational structure in perception
- Complementarity with other models: supports the idea that both features and relations matter, and sometimes the lack of relational information makes recognition difficult
Faces, holistic processing, and specialized brain systems
- The fusiform face area (FFA) is highlighted as a brain region particularly involved in face processing
- Face recognition is not entirely reducible to feature lists; there is evidence for holistic/configural processing
- Inversion effects: faces are harder to recognize when inverted; Thatcher illusion shows distortions that reveal sensitivity to facial configuration
- Special-case phenomena: some faces (or caricatures) may be recognized more quickly due to distinctive, salient features
- It is noted that some research indicates that facial recognition involves a specialized neural system beyond general object recognition, complicating the idea that a single feature/structure model can fully account for faces
The neurobiological basis of perception and recognition
- Occipital lobe as the initial processing stage; it feeds into two main pathways:
- Dorsal stream: runs to the parietal lobe; associated with locating objects and guiding actions (the “where/how” pathway)
- Ventral stream: runs to the temporal lobe; associated with identifying objects (the “what” pathway)
- Fusiform Face Area (FFA): situated within the ventral stream, specialized for faces
- Premotor cortex and action planning: the premotor area is involved not only when planning actions but also when observing others perform actions; this relates to learning through imitation and may involve mirror neurons
- Mirror neurons: discovered in nonhuman primates and also present in humans; thought to play a role in learning goals, intentions, and imitation by mapping observed actions onto the observer’s motor system
- Premotor activation observed even when a monkey anticipates actions performed by others; suggests anticipation/planning mechanisms that simulate actions
- The big-picture view: perception involves a dynamic interaction of bottom-up sensory input and top-down expectations, beliefs, and prior experience
Brain plasticity, training, and experience
- The brain is highly plastic and capable of reorganization based on experience
- Training can alter perceptual recognition: examples show fusiform gyrus activity can change with sustained exposure to new stimuli and the brain can learn to recognize new patterns as meaningful (e.g., faces in novel shapes)
- The general message: regular cognitive exercise (puzzles and similar activities) strengthens neural pathways and can improve cognitive flexibility and recognition abilities over time
Movement, perception, and top-down influence
- Movement and dynamic exploration can enhance recognition by providing multiple viewpoints and more feature-relations to compare against memory
- Moving a scene or object generates more mental representations and pathways for comparison, increasing recognition accuracy
- Bottom-up processing (from sensory input) works in concert with top-down processing (expectations and prior knowledge) to produce perception
Computer vision vs human perception
- Computer vision systems rely heavily on iterative learning and pattern recognition; they can outperform humans in some tasks but can also miss obvious cues that humans detect quickly
- Real-world demonstrations (e.g., the Tesla Road Runner-style video) show that machine vision can fail by not recognizing misleading cues in the environment
- The lesson: human perception combines rapid feature detection, relational interpretation, behavioral context, and learned priors in a way that current computer systems still struggle to replicate fully
Practical takeaways and concluding notes
- Perception is complex, sometimes counterintuitive, and relies on an integration of features and their relations
- The brain actively builds and restructures neural networks through use and training; staying mentally active supports long-term cognitive flexibility
- When trying to interpret ambiguous stimuli, moving around or looking from multiple angles can reveal additional features and relational cues that aid recognition
- The course emphasizes turning theoretical models into testable predictions and recognizing their limitations in explaining real-world perception
Quick reflections and prompts for before next class
- Prepare for quizzes on chapters two and three; cog lab on attentional blink due soon
- Expect discussion on attention next class; potential overlap into the following week
- Perception remains a central, fascinating area of cognition with ongoing research across templates, features, geons, and neural mechanisms