Perception: Template Matching, Feature Analysis, Structural Models (Geons) and Neural Mechanisms

  • Template Matching model

    • Core idea: incoming input is converted into an internal image and compared against a set of stored mental templates
    • Recognition occurs if a template fits well enough; if not, recognition fails
    • Limitations mentioned: not robust to variation in size, shape, and appearance; human experience requires more than a single template
    • Class discussion prompt: what would need to be added or changed to make the model more valid for human perception? (teacher asks for proposals)
    • Student response (summarized from transcript): breaking down templates into component parts is useful; this leads toward feature-based approaches
    • Formal intuition (in notes): comparison/fit between input and templates can be modeled as a similarity score S(X, T_i) and a decision threshold
    • Possible formal representation: max_i S(X, T_i) > τ, where τ is a decision threshold
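The decision rule above can be sketched in a few lines of Python. This is a toy illustration, not a model from the lecture: stimuli and templates are assumed to be 3x3 binary pixel grids, similarity is the fraction of agreeing pixels, and the threshold value is hypothetical.

```python
# Toy template-matching sketch. Assumptions: stimuli are 3x3 binary
# grids; S(X, T) = fraction of matching pixels; tau is hypothetical.
TAU = 0.8  # decision threshold (illustrative value)

TEMPLATES = {
    "plus":   ((0, 1, 0), (1, 1, 1), (0, 1, 0)),
    "lshape": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
}

def similarity(x, t):
    """S(X, T): fraction of pixels on which input and template agree."""
    cells = [xi == ti for rx, rt in zip(x, t) for xi, ti in zip(rx, rt)]
    return sum(cells) / len(cells)

def recognize(x):
    """Recognize X iff max_i S(X, T_i) > tau; otherwise recognition fails."""
    name, score = max(((n, similarity(x, t)) for n, t in TEMPLATES.items()),
                      key=lambda pair: pair[1])
    return name if score > TAU else None

# A slightly noisy plus still matches; larger variation fails, which is
# exactly the robustness limitation the notes point out.
noisy_plus = ((0, 1, 0), (1, 1, 1), (0, 0, 0))
print(recognize(noisy_plus))  # "plus"
```

The single-template fragility is visible directly: any distortion that pushes the best similarity below τ makes recognition fail outright, with no graceful fallback.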
  • Feature Analysis model

    • Core idea: the stored memory representation consists of features (parts) of objects rather than whole objects
    • Features can be abstract or concrete; examples include Garfield’s hands, feet, belly, and eyes, and even a conceptual feature like
      "lasagna" associated with Garfield
    • When a stimulus is presented, recognition is a comparison between the incoming feature breakdown and stored memory features
    • Features can be shared across different stimuli, which creates redundancy but also makes recognition a democratic, vote-like process across feature representations
    • Example described: Garfield is incognito in a given image; the feature breakdown (hands, feet, belly, eyes, lasagna, etc.) activates memory features that resemble Garfield enough to trigger recognition
    • Empirical implication: feature similarity/differences influence reaction times; more shared features between two stimuli lead to longer processing when distinguishing them
  • Empirical evidence on feature-based processing

    • Letters example: pairs of letters (same vs. different) tested; some pairs (e.g., g vs w) share few features and are easy to distinguish, while others (e.g., p vs r) share many features and are harder
    • Reason: some features are shared across stimuli, creating competition among feature representations
    • Reaction time predictions: greater feature similarity => longer RT; fewer shared features => shorter RT when distinguishing
    • Visual search task with “roundness” as a salient feature: locating the one rounded letter (e.g., Q) among 20 letters becomes faster when you search for roundness first; changing the task to emphasize different distinctive traits alters search strategy
    • Children’s reading/letter learning: emphasizing distinctive features (e.g., the presence of a bar in 'r' vs 'p') speeds feature-based discrimination during early learning
    • Caricatures vs. normal faces: caricatures can be identified faster due to exaggerated signature features; suggests feature salience and weighting in recognition
    • Semantic mapping: features are organized in a hierarchy, with a semantic map linking features to concepts
    • Limitations: feature-based models can overfit to features that do not uniquely identify an object; when features are presented with distortions or misaligned relations, misrecognitions can occur
    • Key takeaway: feature processing is rapid and informative, but recognition cannot rely on features alone; the relationship among features must also be considered
  • Limitations and problems with pure feature analysis

    • Ambiguity: same set of features can be configured to yield different interpretations
    • Abstract rules for features can be ad hoc; different feature descriptions can yield different predicted identifications
    • When features are presented in isolation, some stimuli can be misinterpreted (e.g., inverted images, ambiguous arrangements)
    • The upside-down Thatcher illusion illustrates that orientation and relational organization matter beyond mere feature presence
    • Some stimuli may appear identical at the primitive feature level, but differ in the way features relate to one another
    • Faces pose a particular challenge: although feature detection occurs (eyes, nose, mouth), humans show holistic processing and rapid recognition that sometimes transcends enumerating features alone
  • Structural (geon) analysis: Biederman’s component model

    • Core idea: objects are represented by a small set of basic shape primitives called geons and the relationships among them
    • Geons: stable to noise, color changes, and other distortions; building blocks like basic 3D shapes
    • Common geons include shapes such as cylinder, cone, pyramid, cup-like shapes, football (ellipsoid), and horn
    • A basic claim: about 36 geons exist; objects can be represented by combinations and arrangements of these geons
    • The combinatorics: with as few as two geons, there can be up to roughly 72,000 distinct configurations (an approximate upper bound)
    • The key concept: it is not just the presence of geons (features) but the spatial relations among geons that drive recognition
    • Example: “cup” vs “pail” can be distinguished by the particular arrangement of geons, not just the geons themselves
    • Exemplars: a quadruped (e.g., a horse or dog) can be viewed as a stack/arrangement of geons; a biped (human) can be formed by stacking geons differently; a cup vs a pail differ in the arrangement rather than the pure geon set
    • Two main takeaways from geon theory:
    • Geons are robust to low-level changes and noise
    • The relations among geons (how they connect and are positioned) are essential for object identity
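The ~72,000 figure above can be reconstructed as back-of-envelope arithmetic. The 36-geon count is from the notes; the number of qualitative spatial relations per geon pair is an assumed value chosen here so the product lands near the figure the notes report, not a number from the lecture.

```python
# Back-of-envelope for the two-geon combinatorics claim.
N_GEONS = 36       # stated in the notes (Biederman's geon count)
N_RELATIONS = 56   # assumed number of qualitative spatial relations per pair

ordered_pairs = N_GEONS * N_GEONS            # 1,296 ordered geon pairs
two_geon_configs = ordered_pairs * N_RELATIONS
print(two_geon_configs)  # 72576, i.e. roughly the 72,000 the notes mention
```

The point of the arithmetic is the scale, not the exact product: even two parts plus a modest relation vocabulary yield tens of thousands of distinguishable object configurations.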
  • Limitations of geon-based (structural) models

    • At a primitive level, different objects can share the same geons and still be ambiguous without relational information
    • Some objects lose discriminability if there are only a few geons or if the geon relations are not informative enough
    • Not all aspects of perception are captured by relations among simple 3D geons; orientation issues and certain visual illusions challenge pure geon-based accounts
    • The Thatcher/face-inversion findings and face-specific processing suggest specialized processing beyond simple geon combinatorics for some stimuli
  • Peter Mitt’s component model (relation-focused recognition)

    • Design: an experiment where stimuli are flashed for a brief duration (about 100 ms)
    • Two conditions tested separately and contrasted:
    • Features-only presentation (only local features are visible)
    • Relations-only presentation (only the spatial relations among features are visible)
    • Findings summarized:
    • When only relations are present, participants are faster and more accurate than when only features are present
    • When only features are present, accuracy is around chance (roughly 50%)
    • When both features and relations are available, recognition is best
    • The presence of relational information yields significantly higher accuracy (around 70% correct in relation-only trials) compared to feature-only trials (~50%)
    • Conclusion: the relations among features may be at least as important as features themselves for recognition; dependency on relationships supports a shift away from purely template or pure feature models toward relational structure in perception
    • Complementarity with other models: supports the idea that both features and relations matter, and sometimes the lack of relational information makes recognition difficult
  • Faces, holistic processing, and specialized brain systems

    • The fusiform face area (FFA) is highlighted as a brain region particularly involved in face processing
    • Face recognition is not entirely reducible to feature lists; there is evidence for holistic/configural processing
    • Inversion effects: faces are harder to recognize when inverted; Thatcher illusion shows distortions that reveal sensitivity to facial configuration
    • Special-case phenomena: some faces (or caricatures) may be recognized more quickly due to distinctive, salient features
    • It is noted that some research indicates that facial recognition involves a specialized neural system beyond general object recognition, complicating the idea that a single feature/structure model can fully account for faces
  • The neurobiological basis of perception and recognition

    • Occipital lobe as the initial processing stage; it feeds into two main pathways:
    • Dorsal stream: runs to the parietal lobe; associated with locating objects and guiding actions (the “where/how” pathway)
    • Ventral stream: runs to the temporal lobe; associated with identifying objects (the “what” pathway)
    • Fusiform Face Area (FFA): situated within the ventral stream, specialized for faces
    • Premotor cortex and action planning: the premotor area is involved not only when planning actions but also when observing others perform actions; this relates to learning through imitation and may involve mirror neurons
    • Mirror neurons: discovered in nonhuman primates and also present in humans; thought to play a role in learning goals, intentions, and imitation by mapping observed actions onto the observer’s motor system
    • Premotor activation observed even when a monkey anticipates actions performed by others; suggests anticipation/planning mechanisms that simulate actions
    • The big-picture view: perception involves a dynamic interaction of bottom-up sensory input and top-down expectations, beliefs, and prior experience
  • Brain plasticity, training, and experience

    • The brain is highly plastic and capable of reorganization based on experience
    • Training can alter perceptual recognition: examples show fusiform gyrus activity can change with sustained exposure to new stimuli and the brain can learn to recognize new patterns as meaningful (e.g., faces in novel shapes)
    • The general message: regular cognitive exercise (puzzles and similar activities) strengthens neural pathways and can improve cognitive flexibility and recognition abilities over time
  • Movement, perception, and top-down influence

    • Movement and dynamic exploration can enhance recognition by providing multiple viewpoints and more feature-relations to compare against memory
    • Moving a scene or object generates more mental representations and pathways for comparison, increasing recognition accuracy
    • Bottom-up processing (from sensory input) works in concert with top-down processing (expectations and prior knowledge) to produce perception
  • Computer vision vs human perception

    • Computer vision systems rely heavily on iterative learning and pattern recognition; they can outperform humans in some tasks but can also miss obvious cues that humans detect quickly
    • Real-world demonstrations (e.g., the Tesla Road Runner-style video) show that machine vision can fail by not recognizing misleading cues in the environment
    • The lesson: human perception combines rapid feature detection, relational interpretation, behavioral context, and learned priors in a way that current computer systems still struggle to replicate fully
  • Practical takeaways and concluding notes

    • Perception is complex, sometimes counterintuitive, and relies on an integration of features and their relations
    • The brain actively builds and restructures neural networks through use and training; staying mentally active supports long-term cognitive flexibility
    • When trying to interpret ambiguous stimuli, moving around or looking from multiple angles can reveal additional features and relational cues that aid recognition
    • The course emphasizes turning theoretical models into testable predictions and recognizing their limitations in explaining real-world perception
  • Quick reflections and prompts for before next class

    • Prepare for quizzes on chapters two and three; CogLab exercise on attentional blink due soon
    • Expect discussion on attention next class; potential overlap into the following week
    • Perception remains a central, fascinating area of cognition with ongoing research across templates, features, geons, and neural mechanisms