Perception and Sensation: Key Concepts, Theories, and Models (Lecture Notes)

Sensation vs Perception: foundational distinctions

  • Perception is the interpretation and labeling of sensory input; sensation is the raw input that enters the senses. Wilhelm Wundt (and colleagues) distinguished:
    • Sensation: detecting raw input and bringing it into conscious awareness. In short, the sensing of external stimuli.
    • Perception: the labeling, interpretation, and conscious experience that results from processing those sensations. It’s the act of making sense of what is sensed.
  • Quick takeaway: sensation = input; perception = interpretation of that input.

The senses: beyond the classical five

  • Common notion: there are five senses (sight, hearing, touch, taste, smell).
  • The lecture emphasizes that this is an oversimplification, not an exhaustive account of human sensing.
  • More senses exist and vision itself is multi-faceted:
    • Vision comprises detection of color, light, contours, shapes, and other features by different receptors.
    • Taste involves multiple receptors for different soluble molecules, not just one taste.
    • Additional senses: pain, temperature, proprioception (body position), balance, time perception, etc.
  • A diagram (from a textbook-type source) shows numerous senses, sometimes counted at up to around 40 depending on how granularly they are defined.
  • Conceptual implication: perception integrates information across many modalities and features, not just a single, simplistic sense.

Perception vs sensation: core concepts and intuition

  • Perception is the labeling and interpretation process; sensation is the moment information enters the sensory systems.
  • Perception is a process; sensation is input. Perception builds conscious experience by interpreting inputs.
  • Everyday perception is highly efficient and seamless, often occurring without conscious deliberation due to rapid processing and learned expectations.
  • Perceptual experience is typically imperfect; stimuli are rarely perfect representations of their real-world sources, yet our brains generate stable, meaningful interpretations.
  • Examples highlighted in the lecture:
    • A rectangle on a screen is often perceived as a rectangle even if the projected shape is distorted by perspective, angle, or lighting.
    • Different shapes and orientations can project the same image onto the retina, yet we experience a consistent object identity thanks to perceptual inference.
    • The same image can be interpreted as different objects depending on context and prior knowledge (e.g., chairs photographed from different angles still read as chairs).
    • Humans infer occluded features (glasses, pencils) even when parts are hidden, demonstrating perceptual completion.

Bottom-up vs Top-down processing: two parallel pathways

  • Bottom-up (direct) processing:
    • Perception begins with the stimulus input and builds up to recognition from sensory data alone.
    • Also called “direct theory” in the lecture; data-driven, stimulus-driven processing.
  • Top-down (indirect) processing:
    • Perception is influenced by prior knowledge, expectations, beliefs, and context; prior information helps interpret ambiguous inputs.
    • Also called “indirect theory” or “top-down” processing; knowledge-driven interpretation.
  • These pathways operate in parallel and continuously influence each other during perception.
  • Everyday examples:
    • Ambiguous images: context determines whether you see a square, a face, or another shape.
    • Shadow in water example: depending on expectations (e.g., shark vs toy), you may interpret a shadow differently.
  • Key terms:
    • Top-down processing (indirect theory): perception guided by beliefs, expectations, and prior experience.
    • Bottom-up processing (direct theory): perception driven by the sensory input itself.
  • The lecture also ties these ideas to linguistic/auditory perception: speech segmentation relies on both bottom-up cues (sounds) and top-down expectations (language structure and prior experience).

Perception in practice: examples and demonstrations

  • Illusion and perspective examples illustrate how perception uses context to interpret input as coherent objects.
  • Object completion: even when parts of an object are occluded, we perceive a whole object (e.g., glasses behind a laptop, pencil partially visible).
  • Face and object recognition: humans are unusually adept at recognizing faces and objects even when features are degraded or presented from difficult angles.
  • Computer vision vs human perception: humans often outperform computers in recognizing incomplete or ambiguous stimuli because we leverage learned context and prior knowledge.
  • Statistical learning in perception:
    • We learn from exposure to sequences and regularities; we detect familiar vs unfamiliar patterns.
    • Example: sequences of shapes shown in a cognitive-psychology lab task; participants judged which sequence felt more familiar.
    • Humans show robust pattern-detection abilities, on certain tasks exceeding chance-level algorithmic baselines.
  • A cross-domain example: auditory perception and speech segmentation rely on transitional probabilities between sounds within words, enabling language learning through statistics.
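The transitional-probability idea can be sketched in a few lines of Python. The syllable stream below is invented for illustration: the "word" pre-tty always occurs intact, so its internal transition is highly probable, while transitions out of it vary.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate TP(x -> y) = count(x, y) / count(x) for adjacent syllables.

    Within-word transitions should come out more probable than
    transitions that cross a word boundary.
    """
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# Hypothetical stream: "pretty baby ... pretty garden ... pretty baby"
stream = ["pre", "tty", "ba", "by",
          "pre", "tty", "gar", "den",
          "pre", "tty", "ba", "by"]
tps = transitional_probabilities(stream)
# TP(pre -> tty) is 1.0 (always within a word); TP(tty -> ba) is lower.
```
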

Statistical learning, regularities, and probabilistic inference

  • Statistical learning (Hebbian-like associations):
    • Idea: neurons that fire together wire together; co-occurring stimuli become linked in memory.
    • In cognitive tasks, exposure to regular pairings or triplets increases familiarity for those sequences.
    • The brain is highly sensitive to patterns and regularities in the environment.
  • Transitional probabilities and speech segmentation:
    • Perception of word boundaries is aided by the likelihood that one sound will follow another within a word.
    • Example: in “pretty baby”, the within-word transition (“pre” → “tty”) is highly probable, while the cross-word transition (“tty” → “ba”) is not; listeners exploit these dips in transitional probability to locate word boundaries.
  • Bayesian inference and prior knowledge:
    • Prior knowledge influences interpretation; Bayesian ideas explain how prior expectations combine with sensory evidence to shape perception.
    • Bayesian framework basics (conceptual, not derived in the transcript):
      • Prior probability P(H) and likelihood P(D|H) combine to yield the posterior: P(H|D) ∝ P(D|H) P(H).
      • The likelihood principle (after Helmholtz): we perceive the interpretation most likely to have produced the sensory input; priors adjust how probable each hypothesis is given the data.
      • In perception, priors can bias interpretation toward more probable or familiar categories, affecting even simple perceptual decisions.
  • Implications for AI and perception:
    • Humans are highly tuned to detect patterns and infer meaning from incomplete data, whereas AI systems may rely on more rigid feature detection unless explicitly trained for robust inference.
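The shadow-in-the-water example can be turned into a minimal Bayesian sketch. All numbers here are hypothetical: a shark is rare a priori, but the shadow's shape fits "shark" better than "toy".

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a discrete hypothesis set:
    P(H|D) = P(D|H) P(H) / sum over H' of P(D|H') P(H')."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Hypothetical numbers for the shadow-in-the-water example.
priors = {"shark": 0.01, "toy": 0.99}          # sharks are rare here
likelihoods = {"shark": 0.8, "toy": 0.1}       # P(this shadow | hypothesis)
post = posterior(priors, likelihoods)
# The strong prior keeps "toy" the more probable interpretation,
# even though the shadow itself looks more shark-like.
```

This illustrates the point in the notes: priors can dominate ambiguous evidence, biasing perception toward familiar categories.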

Foundations for modeling object recognition

  • The goal: understand how the brain recognizes objects given noisy, variable input.
  • Three key assumptions for object recognition modeling (as described in the lecture):
    1) Memory of many stimuli exists in the brain to compare against.
    2) We form mental representations (internal models) of objects when we see them.
    3) We compare new input to memory/representations to categorize and recognize; if it fits a category, recognition occurs.
  • An example with Garfield illustrates the memory-based matching process: a new image is recognized by comparison with stored exemplars.
  • Introduce four foundational concepts for modeling recognition:
    • Likelihood principle (from Helmholtz): infer what is most likely given the input and prior experience.
    • Gestalt principles of organization: the whole is greater than the sum of its parts; perceptual grouping cues guide recognition (closure, proximity, similarity, good continuation, figure-ground, common fate).
    • Regularities and familiarities: familiar colors, shapes, and contexts bias interpretation toward likely categories (e.g., blue → ocean; green → grass).
    • Bayesian inference: integrates prior knowledge with current input to form updated (posterior) interpretations and reinforce future associations.
  • The big model discussed next: template matching (early approach).

Template matching: a simple but limited model of recognition

  • Core idea: recognition occurs when the current stimulus matches a stored template in memory closely enough.
  • How it works (conceptual):
    • A mental representation/template exists for each object.
    • If the incoming stimulus overlaps the template (a good fit), recognition occurs.
    • If it does not fit, recognition fails or is ambiguous.
  • Advantages:
    • Simple, intuitive, and easy to apply to straightforward, well-defined inputs.
    • Useful as a baseline or starting point for building more complex models.
  • Major problems and criticisms (as discussed in class):
    • Poor scalability: a separate template would be needed for every possible transformation of an object (pose, size, orientation, lighting, etc.).
    • Real-world variability is enormous: different fonts, handwriting, different image distortions, perspective changes, etc., would require an impractically large template bank.
    • Invariance issues: templates are typically tied to exact size/orientation; a mismatch yields failure even when the object is easily recognizable by humans.
    • Multiple interpretations: many objects can produce the same or ambiguous inputs; template matching offers no mechanism to derive alternate interpretations.
    • Illusions and ambiguous images: certain optical illusions cannot be cleanly explained by a one-template-one-category mapping.
    • It tells you only if there is a fit or not; it does not explain how categories are defined or how similarities are measured beyond a threshold.
  • Takeaway: template matching is a useful but insufficient model for human object recognition; it captures baseline intuition but lacks invariance and flexibility observed in human perception.

Extensions and in-class prompts: improving template matching with perceptual processes

  • The instructor invited students to propose how to extend template matching to better reflect human perception.
  • Potential directions (inferred from the discussion and standard cognitive psychology ideas):
    • Incorporate invariance to size, rotation, lighting through transformation-tolerant representations (e.g., using features rather than raw templates).
    • Add top-down priors to bias template selection depending on context and prior experience (bridging template matching with Bayesian inference).
    • Use a hierarchy of templates and components (parts-based recognition) rather than a single full-template approach.
    • Integrate contextual cues and Gestalt principles to influence grouping and segmentation before template matching.
    • Allow probabilistic matching rather than a strict yes/no fit, enabling graded recognition strength and multiple possible interpretations.
  • The discussion ends with a plan to revisit and refine the model in a future session, highlighting that templates can be a starting point but should be expanded to incorporate perceptual processing and sensation-perception dynamics.
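One proposed extension, graded probabilistic matching combined with top-down priors, can be sketched as follows. The similarity scores and context prior are invented; the B/13 ambiguity is a standard textbook example of context-driven interpretation.

```python
def graded_match(scores, priors):
    """Turn raw template-similarity scores into a belief distribution by
    weighting each category with a contextual prior and normalizing.
    A sketch of 'probabilistic matching', not a fitted model."""
    weighted = {name: scores[name] * priors.get(name, 1.0) for name in scores}
    z = sum(weighted.values())
    return {name: w / z for name, w in weighted.items()}

# Hypothetical similarity scores for an ambiguous "B"/"13" stimulus, plus a
# context prior favoring letters (e.g., the symbol appears inside a word).
scores = {"B": 0.55, "13": 0.45}
context_prior = {"B": 0.9, "13": 0.1}
beliefs = graded_match(scores, context_prior)
# Instead of a yes/no fit, we get graded strengths over both interpretations.
```
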

Auditory perception: a quick note

  • Perception is not exclusively visual; auditory perception also relies on top-down processing.
  • Speech segmentation hinges on transitional probabilities and prior language knowledge to determine where one word ends and the next begins.
    • Example concept: as you listen to a continuous stream, you detect word boundaries using probability that one sound will follow another within a word.
  • Language acquisition can be viewed as statistical learning over time; exposure to structured language enables pattern extraction and segmentation.
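A Saffran-style segmentation heuristic based on these ideas might look like the sketch below: insert a word boundary wherever the transitional probability between adjacent syllables dips below a threshold. The probabilities are hard-coded here for illustration.

```python
def segment(syllables, tps, threshold=0.75):
    """Split a continuous syllable stream into words at low-probability
    transitions (a dip in TP suggests a word boundary)."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps.get((prev, nxt), 0.0) < threshold:
            words.append("".join(current))   # close the current word
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Hypothetical transitional probabilities for the "pretty baby" example:
# high within words, low across the word boundary.
tps = {("pre", "tty"): 1.0, ("tty", "ba"): 0.33, ("ba", "by"): 1.0}
words = segment(["pre", "tty", "ba", "by"], tps)   # ["pretty", "baby"]
```
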

Practical implications and takeaways

  • Perception as a constructive, probabilistic process: we continuously combine sensory input with prior knowledge to form coherent interpretations.
  • Our perceptual system is robust to incomplete or distorted input, leveraging context, prior experience, and learned regularities.
  • The gap between human perception and artificial perception: humans excel at inference, generalization, and robust recognition under occlusion, whereas straightforward template-based systems struggle without additional architectural sophistication.
  • When studying cognition, it is important to consider:
    • The interplay between sensation (input) and perception (interpretation).
    • The parallel processing of bottom-up and top-down information.
    • How memory, learning, and prior knowledge shape current perception (Bayesian thinking, likelihood principles, and probabilistic inference).

Key terms and concepts (glossary)

  • Sensation: the detection of raw sensory input by the sensory organs (sensation = input).
  • Perception: the interpretation and labeling of sensory input to create conscious experience (perception = interpretation).
  • Bottom-up processing: perception driven by sensory input from the environment (stimulus → perception).
  • Top-down processing: perception guided by prior knowledge, expectations, and context (knowledge → perception).
  • Indirect theory / Top-down processing: perception shaped by beliefs and expectations.
  • Direct theory / Bottom-up processing: perception driven primarily by sensory data.
  • Gestalt principles: organizing perceptual input into coherent wholes (e.g., closure, proximity, similarity, good continuation, figure-ground, common fate).
  • Likelihood principle: perception is guided by which interpretation is most likely given the input; L(H|D) ∝ P(D|H).
  • Bayesian inference: combining prior probabilities with current data to form posterior beliefs. P(H|D) = \frac{P(D|H) P(H)}{P(D)}
  • Transitional probabilities: probabilities of one sound following another within a word; used in speech segmentation.
  • Statistical learning: learning of regularities and patterns in the environment through exposure; “neurons that fire together wire together.” Hebbian rule: Δw_ij = η x_i y_j.
  • Template matching: a simplistic recognition model where input is matched to stored templates; good fit yields recognition; limited by invariance and scalability issues.
  • Donders’ subtraction method: a cognitive psychology technique for inferring processing time by comparing reaction times across tasks: RT_complex − RT_simple = T_processing.
  • Hebbian learning: a biological learning rule often summarized as “neurons that fire together wire together”; formally, Δw_ij = η x_i y_j.
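The Hebbian rule Δw_ij = η x_i y_j from the glossary can be written out as a minimal update step (a plain-Python sketch with no decay or normalization; the unit activities are illustrative):

```python
def hebbian_update(w, x, y, eta=0.1):
    """One Hebbian step: the weight w[i][j] between input unit i and output
    unit j grows by eta * x[i] * y[j] when the two units are co-active
    ("fire together, wire together")."""
    return [[w[i][j] + eta * x[i] * y[j] for j in range(len(y))]
            for i in range(len(x))]

# Two input units and two output units, all weights starting at zero.
w = [[0.0, 0.0], [0.0, 0.0]]
x, y = [1.0, 0.0], [0.0, 1.0]    # only x_0 and y_1 are active together
w = hebbian_update(w, x, y)      # only the w[0][1] connection strengthens
```
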

Quick prompts for review

  • Differentiate sensation and perception with examples from daily experience.
  • Explain bottom-up vs top-down processing with a perceptual illusion example.
  • Describe why five senses are an oversimplification and give two extra senses or perceptual dimensions discussed.
  • List at least three Gestalt principles and how they aid perceptual organization.
  • State the main idea of Bayesian inference in perception and provide the basic formula.
  • Summarize template matching, including its advantages and its key limitations.
  • Explain how statistical learning contributes to perception in both vision and audition.
  • Give an example where perception completes an occluded object and explain why this happens.
  • Propose one way to augment template matching to better reflect human recognition, based on the discussion in the lecture.