9/9 Notes: Perception and Cognition in Vision: Key Concepts, Illusions, and Contextual Influences

The Course's Three Pillars and the Transition to Cognition

Three pillars for explaining psychological phenomena: contributions of the body (physiology), the mind (cognition), and the environment.
This first section (neuroscience) has focused on the body’s contributions, especially perception.
Today’s plan:
- Small goal: finish up vision by exploring the role of cognition in vision.
- Big goal: transition from physiology to cognition, and set up tools for understanding cognitive processes (thinking, learning, memory, bias, intelligence).
Big transition: moving from the neuroscience pillar to the cognition pillar; next week begins the new units on cognition in general.
Preview of cognition topics: what it means to think, how thinking influences behavior, how information is learned and stored in memory, how we retrieve and use knowledge, how we decide and solve problems, what biases we have, how we think differently across people, and how to define and test intelligence.
Framing question: what does perception look like when cognition and expectations influence what we see? How do mind and knowledge shape sensory experience?

Sensation vs Perception: Two Sides of Sensing a World

Sensation = physiological processes of recording stimuli and retinal images (the bottom-up input).
Perception = mental/cognitive processes that interpret and make sense of those sensory inputs (top-down interpretation).
When you see, sensation provides the data; perception constructs the meaning.
The textbook’s unit title “Sensation and Perception” signals that both physiology and cognition are essential.
Today’s focus: perception, and how thinking shapes interpretation of sensory data.

Recap: The Physiological Basis of Vision (briefly reminded)

Visual system codes for specific features: neurons fire for particular properties (e.g., color, edges, motion).
To see red, neurons tuned to red fire; to see green, different neurons fire in the visual cortex.
Perception builds up from these feature detectors to recognize objects, faces, motions, etc.
The talk uses examples like color (red vs. green), lines, edges, squares, circles, curves, motion, and faces.
The roommate-face example shows that recognition depends not just on the retinal image but also on expectation and context.

How Cognition Shapes Perception: Four Key Mechanisms (Flavor, Not Exhaustive)

Grouping (Gestalt perspective): how the mind groups features into meaningful wholes.
Depth perception: how we infer three-dimensional structure from a two-dimensional retinal image.
Perceptual set: once you have one interpretation, you tend to stick with it.
Context: background, background knowledge, and expectations influence perception.
Note: Gestalt psychology studied how the mind organizes sensory input into meaningful wholes, using “rules” to form perception.

Gestalt Grouping Rules: How the Mind Lets Features 'Belong' Together

Grouping principle: proximity
- Features that are close to one another tend to be grouped as a unit (e.g., eight vertical lines spaced closely in pairs are seen as four bars rather than eight separate lines).
Grouping principle: similarity
- Elements that look alike (same shape or color) are grouped together (e.g., left and right eyes; same color/shape cues pull them into a unit).
Grouping principle: continuity
- The mind prefers smooth, continuous lines; it fills in gaps to see continuous shapes (e.g., interpreting a connected curve as one line rather than many segments).
Grouping principle: connectedness
- Items that are physically linked (e.g., joined by a line) are perceived as part of the same group.
Practical takeaway: these unconscious rules help the brain turn raw retinal data into coherent objects (e.g., faces, objects, scenes).
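The proximity rule can be sketched as a simple gap threshold. This is only an illustration, not a model of how the visual system actually computes grouping; the function name `group_by_proximity`, the line positions, and the `max_gap` value are all hypothetical.

```python
def group_by_proximity(positions, max_gap):
    """Group 1-D positions: neighbors within max_gap of each other share a group."""
    ordered = sorted(positions)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for x in ordered[1:]:
        if x - groups[-1][-1] <= max_gap:
            groups[-1].append(x)   # close to the previous line: same unit
        else:
            groups.append([x])     # large gap: start a new unit
    return groups

# Eight vertical lines spaced closely in pairs (positions are hypothetical).
lines = [0, 1, 5, 6, 10, 11, 15, 16]
bars = group_by_proximity(lines, max_gap=2)
print(len(bars), bars)  # 4 [[0, 1], [5, 6], [10, 11], [15, 16]]
```

With these made-up positions the eight lines fall into four pairs, matching the "four bars" reading of the proximity demonstration.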

The Ambiguity Trick: The Necker Cube and Subjective Contours

Necker cube: an ambiguous figure that can be seen in more than one way.
Subjective contours: edges that appear to exist where none are drawn, created by the mind to complete a shape using the rule of continuity.
Demonstration: the same drawing can be read as different spatial layouts (cube behind a wall vs. cube in front); when context changes the interpretation, the subjective contours disappear.
Key idea: perception is not a camera readout; the mind fills in gaps and uses context to determine edges and shapes.

Depth Perception: How We Infer 3D Structure From a 2D Image

The paradox: we perceive a three-dimensional world, but the retinal image is two-dimensional.
Mental rules (learned from experience) help us infer depth from a flat image.
Depth cues come in two broad classes:
- Binocular cues (involving both eyes)
- Monocular cues (usable with one eye)
These mental rules also mean that depth information influences perceived size and shape.

Binocular Depth Cues: Disparity Between the Eyes

Each eye views the world from a slightly different angle; the brain compares the two retinal images.
Differences between the right and left eye images inform depth estimates.
This comparison is central to stereoscopic depth perception and is fundamental to 3D movie technology (two cameras, two images, glasses to deliver separate views).
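A rough geometric sketch of why the two eyes' views differ: the angle between the two lines of sight to a point (the vergence angle) depends on its distance, and the difference in that angle between two points is one way to express their disparity. The 0.065 m eye separation is an assumed typical value, and the function names are hypothetical.

```python
import math

def vergence_angle_deg(distance_m, eye_separation_m=0.065):
    """Angle between the two eyes' lines of sight to a point straight ahead."""
    return math.degrees(2 * math.atan(eye_separation_m / (2 * distance_m)))

def relative_disparity_deg(near_m, far_m, eye_separation_m=0.065):
    """Difference in vergence angle between a nearer and a farther point."""
    return (vergence_angle_deg(near_m, eye_separation_m)
            - vergence_angle_deg(far_m, eye_separation_m))

# The same 0.5 m depth step, placed at increasing distances from the viewer.
for near, far in [(0.5, 1.0), (5.0, 5.5), (50.0, 50.5)]:
    d = relative_disparity_deg(near, far)
    print(f"{near:>5} m vs {far:>5} m: disparity = {d:.5f} deg")
```

Under these assumptions the disparity for a fixed depth step shrinks quickly with distance, which is consistent with binocular cues being most informative for nearby objects.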

Monocular Cues to Depth: Depth Without Two Eyes

Monocular cues can reveal depth from a single viewpoint and include both static and moving cues.
Four key monocular cues discussed:
- Size (size comparison): among objects assumed to be the same physical size, the one with the smaller retinal image is perceived as farther away.
- Interposition (occlusion): if one object covers another, the covered object is farther away.
- Height in the plane (elevation): objects lower in the visual field appear closer; higher objects appear farther away.
- Linear perspective: parallel lines appear to converge at a distance; the convergence point cues depth.
Practical takeaway: these cues work together to create a sense of depth from flat images (e.g., the difference in perceived distance in a painting or a photograph).
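Linear perspective and height in the plane both fall out of simple projection geometry. The sketch below assumes an idealized pinhole camera standing in for the eye; the eye height, rail spacing, and depths are hypothetical values chosen only for illustration.

```python
def project(x, y, z, focal_length=1.0):
    """Pinhole projection of a 3-D point (x, y, z) onto the image plane."""
    return (focal_length * x / z, focal_length * y / z)

EYE_HEIGHT = 1.6         # assumed viewer eye height above the ground
RAIL_HALF_WIDTH = 0.75   # assumed half-distance between two parallel rails

rows = []
for depth in [2, 4, 8, 16, 32]:
    # Points on the ground lie EYE_HEIGHT below the eye, hence y = -EYE_HEIGHT.
    left_x, ground_y = project(-RAIL_HALF_WIDTH, -EYE_HEIGHT, depth)
    right_x, _ = project(RAIL_HALF_WIDTH, -EYE_HEIGHT, depth)
    rows.append((depth, right_x - left_x, ground_y))
    print(f"depth {depth:>2}: rail separation = {right_x - left_x:.4f}, "
          f"image height = {ground_y:.4f}")
```

As depth grows, the image separation of the rails shrinks toward zero (converging lines) and the ground points rise toward the horizon at image height 0 (height in the plane).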

The Farther-Equals-Larger Rule: How Depth Cues Distort Size Perception

When retinal image size stays the same, depth cues can lead to misperceived size.
Core idea: when two objects cast the same retinal image size, the one that depth cues place farther away is perceived as larger than it actually is.
Demonstrations discussed:
- Ponzo illusion: two equal-sized rectangles appear different in size because of converging lines implying depth.
- Shepard illusion: two identical monsters appear different in size due to depth cues (interposition, perspective, height cues).
Formal intuition from geometry (for z-depth and retinal size):
- Let $R$ be retinal image size, $S$ physical size, and $d$ distance for two objects A and B: $R_A \approx \frac{S_A}{d_A}, \quad R_B \approx \frac{S_B}{d_B}.$ If $R_A = R_B,$ then $\frac{S_A}{d_A} = \frac{S_B}{d_B} \Rightarrow S_A = S_B \frac{d_A}{d_B}.$
In perception, the brain uses heuristics like “farther = larger” to infer real-world size from 2D cues, which can lead to systematic errors in specially constructed images (as in Ponzo and Shepard).
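The size-distance relation above can be turned into a small numeric sketch of the Ponzo effect. It assumes perceived size is simply retinal size times assumed distance (the approximation R ≈ S/d rearranged); the function names and all numbers are hypothetical.

```python
def retinal_size(physical_size, distance):
    """Approximate retinal image size: R ~ S / d."""
    return physical_size / distance

def inferred_size(retinal, assumed_distance):
    """Size the viewer would infer by inverting R ~ S / d: S ~ R * d."""
    return retinal * assumed_distance

# Two bars drawn at the same size on the page give the same retinal size.
bar_retinal = 0.02

# Converging lines make the upper bar seem farther away (assumed distances).
lower_assumed_distance = 10.0
upper_assumed_distance = 15.0

lower = inferred_size(bar_retinal, lower_assumed_distance)
upper = inferred_size(bar_retinal, upper_assumed_distance)
print(f"lower bar inferred size: {lower:.3f}")
print(f"upper bar inferred size: {upper:.3f}")
print(f"upper looks {upper / lower:.2f}x as large")
```

The retinal sizes are identical, so the entire difference in inferred size comes from the assumed distances; the ratio of inferred sizes equals the ratio of assumed distances.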

The Müller-Lyer Illusion: Perceived Size From Depth Cues (Edge Perspective)

Müller-Lyer illusion: two vertical lines of equal length appear different in length because fins at their ends act as perspective cues.
Fins on edges create depth cues: one edge seems to come toward you (like the outside corner of a building), the other seems to recede (like the inside corner of a room).
As a result, the edge that seems farther away looks longer even though the two lines are physically the same length, because depth cues modulate perceived size.
Core lesson: identical retinal images can produce different perceived sizes due to depth-interpretation rules.

Size-Depth Interplay: The Arrow Example (Size and Distance Interaction)

Thought experiment: an arrow that is farther away may appear larger if it subtends the same retinal angle as a closer, smaller arrow.
This demonstrates that perceived size depends on both the actual size and depth information; if you only know the retinal angle, you cannot determine real size without depth knowledge.
Reinforces the idea that depth cues and size cues are interdependent in perception.
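The ambiguity in the arrow example can be made concrete with the exact visual-angle formula, θ = 2·atan(S / (2d)). The sketch below lists several size/distance pairs that all produce the same visual angle; the specific sizes and distances are hypothetical.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

def size_for_angle(angle_deg, distance):
    """Physical size needed to subtend angle_deg at the given distance."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)

# A 1-unit arrow at 10 units of distance fixes the retinal angle.
angle = visual_angle_deg(size=1.0, distance=10.0)

# Every (size, distance) pair below produces that same angle.
for distance in [5.0, 10.0, 20.0, 40.0]:
    size = size_for_angle(angle, distance)
    print(f"distance {distance:>4}: size {size:.2f} -> "
          f"angle {visual_angle_deg(size, distance):.3f} deg")
```

Because size scales in proportion to distance for a fixed angle, the retinal angle alone cannot distinguish a small near arrow from a large far one; a depth estimate is needed to pick one.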

Context and Culture: How Background Shapes What We See

Perception is influenced by prior knowledge, memory, and cultural context.
Elephant example: when forced to count legs quickly, people tend to see either four or five legs depending on prior knowledge.
Culture study: Western participants (built environments, windows) tended to interpret ambiguous pictures as a family inside a room with a window; East African participants (outdoor, baskets on heads) tended to see a family under a tree with a basket on a head.
Conclusion: perception is shaped by experience and culture; the same image can yield different valid interpretations based on context and prior experience.

Perceptual Set: Tuning Perception by Prior Interpretation

Perceptual set: after seeing one interpretation first, people are biased to see the same interpretation in subsequent ambiguous figures.
Experimental setup: participants were first shown one of two versions of a figure (a saxophonist or a woman's face); that initial exposure biased how they then perceived the ambiguous middle image.
Relationship to mental set: perceptual set is a specific instance of the broader concept of mental set, where prior expectations influence future thinking and interpretation.

Ambiguity and Context in Perception: Integrating Multiple Cues

Perception is not a single, fixed readout; it integrates multiple cues (size, depth, contrast, context, prior expectations).
Context can be cultural, environmental, or situational; all contribute to how we interpret sensory data.

Summary: Why These Concepts Matter for Cognition and Behavior

Perception arises from an interaction of sensory input (sensation) and cognitive interpretation (perception).
Our minds actively organize and interpret data using grouping rules, depth cues, and contextual knowledge.
Illusions (Ponzo, Shepard, Necker cube, subjective contours, Müller-Lyer) reveal that perception can diverge from the physical stimulus when our cognitive rules fill in or bias interpretation.
The two-way street between physiology and cognition means understanding perception requires both how the brain processes data and how experience, expectations, and culture frame interpretation.
These ideas lay the groundwork for exploring broader cognitive processes (learning, memory, decision making, biases) in upcoming sections.

Looking Ahead: From Perception to Broad Cognition

The lecture sets the stage for examining: What does it mean to think? How do learning and memory work? How do expectations shape interpretation and behavior?
Next steps will cover how cognition influences behavior, decision making, problem solving, and how biases arise during thinking.
The course will also address how intelligent behavior is defined and tested, and how context and culture influence cognitive processes.

Key Terms and Concepts (Glossary Snippet)

Sensation: physiological encoding of sensory input.
Perception: cognitive interpretation of sensory input.
Proximity: grouping rule (objects close to each other are perceived as a unit).
Similarity: grouping rule (similar features are grouped).
Continuity: grouping rule (perceive continuous lines over abrupt changes).
Connectedness: grouping rule (elements connected are seen as a single unit).
Perceptual set: tendency to perceive in a particular way due to prior interpretation.
Mental set: broader term for cognitive readiness to perceive/think in a certain way.
Context: background information influencing perception.
Binocular cues: depth cues requiring both eyes.
Monocular cues: depth cues usable with one eye.
Retinal image size: the size of an image projected on the retina, related to actual size $S$ and distance $d$ by roughly $R \approx \frac{S}{d}$.
Ponzo illusion: depth cues alter perceived size of equal objects.
Shepard illusion: depth cues alter the perceived size of identical figures.
Necker cube: classic ambiguous figure demonstrating perceptual re-interpretation.
Subjective contours: perceived edges not present in the image, created by perceptual inference.
Height in the plane: depth cue based on vertical position in the image.
Linear perspective: depth cue based on converging lines.
Interposition (occlusion): nearer objects block farther objects.
Farther-equals-larger rule: when retinal size is constant, objects that appear farther away are perceived as larger.