Lecture Notes on Visual Perception, Visual Search, and Perceptual Biases (Vocabulary)
Cone types and color vision
- Humans have three cone types critical for normal color vision
- S-cones peak at 420\,\text{nm} (blue)
- M-cones peak at 534\,\text{nm} (green)
- L-cones peak at 564\,\text{nm} (red)
- The three-cone system explains color perception across the visible spectrum; missing a cone type leads to various forms of color blindness (as discussed last week)
- Raw spectral sensitivity vs. cone-driven sensitivity:
- The raw sensitivity function peaks at 498\,\text{nm}
- In practice, any given wavelength excites all three cone types to different extents; for example, light at around 450\,\text{nm} evokes a strong S-cone (blue) response, some M-cone (green) response, and very little L-cone (red) response
- Key takeaway for exams (bare bones):
- There are three cone types; normal color vision requires all three
- A given wavelength excites the cones to varying degrees; you don’t need to perform detailed calculations of each cone’s contribution for course purposes
- Practical implication: color blindness arises when one cone type is missing or nonfunctional
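The idea that a single wavelength excites all three cone types to different degrees can be sketched with a toy model. This assumes simple Gaussian tuning curves centered on the peak wavelengths above, with an illustrative 60 nm bandwidth; real cone fundamentals are asymmetric and broader, so this is only a conceptual sketch:

```python
import math

# Toy model: Gaussian tuning curves centered on the cone peaks from the
# notes (420, 534, 564 nm). The 60 nm bandwidth is an illustrative
# assumption, not a measured value.
CONE_PEAKS = {"S": 420.0, "M": 534.0, "L": 564.0}
BANDWIDTH = 60.0  # nm, assumed

def cone_excitation(wavelength_nm):
    """Relative response of each cone type to a monochromatic light."""
    return {
        cone: math.exp(-((wavelength_nm - peak) ** 2) / (2 * BANDWIDTH ** 2))
        for cone, peak in CONE_PEAKS.items()
    }

# 450 nm light: strong S-cone response, weaker M, very little L,
# matching the example in the notes.
resp = cone_excitation(450.0)
print({cone: round(value, 2) for cone, value in resp.items()})
```

The exact numbers are not exam material; the point is only that every wavelength yields a graded pattern across all three cone types.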
Visual field, eccentricity, and field extent
- Visual field is the spatial extent you can see, described in degrees of visual angle
- Eccentricity = how far a location is from the fovea (the point you’re directly looking at)
- Example: hold a thumb at arm's length and flick it out to the left; about 45{-}50^\circ is a typical eccentricity reference
- Horizontal field extent varies across people: roughly 180{-}190^\circ (subject to individual differences and glasses/contacts)
- Vertical field extent is typically 130{-}140^\circ
- Pupillary distance (interpupillary distance) varies widely and influences horizontal field extent:
- Example PDs: 65\,\text{mm} (mid-range) vs 52\,\text{mm} (smaller)
- Closer eyes (smaller PD) can reduce horizontal field; wider separation (larger PD) can increase it
- Visual field measurements are expressed in degrees of visual angle, which jointly capture object size and viewing distance, rather than in literal centimeters at a given distance
- Horizontal vs vertical field differences and individual anatomy mean there is no single perfect number for everyone
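Degrees of visual angle tie physical size and viewing distance together through the standard formula \theta = 2\arctan\left(\frac{\text{size}}{2\times\text{distance}}\right). A quick check with a roughly 2 cm thumb at a roughly 60 cm arm's length (illustrative numbers, not from the notes):

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle subtended by an object of a given size at a given
    viewing distance (both in the same units, e.g., cm)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

# A ~2 cm wide thumb at ~60 cm (arm's length) subtends roughly 2 degrees.
theta = visual_angle_deg(2.0, 60.0)
print(round(theta, 2))
```

Because the formula depends only on the size-to-distance ratio, a larger object farther away can subtend the same visual angle, which is why the field is described in degrees rather than centimeters.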
Photoreceptor distribution and density across the retina
- The retina contains a central region (fovea) with high cone density; density never drops to zero across the retina
- The fovea is the region of the retina with the highest cone density; there are no rods in the fovea
- Rod distribution:
- Rod density increases from the fovea and peaks around 20^\circ to 30^\circ away from the fovea
- This supports better scotopic (low-light) vision and peripheral sensitivity
- Peripheral photoreceptors and edge density:
- There is a slight increase in photoreceptors at the far edge of the retina in some cases
- Important concepts linked to limitations:
- Photoreceptor density is not uniform; this nonuniformity underlies why performance varies across the visual field and drives certain perceptual biases
- The optic disc (blind spot): where axons exit the retina; the brain fills in this gap so you don’t notice a blank spot
- Practical takeaways for vision tasks: density differences explain why acuity is highest in the center and lower peripherally, and why low-light or motion tasks rely more on rods
Peripheral color, the case against strict template theory, and feature-based processing
- Peripheral vision does carry color information; color processing is not limited to the fovea
- Template theory vs. feature-based representation:
- Template theory posits that the brain stores a large set of templates for every possible object/version, which is energetically costly and inefficient
- Feature-based (or deconstructive) representation: early visual areas code features (e.g., orientation, color, terminations) rather than whole objects; objects are constructed from features in higher-level cortex
- Why the brain favors feature-based representations:
- It is computationally efficient and scalable; supports robust object recognition under occlusion and variation
- Adaptation and after-effects:
- Adaptation fatigues neurons representing a stimulus; the brain’s representation shifts, producing aftereffects
- In this course, adaptation effects yield a predictable negative aftereffect (a percept shifted opposite to the adapting stimulus)
- Implications for artificial vision:
- Computer vision systems trained on biased data can inherit human-like biases; this can lead to misidentifications, especially across different racial groups
- The “two streams” idea and integration with perception:
- Visual processing involves dorsal (where/how) and ventral (what) streams; they interact rather than being completely separate
- Relevance to questions and exams:
- Expect discussion of why early visual processing uses features and how adaptation reveals feature-level encoding in early visual areas
Object recognition, agnosias, and the biology of perception
- Agnosias: failures to identify objects, with partial or complete forms (e.g., prosopagnosia for faces)
- Genetic vs. acquired causes:
- Agnosias can be caused by brain damage or by genetic variation/malformation; they exist on a spectrum of severity
- Object reconstruction in the brain:
- The brain combines early visual features into more complex representations in higher-level cortex
- The basic premise for exams: object reconstruction is generally correct in everyday function, but there are known failures
- Caveat and future topic:
- The full, nuanced account is provided by feature integration theory (anticipate coverage in next week’s recorded lecture)
- If curious about failures in feature conjunctions or integration, reach out for more readings
- Real-world relevance:
- Case discussions in class illustrate how perception can differ across individuals and contexts
- The potential for bias and misinterpretation in both humans and AI systems is a practical concern in real-world applications
Visual search and attention: pop-out vs conjunction search
- Visual search is a classic method to study attention and cognition
- Pop-out (feature) search:
- Target defined by a single diagnostic feature (e.g., color)
- Independent of set size: reaction time does not scale with the number of distractors
- Example: find the red item among a set of non-red distractors; set size does not affect search time
- Interpretation: parallel search; the brain can extract the feature across the entire scene without serial inspection
- Conjunction search (multiple features):
- Target defined by a conjunction of features (e.g., color and shape)
- Reaction time increases with set size; serial processing is required to combine features
- Set size effects and what they imply about representation:
- A flat reaction-time slope with set size in pop-out suggests pre-attentive, feature-level processing across the visual field
- Positive slopes in conjunction search imply serial, attention-demanding processing and integration of multiple features
- Practical relevance for driving and real-world tasks:
- Real-world scenes require combining multiple cues (color, orientation, motion) to locate objects of interest
- The efficiency of feature search supports the idea that some information can be processed rapidly without eye movements
- A brief digression on task demands and information economy:
- The brain often represents only the information necessary to complete a task (minimize information and energy expenditure)
- Task demands modulate how long you need to look and what information is essential
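The set-size effects above can be illustrated with a toy reaction-time model: feature (pop-out) search checks all items in parallel, so reaction time stays flat, while conjunction search inspects items serially, so reaction time grows with set size. The base time and per-item cost below are illustrative assumptions, not data from the course:

```python
# Toy RT model for visual search (all times in ms; values illustrative).
BASE_RT = 400.0       # sensory/motor overhead, assumed
SERIAL_COST = 50.0    # per-item inspection cost in conjunction search, assumed

def feature_search_rt(set_size):
    """Pop-out search: the feature is extracted in parallel across the
    display, so RT does not depend on the number of distractors."""
    return BASE_RT

def conjunction_search_rt(set_size):
    """Conjunction search: items are inspected serially; on average about
    half the items are checked before the target is found."""
    return BASE_RT + SERIAL_COST * set_size / 2

for n in (4, 8, 16):
    print(n, feature_search_rt(n), conjunction_search_rt(n))
```

Plotting RT against set size from such a model reproduces the classic signature: a flat slope for pop-out and a positive slope for conjunction search.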
Perceptual biases, heuristics, and the psychology of everyday vision
- Perceptual biases and heuristics are shortcuts the brain uses to cope with a complex world
- They are neither inherently good nor bad; they are energy-saving strategies that can be beneficial or lead to errors
- Satisficing and inference:
- The brain tends to use the most likely interpretation given prior experience rather than reconstructing the world from first principles
- Unconscious inference (Helmholtz) and the likelihood principle explain why you perceive certain interpretations as more probable
- Common perceptual rules and when they fail:
- Proximity, similarity, closure, good continuation, and common fate guide grouping and perception of coherent objects
- Proximity: closer elements tend to be grouped as a unit
- Similarity: elements with shared features (e.g., luminance) group together
- Closure: we perceive complete shapes even when contours are incomplete
- Good continuation: we expect contours to continue smoothly behind occluders
- Common fate: objects moving together are perceived as a group
- Pragnanz (simplicity): the brain prefers the simplest interpretation of a complex scene
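Proximity grouping, the first rule above, can be mimicked by a simple distance-threshold clustering over element positions. This is a conceptual sketch (1-D positions and a hand-picked threshold are my assumptions), not a model of actual perceptual grouping:

```python
def group_by_proximity(positions, threshold):
    """Group 1-D positions into clusters: a new group starts whenever the
    gap to the previous element exceeds the threshold."""
    groups = []
    for x in sorted(positions):
        if groups and x - groups[-1][-1] <= threshold:
            groups[-1].append(x)  # close enough: join the current group
        else:
            groups.append([x])    # large gap: start a new perceived group
    return groups

# Elements at 0, 1, 2 and at 10, 11 split into two perceived units.
groups = group_by_proximity([0, 1, 2, 10, 11], threshold=3)
print(groups)
```

The output splits the five elements into two clusters, echoing how closer elements are perceived as belonging together.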
- Illusions that reveal bias in cue integration:
- Escher-like images (reversing cube, stairs) show how cues can mislead when structures are manipulated
- Light-from-above assumption: we assume light sources come from above; this shapes interpretation of shadows and depth
- Oblique effect: people are more sensitive to horizontal and vertical orientations; perception is less precise for oblique angles
- Faces and upright bias:
- We are particularly good at recognizing upright faces; upside-down faces disrupt usual recognition (e.g., Thatcher illusion)
- Thatcher illusion shows how flipping internal facial features disrupts perception when the face is upright, but is less noticeable when inverted
- Semantic vs syntactic violations (scene grammar):
- Semantic violations: objects in contexts where they are not plausibly placed (e.g., toilet paper in a dishwasher) – still physically possible
- Syntactic violations: objects in physically implausible positions (e.g., toilet paper floating in midair) – violates physical constraints
- These violations reveal learned scene grammar and expectations about where objects belong
- Scene grammar and environment expectations (Melissa Võ, scene semantics):
- People have lifetime experience with natural scenes; some objects are semantically constrained to certain environments
- Violations reveal the brain’s learned priors about scene structure and object placement
Scene grammar, environment plausibility, and real-world implications
- Semantic violations describe objects in inappropriate environments (e.g., a toilet paper roll in a dishwasher) while keeping physical possibility
- Syntactic violations describe physically implausible placements (e.g., rolled toilet paper floating in midair)
- The point: scene grammar reflects learned expertise about typical environments and how objects should appear
- Practical takeaway:
- Our perception relies on long-term experience to infer plausible scene structure; violations help reveal the underlying priors
- Everyday examples discussed:
- A water bottle on a professor's head is physically possible but contextually unlikely; because its placement remains physically plausible, it counts as a semantic violation rather than a syntactic impossibility
- The role of context in perception:
- Knowledge of scenes and semantics shapes interpretation even when features are ambiguous
- Cats turn up in unusual places more readily than humans do, making them handy examples of scene-grammar violations; the brain uses prior knowledge to judge plausibility
Real-world and logistical notes pertinent to coursework
- Visual Search Lab and relevance to research reports:
- The visual search tasks (e.g., cat among owls) relate to attention and to what you will write about in Research Report 1
- Course structure and assessment guidance:
- WPQs (weekly practice questions) are open-book; they are meant to guide study, not to preview exam questions
- Exam questions will focus on lecture content and slides; textbook content unrelated to lectures may not be tested
- Key terms list released after week 6 provides guidance on important material
- Study strategy guidance (for students):
- Use lecture content and slides as primary sources for exams
- If textbook material appears, treat it as supplementary context unless explicitly tied to exam content
- Discovery Labs and research opportunities:
- Discovery Labs are due by the 25th (approx. one week from the date of the talk); two labs are required for credit
- Labs produce group data used for class-wide analyses; you can revisit labs after the due date for review
- Research reports:
- Two lab-style reports; the first due on October 1
- Structure: describe the stimulus/task clearly, report findings, discuss patterns and how group data compare with personal data
- No stats are required; focus on understanding and interpretation
- An opportunity to revise and resubmit the first report in mid-to-late October for a fresh mark
- Getting involved in research:
- Begin with lab websites and lab posters to identify interest areas
- When emailing a PI, show you’ve looked at their work and propose a thoughtful discussion rather than a generic request
- Use the Research Opportunity Program (ROP) application window (opens in February) to join labs for summer and subsequent terms
- The department uses SONA for participant recruitment; consider joining studies as a participant to gain firsthand research experience
- Practical tips for communications with faculty:
- Avoid casual chatty language; provide specifics about interests aligned with the lab’s work
- Do not propose a project outright; express interest and readiness to engage in ongoing work
- Additional notes on course logistics (brief):
- Slides are provided in a single format (pre-lecture and post-lecture updates with WPQ answers)
- The Discovery Labs guide contains red-highlighted critical instructions and troubleshooting tips; read it carefully
- If you miss a break or arrive late, ask during the break to pick up a handout or resource
- The instructor emphasizes a balance between curiosity, rigor, and the practicalities of research engagement
Quick recap of the week’s big ideas
- Perceptual biases and heuristics: shortcuts the brain uses to manage the abundance of information
- Unconscious inference and the likelihood principle: perceptual priors built from lifetime experience influence what we see
- Gestalt grouping rules: proximity, similarity, closure, good continuation, common fate, and the Pragnanz principle guide how we perceive scenes as coherent wholes
- Object perception is a two-stage process: early feature representations feed higher-level object representations
- Visual search reveals how feature-based and conjunction-based processing work, and how task demands shape information needs
- Scene grammar and semantic/syntactic violations reveal learned expectations about where objects belong and how scenes should be structured
- The relationship between perception and action: speed-accuracy tradeoffs and the information needed to perform tasks (e.g., driving scenarios)
- The importance of recognizing biases in both human and machine vision systems and the social implications (e.g., bias in AI and real-world applications)
- Practical coursework and research pathways to deepen understanding through labs, reports, and active engagement with faculty and labs