Lecture Notes on Visual Perception, Visual Search, and Perceptual Biases (Vocabulary)
Cone types and color vision
- Humans have three cone types critical for normal color vision
- S-cones peak at 420\,\text{nm} (blue)
- M-cones peak at 534\,\text{nm} (green)
- L-cones peak at 564\,\text{nm} (red)
- The three-cone system explains color perception across the visible spectrum; missing a cone type leads to various forms of color blindness (as discussed last week)
- Raw spectral sensitivity vs. cone-driven sensitivity:
- The raw sensitivity function peaks at 498\,\text{nm}
- In practice, any given wavelength excites all three cone types to different extents; for example, light at around 450\,\text{nm} evokes a strong S-cone (blue) response, some M-cone (green) response, and very little L-cone (red) response
- Key takeaway for exams (bare bones):
- There are three cone types; normal color vision requires all three
- A given wavelength excites the cones to varying degrees; you don’t need to perform detailed calculations of each cone’s contribution for course purposes
- Practical implication: color blindness arises when one cone type is missing or nonfunctional
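The idea that a single wavelength excites all three cone types to different degrees can be sketched with a toy model. This assumes simple Gaussian tuning curves centered on the peak wavelengths above, with an illustrative 60 nm bandwidth; real cone fundamentals are asymmetric and broader, so this is only a conceptual sketch:

```python
import math

# Toy model: Gaussian tuning curves centered on the cone peaks from the
# notes (420, 534, 564 nm). The 60 nm bandwidth is an illustrative
# assumption, not a measured value.
CONE_PEAKS = {"S": 420.0, "M": 534.0, "L": 564.0}
BANDWIDTH = 60.0  # nm, assumed

def cone_excitation(wavelength_nm):
    """Relative response of each cone type to a monochromatic light."""
    return {
        cone: math.exp(-((wavelength_nm - peak) ** 2) / (2 * BANDWIDTH ** 2))
        for cone, peak in CONE_PEAKS.items()
    }

# 450 nm light: strong S-cone response, weaker M, very little L,
# matching the example in the notes.
resp = cone_excitation(450.0)
print({cone: round(value, 2) for cone, value in resp.items()})
```

The exact numbers are not exam material; the point is only that every wavelength yields a graded pattern across all three cone types.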
Visual field, eccentricity, and field extent
- Visual field is the spatial extent you can see, described in degrees of visual angle
- Eccentricity = how far a location is from the fovea (the point you’re directly looking at)
- Example: hold a thumb at arm's length and flick it out to the left; about 45{-}50^\circ is a typical eccentricity reference
- Horizontal field extent varies across people: roughly 180{-}190^\circ (subject to individual differences and glasses/contacts)
- Vertical field extent is typically 130{-}140^\circ
- Pupillary distance (interpupillary distance) varies widely and influences horizontal field extent:
- Example PDs: 65\,\text{mm} (mid-range) vs 52\,\text{mm} (smaller)
- Closer eyes (smaller PD) can reduce horizontal field; wider separation (larger PD) can increase it
- Visual field measurements are expressed in degrees of visual angle, which jointly capture object size and viewing distance, rather than in literal centimeters at a given distance
- Horizontal vs vertical field differences and individual anatomy mean there is no single perfect number for everyone
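Degrees of visual angle tie physical size and viewing distance together through the standard formula \theta = 2\arctan\left(\frac{\text{size}}{2\times\text{distance}}\right). A quick check with a roughly 2 cm thumb at a roughly 60 cm arm's length (illustrative numbers, not from the notes):

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle subtended by an object of a given size at a given
    viewing distance (both in the same units, e.g., cm)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

# A ~2 cm wide thumb at ~60 cm (arm's length) subtends roughly 2 degrees.
theta = visual_angle_deg(2.0, 60.0)
print(round(theta, 2))
```

Because the formula depends only on the size-to-distance ratio, a larger object farther away can subtend the same visual angle, which is why the field is described in degrees rather than centimeters.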
Photoreceptor distribution and density across the retina
- The retina contains a central region (fovea) with high cone density; density never drops to zero across the retina
- The fovea is the region of the retina with the highest cone density; there are no rods in the fovea
- Rod distribution:
- Rod density increases from the fovea and peaks around 20^\circ to 30^\circ away from the fovea
- This supports better scotopic (low-light) vision and peripheral sensitivity
- Peripheral photoreceptors and edge density:
- There is a slight increase in photoreceptors at the far edge of the retina in some cases
- Important concepts linked to limitations:
- Photoreceptor density is not uniform; this nonuniformity underlies why performance varies across the visual field and drives certain perceptual biases
- The optic disc (blind spot): where axons exit the retina; the brain fills in this gap so you don’t notice a blank spot
- Practical takeaways for vision tasks: density differences explain why acuity is highest in the center and lower peripherally, and why low-light or motion tasks rely more on rods
Peripheral color, the case against strict template theory, and feature-based processing
- Peripheral vision does carry color information; color processing is not limited to the fovea
- Template theory vs. feature-based representation:
- Template theory posits that the brain stores a large set of templates for every possible object/version, which is energetically costly and inefficient
- Feature-based (or deconstructive) representation: early visual areas code features (e.g., orientation, color, terminations) rather than whole objects; objects are constructed from features in higher-level cortex
- Why the brain favors feature-based representations:
- It is computationally efficient and scalable; supports robust object recognition under occlusion and variation
- Adaptation and after-effects:
- Adaptation fatigues neurons representing a stimulus; the brain’s representation shifts, producing aftereffects
- In this course, adaptation effects yield a predictable negative aftereffect (a percept shifted opposite to the adapting stimulus)
- Implications for artificial vision:
- Computer vision systems trained on biased data can inherit human-like biases; this can lead to misidentifications, especially across different racial groups
- The “two streams” idea and integration with perception:
- Visual processing involves dorsal (where/how) and ventral (what) streams; they interact rather than being completely separate
- Relevance to questions and exams:
- Expect discussion of why early visual processing uses features and how adaptation reveals feature-level encoding in early visual areas
Object recognition, agnosias, and the biology of perception
- Agnosias: failures to identify objects, with partial or complete forms (e.g., prosopagnosia for faces)
- Genetic vs. acquired causes:
- Agnosias can be caused by brain damage or by genetic variation/malformation; they exist on a spectrum of severity
- Object reconstruction in the brain:
- The brain combines early visual features into more complex representations in higher-level cortex
- The basic premise for exams: object reconstruction is generally correct in everyday function, but there are known failures
- Caveat and future topic:
- The full, nuanced account is provided by feature integration theory (anticipate coverage in next week’s recorded lecture)
- If curious about failures in feature conjunctions or integration, reach out for more readings
- Real-world relevance:
- Case discussions in class illustrate how perception can differ across individuals and contexts
- The potential for bias and misinterpretation in both humans and AI systems is a practical concern in real-world applications
Visual search and attention: pop-out vs conjunction search
- Visual search is a classic method to study attention and cognition
- Pop-out (feature) search:
- Target defined by a single diagnostic feature (e.g., color)
- Independent of set size: reaction time does not scale with the number of distractors
- Example: find the red item among a set of non-red distractors; set size does not affect search time
- Interpretation: parallel search; the brain can extract the feature across the entire scene without serial inspection
- Conjunction search (multiple features):
- Target defined by a conjunction of features (e.g., color and shape)
- Reaction time increases with set size; serial processing is required to combine features
- Set size effects and what they imply about representation:
- A flat reaction-time slope with set size in pop-out suggests pre-attentive, feature-level processing across the visual field
- Positive slopes in conjunction search imply serial, attention-demanding processing and integration of multiple features
- Practical relevance for driving and real-world tasks:
- Real-world scenes require combining multiple cues (color, orientation, motion) to locate objects of interest
- The efficiency of feature search supports the idea that some information can be processed rapidly without eye movements
- A brief digression on task demands and information economy:
- The brain often represents only the information necessary to complete a task (minimize information and energy expenditure)
- Task demands modulate how long you need to look and what information is essential
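The set-size effects above can be illustrated with a toy reaction-time model: feature (pop-out) search checks all items in parallel, so reaction time stays flat, while conjunction search inspects items serially, so reaction time grows with set size. The base time and per-item cost below are illustrative assumptions, not data from the course:

```python
# Toy RT model for visual search (all times in ms; values illustrative).
BASE_RT = 400.0       # sensory/motor overhead, assumed
SERIAL_COST = 50.0    # per-item inspection cost in conjunction search, assumed

def feature_search_rt(set_size):
    """Pop-out search: the feature is extracted in parallel across the
    display, so RT does not depend on the number of distractors."""
    return BASE_RT

def conjunction_search_rt(set_size):
    """Conjunction search: items are inspected serially; on average about
    half the items are checked before the target is found."""
    return BASE_RT + SERIAL_COST * set_size / 2

for n in (4, 8, 16):
    print(n, feature_search_rt(n), conjunction_search_rt(n))
```

Plotting RT against set size from such a model reproduces the classic signature: a flat slope for pop-out and a positive slope for conjunction search.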
Perceptual biases, heuristics, and the psychology of everyday vision
- Perceptual biases and heuristics are shortcuts the brain uses to cope with a complex world
- They are neither inherently good nor bad; they are energy-saving strategies that can be beneficial or lead to errors
- Satisficing and inference:
- The brain tends to use the most likely interpretation given prior experience rather than reconstructing the world from first principles
- Unconscious inference (Helmholtz) and the likelihood principle explain why you perceive certain interpretations as more probable
- Common perceptual rules and when they fail:
- Proximity, similarity, closure, good continuation, and common fate guide grouping and perception of coherent objects
- Proximity: closer elements tend to be grouped as a unit
- Similarity: elements with shared features (e.g., luminance) group together
- Closure: we perceive complete shapes even when contours are incomplete
- Good continuation: we expect contours to continue smoothly behind occluders
- Common fate: objects moving together are perceived as a group
- Pragnanz (simplicity): the brain prefers the simplest interpretation of a complex scene
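Proximity grouping, the first rule above, can be mimicked by a simple distance-threshold clustering over element positions. This is a conceptual sketch (1-D positions and a hand-picked threshold are my assumptions), not a model of actual perceptual grouping:

```python
def group_by_proximity(positions, threshold):
    """Group 1-D positions into clusters: a new group starts whenever the
    gap to the previous element exceeds the threshold."""
    groups = []
    for x in sorted(positions):
        if groups and x - groups[-1][-1] <= threshold:
            groups[-1].append(x)  # close enough: join the current group
        else:
            groups.append([x])    # large gap: start a new perceived group
    return groups

# Elements at 0, 1, 2 and at 10, 11 split into two perceived units.
groups = group_by_proximity([0, 1, 2, 10, 11], threshold=3)
print(groups)
```

The output splits the five elements into two clusters, echoing how closer elements are perceived as belonging together.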
- Illusions that reveal bias in cue integration:
- Escher-like images (reversing cube, stairs) show how cues can mislead when structures are manipulated
- Light-from-above assumption: we assume light sources come from above; this shapes interpretation of shadows and depth
- Oblique effect: people are more sensitive to horizontal and vertical orientations; perception is less precise for oblique angles
- Faces and upright bias:
- We are particularly good at recognizing upright faces; upside-down faces disrupt usual recognition (e.g., Thatcher illusion)
- Thatcher illusion shows how flipping internal facial features disrupts perception when the face is upright, but is less noticeable when inverted
- Semantic vs syntactic violations (scene grammar):
- Semantic violations: objects in contexts where they are not plausibly placed (e.g., toilet paper in a dishwasher) – still physically possible
- Syntactic violations: objects in physically implausible positions (e.g., toilet paper floating in midair) – violates physical constraints
- These violations reveal learned scene grammar and expectations about where objects belong
- Scene grammar and environment expectations (Melissa Võ, scene semantics):
- People have lifetime experience with natural scenes; some objects are semantically constrained to certain environments
- Violations reveal the brain’s learned priors about scene structure and object placement
Scene grammar, environment plausibility, and real-world implications
- Semantic violations describe objects in inappropriate environments (e.g., a toilet paper roll in a dishwasher) while keeping physical possibility
- Syntactic violations describe physically implausible placements (e.g., rolled toilet paper floating in midair)
- The point: scene grammar reflects learned expertise about typical environments and how objects should appear
- Practical takeaway:
- Our perception relies on long-term experience to infer plausible scene structure; violations help reveal the underlying priors
- Everyday examples discussed:
- A water bottle on a professor's head is physically possible but contextually unlikely; because its placement remains physically plausible, it counts as a semantic violation rather than a syntactic impossibility
- The role of context in perception:
- Knowledge of scenes and semantics shapes interpretation even when features are ambiguous
- Cats turn up in unusual places more readily than humans do, making them handy examples of scene-grammar violations; the brain uses prior knowledge to judge plausibility
Real-world and logistical notes pertinent to coursework
- Visual Search Lab and relevance to research reports:
- The visual search tasks (e.g., cat among owls) relate to attention and to what you will write about in Research Report 1
- Course structure and assessment guidance:
- WPQs (weekly practice questions) are open-book; they are meant to guide study, not to preview exam questions
- Exam questions will focus on lecture content and slides; textbook content unrelated to lectures may not be tested
- Key terms list released after week 6 provides guidance on important material
- Study strategy guidance (for students):
- Use lecture content and slides as primary sources for exams
- If textbook material appears, treat it as supplementary context unless explicitly tied to exam content
- Discovery Labs and research opportunities:
- Discovery Labs are due by the 25th (approx. one week from the date of the talk); two labs are required for credit
- Labs produce group data used for class-wide analyses; you can revisit labs after the due date for review
- Research reports:
- Two lab-style reports; the first due on October 1
- Structure: describe the stimulus/task clearly, report findings, discuss patterns and how group data compare with personal data
- No stats are required; focus on understanding and interpretation
- An opportunity to revise and resubmit the first report in mid-to-late October for a fresh mark
- Getting involved in research:
- Begin with lab websites and lab posters to identify interest areas
- When emailing a PI, show you’ve looked at their work and propose a thoughtful discussion rather than a generic request
- Use the Research Opportunity Program (ROP) application window (opens in February) to join labs for summer and subsequent terms
- The department uses SONA for participant recruitment; consider joining studies as a participant to gain firsthand research experience
- Practical tips for communications with faculty:
- Avoid casual chatty language; provide specifics about interests aligned with the lab’s work
- Do not propose a project outright; express interest and readiness to engage in ongoing work
- Additional notes on course logistics (brief):
- Slides are provided in a single format (pre-lecture and post-lecture updates with WPQ answers)
- The Discovery Labs guide contains red-highlighted critical instructions and troubleshooting tips; read it carefully
- If you miss a break or arrive late, ask during the break to pick up a handout or resource
- The instructor emphasizes a balance between curiosity, rigor, and the practicalities of research engagement
Quick recap of the week’s big ideas
- Perceptual biases and heuristics: shortcuts the brain uses to manage the abundance of information
- Unconscious inference and the likelihood principle: perceptual priors built from lifetime experience influence what we see
- Gestalt grouping rules: proximity, similarity, closure, good continuation, common fate, and the Pragnanz principle guide how we perceive scenes as coherent wholes
- Object perception is a two-stage process: early feature representations feed higher-level object representations
- Visual search reveals how feature-based and conjunction-based processing work, and how task demands shape information needs
- Scene grammar and semantic/syntactic violations reveal learned expectations about where objects belong and how scenes should be structured
- The relationship between perception and action: speed-accuracy tradeoffs and the information needed to perform tasks (e.g., driving scenarios)
- The importance of recognizing biases in both human and machine vision systems and the social implications (e.g., bias in AI and real-world applications)
- Practical coursework and research pathways to deepen understanding through labs, reports, and active engagement with faculty and labs