Class Notes

1/22/26

World → distal stimulus

Retinal Image → proximal stimulus

What is the function of vision? To establish internal representations of the external world such that we can successfully interact with it (the world).

The function of any sensory system is to establish internal representations of the external world, based on some physical source of information that reflects some aspects of the world so that we can successfully interact with it.

Psychometric function is different from person to person (threshold)
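
A minimal sketch of this idea (assuming Python/numpy; the logistic form, slopes, and thresholds are illustrative, not course-specified): the psychometric function maps stimulus intensity to detection probability, and the threshold is conventionally read off at the 50% point, which differs across observers.

    import numpy as np

    def psychometric(intensity, threshold, slope):
        # Logistic psychometric function: detection probability vs. intensity
        return 1.0 / (1.0 + np.exp(-slope * (intensity - threshold)))

    intensities = np.linspace(0, 10, 101)
    p_a = psychometric(intensities, threshold=4.0, slope=1.5)  # observer A
    p_b = psychometric(intensities, threshold=6.0, slope=1.0)  # observer B

    # The threshold is conventionally the intensity at 50% detection
    print(intensities[np.argmin(np.abs(p_a - 0.5))])  # ~4.0
    print(intensities[np.argmin(np.abs(p_b - 0.5))])  # ~6.0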

1/29/26

We tend to use rods for night vision.

The rod system has higher sensitivity

  • Works better in low light

Acuity → Ability to discriminate fine detail

  • Ability to distinguish fine detail, depending on how small it is

In bright light, cones are more sensitive than rods

  • Cones adapt pretty quickly and then bottom out

Oguchi’s disease (congenital) → No (functional) rods

Receptive Field → a property of the cell, but it is defined by location within the visual field

  • Every visual sensory neuron has a receptive field

  • The part of the visual field to which a given neuron is sensitive

High acuity; low sensitivity

  • Small receptive fields so there is no ambiguity with regard to A alone, B alone, or A plus B (high acuity)

  • But any given ganglion cell is unlikely to get activated under low-light conditions (low sensitivity)

Ganglion cells are good at detecting edges

RGCs with center-surround RFs are edge detectors
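
A minimal 1D sketch of that idea (assuming Python/numpy; the kernel weights are arbitrary but sum to zero, like a center-surround RF): the response is zero over uniform regions and nonzero only where luminance changes, i.e., at edges.

    import numpy as np

    luminance = np.array([10.0] * 10 + [30.0] * 10)  # 1D image with a step edge

    # Center-surround RF as a zero-sum kernel: excitatory center, inhibitory surround
    kernel = np.array([-0.25, -0.25, 1.0, -0.25, -0.25])

    response = np.convolve(luminance, kernel, mode="valid")
    print(np.round(response, 1))  # ~0 on uniform regions, nonzero only at the edge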

02/03/26

Nerve fibers (RGC axons) leave the back of the eye at the optic disc → that is where the blind spot is

The primary visual pathway (geniculostriate) and secondary visual pathway (retinotectal) both start at the retina

PVP

  • Evolutionarily newer

  • underlies conscious perception

SVP

  • Evolutionarily older

  • mainly unconscious processing

Primary visual cortex (V1)

  • Retinotopic representation

    • Spatial relations in the visual field are maintained, establishing a spatial map of the visual field on the cortex

    • Better defined in earlier visual areas than in later ones.

  • Cortical magnification

    • Disproportionately more cortical tissue is dedicated to the central (foveal) region of the visual field than to the periphery

  • Multiple maps & Increasing receptive field size

    • Taking the cortex and making it flat.

    • Each one of the V’s is a separate map of the visual field. (Multiple maps of the visual field slide)

    • Receptive field size gets larger as you move up the hierarchy from V1.

  • Functional selectivity

    • When neurons respond more strongly to some visual feature or property than to others, that cell effectively codes for the presence of that visual attribute at that particular location in the visual field

      • Fusiform Face Area (FFA) → is activated during face perception because it contains lots of individual neurons that are selective for configurations of stimuli corresponding to faces (like RGCs are selective for edges)

    • (Rough) Functional selectivity

      • V1 → basic features

      • V4 → color, curvatures, and simple shapes

      • IT/TE and LOC → complex form

      • FEF/LIP → spatial attention, saccade control

  • V1 selectivity

    • Retinal ganglion cells are selective for edges

    • Firing rate changes when there is an edge in their RF, and does not change when there is not

    • The tuning curve (tuning function) describes the selectivity of a given neuron, i.e., the characteristics of that one cell

    • V1 cells have orientation selectivity

Information processing through selective convergence

  • A specific subset of RGCs converging onto a common V1 cell can create an orientation-specific V1 cell
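
A toy Hubel-Wiesel-style sketch of that convergence (assuming Python/numpy; the grid size, subunit spacing, and Gaussian widths are arbitrary): summing center-surround subunits along a vertical line yields a unit that responds best to a vertically oriented bar.

    import numpy as np

    size = 64
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]

    def dog(cx, cy, sc=2.0, ss=4.0):
        # Center-surround subunit: excitatory center minus broader surround
        r2 = (x - cx) ** 2 + (y - cy) ** 2
        return np.exp(-r2 / (2 * sc**2)) / sc**2 - np.exp(-r2 / (2 * ss**2)) / ss**2

    # Selective convergence: subunits lined up along a vertical line (x = 0)
    rf = sum(dog(0, cy) for cy in range(-20, 21, 8))

    for theta in (0, 30, 60, 90):
        a = np.deg2rad(theta)
        bar = (np.abs(x * np.cos(a) + y * np.sin(a)) < 2).astype(float)
        print(theta, round(max(0.0, float((rf * bar).sum())), 2))  # rectified response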

Selective adaptation

  • Increased threshold for adapted orientation and NOT other orientations

  • Means there are mechanisms selective for that orientation

Humans adapt selectively to specific spatial frequencies as well as specific orientations.

Human Contrast Sensitivity Function (CSF)

  • The yellow region shows the part of that (spatial frequency × contrast) space within which we can detect luminance variation (edges) over space.

  • It is determined by underlying (measurable) spatial-frequency channels → these are essentially filters.

  • It tells us about

    • Acuity → the smallest spatial detail that can be resolved (depends on contrast)

    • Sensitivity → the lowest contrast that can be perceived (depends on spatial frequency)

2/5/26

Human Contrast Sensitivity Function (CSF) → a space of stimulation that the visual system can deal with.

  • Its shape tells us that sensitivity (the lowest perceivable contrast) depends on spatial frequency, and that acuity (the finest resolvable detail) depends on contrast

  • So acuity and sensitivity cannot be defined without reference to each other

Spatial Frequency and Orientation

  • Gratings are defined by orientation, spatial frequency, and contrast

Square-wave gratings → equivalent to a set of superimposed sine-wave gratings (the fundamental plus its odd harmonics) at increasing spatial frequency and decreasing contrast

  • More complex than sine waves

  • To the visual system, it is very complex

Single sine waves are filtered by a single frequency channel

Fourier analysis → can decompose any 2D image into a sum of component sine waves (spatial frequency, contrast, orientation)
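
A minimal numerical sketch of that decomposition for the square-wave case (assuming Python/numpy; the 4/π · 1/n amplitudes are the standard square-wave Fourier series): summing odd harmonics at increasing frequency and decreasing contrast approximates the square wave.

    import numpy as np

    x = np.linspace(0, 1, 1000, endpoint=False)
    f0 = 4  # fundamental spatial frequency (cycles per unit distance)

    square = np.sign(np.sin(2 * np.pi * f0 * x))
    # Odd harmonics only, each at higher frequency and lower amplitude (1/n):
    approx = sum((4 / np.pi) * (1 / n) * np.sin(2 * np.pi * n * f0 * x)
                 for n in (1, 3, 5, 7, 9))

    # Mean error shrinks as more harmonics are included in the sum
    print(np.mean(np.abs(square - approx)))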

Bandpass-filters → only a range of spatial frequencies is passed through

  • The internal representation of the retinal image (related to the external world) in V1 is the pattern of activity across spatial-frequency and orientation channels

The same edge at high contrast and at low contrast produces different responses (spikes/sec) in a single cell

  • A low-contrast edge at the preferred orientation elicited the same response as a high-contrast edge at a less-preferred orientation. [Ambiguity in single-cell activity]

  • Edges at very different non-preferred orientations can produce the same response. [Ambiguity in single-cell activity]

Population Coding → representation in terms of patterns of activity across multiple cells (populations) with different selectivities reduces ambiguity

  • e.g., contrast and orientation can be disentangled by comparing responses across cells with different preferred orientations (see the sketch below)
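
A toy population-coding sketch (assuming Python/numpy; Gaussian tuning with rate = contrast × tuning, and all numbers chosen for illustration): two stimuli that one cell cannot tell apart produce clearly different patterns across the population.

    import numpy as np

    prefs = np.arange(0, 180, 30)  # preferred orientations of six model cells (deg)

    def rate(ori, contrast, pref, bw=30.0):
        d = (ori - pref + 90) % 180 - 90               # circular orientation difference
        return contrast * np.exp(-d**2 / (2 * bw**2))  # rate = contrast x tuning

    # Two different stimuli that the pref = 0 cell cannot tell apart:
    print(round(rate(0, 0.50, 0), 3))    # preferred orientation, low contrast
    print(round(rate(30, 0.824, 0), 3))  # 30 deg off, higher contrast: same rate

    # ...but the patterns across the population differ, removing the ambiguity
    print([round(rate(0, 0.50, p), 3) for p in prefs])
    print([round(rate(30, 0.824, p), 3) for p in prefs])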

2/10/26

V2

  • Each visual area is retinotopically mapped

  • They are coding for different types of visual attributes

  • Needs edge information in the receptive field

V4

  • Individual V4 neurons are selective for curvature

  • It fires maximally when the curvature in its RF matches its preferred curvature (the peak of its tuning curve)

  • Inner part is the on part and the outer part is the off part

  • it receives connections from a set of V1 cells (the purple bell curves in the slide image)

  • they connect within the area to other cells and begin to form other shapes

  • shape defined by population coding across curvature-selective cells

    • can help see more complex shapes

  • shapes are made up of a bunch of curves (a, b, c, d, e, f in the Hershey-kiss-looking image)

IT/TE

  • Faces

  • Complex-form selectivity

  • far in the visual processing hierarchy

  • Kobatake & Tanaka (1994): recordings from monkey IT

  • coding for configural representations

  • showing each of the three features, but misconfigured, does not drive the cell; only the correct configuration does

  • found to code for complex things

  • population coding

MT/MST

  • motion selectivity

  • MT is the critical one

  • MT is coding for motion without the need for edges

  • “Pure” motion, no orientation edge needed

  • responds to motion without the need for an edge in its receptive field

  • MT/MT+

    • compared responses to a moving stimulus with no edges vs. a stationary stimulus

Frontal Eye Fields (FEF) and Lateral Intraparietal Area (LIP) → Visually Guided Eye Movements and Attention

Functionally Distinct Pathways

  • Dorsal stream → V1 to parietal cortex

    • Extension of rod system (and magnocellular pathway of LGN)

  • Ventral stream → V1 to temporal cortex

    • Extension of cone system (and parvocellular pathway of the LGN)

Early What vs Where Evidence

  • Ungerleider & Mishkin (1982)

  • Double dissociation → one of two functions is damaged without harm to the other, and vice versa

  • Object discrimination → “what” task

  • Landmark discrimination → “where” task

  • Monkey with the ventral lesion was impaired with the what task but not impaired with the where task

  • Monkey with the dorsal lesion was impaired with the where task but not impaired with the what task

Later What vs How evidence

Milner & Goodale (1991)

  • Two tasks

    • Perceptual matching (What)

    • Posting (How)

  • Two patients

    • Ventral Damage (DF)

    • Dorsal Damage (RV)

  • Two Tasks

    • Reaching (How)

    • Same/different (What)

  • Same/Different Task (What)

    • DF was worse than RV who was equal to control

  • Reaching Task (How)

    • RV was worse than DF who was equal to control

    • The patient with dorsal damage (RV) performed poorly

    • The patient with ventral damage (DF) performs as well as controls

Specific Selectivities Reflect Ventral/Dorsal Functions

  • Retina → edges (cones, rods)

  • LGN (parvocellular, magnocellular)

  • V1 (parvo, magno) → basic visual features

  • V2 (parvo, magno) → basic visual features in context

  • V4 → color, curvature

  • Temporal areas (IT/TE) → complex shapes, faces/configurations

  • Ventral stream → What (object recognition)

  • MT → motion

  • LIP/FEF → eye movements, visually guided grasping

  • Dorsal Stream → How? (visually guided action)

2/17/26

Object Perception

  • Perceptual Organization → Processes by which representations of image-based information (proximal stimulus) are transformed into representations that reflect scene structure (distal stimulus)

    • proximity, similarity, enclosure, symmetry, closure, continuity, connection, figure & ground

    • Components of Perceptual Organization:

      • Represent edges (image information)

      • Represent uniform regions bounded by edges (image information)

        • Different luminance levels reach the eye from different parts of the scene because of light reflecting off of surfaces with different reflectances

        • Mosaic → still an image-level representation

      • Border ownership/ figure vs. ground/ relative depths (beyond the image)

        • Edges separate regions

        • Not explicit, it has to be inferred

        • V2 has cells that are selective for specific border ownerships

        • Function of selectivity for border ownership: distinguish figure from (back)ground / assign relative depth

      • Completion → representing “inferred” parts of the scene (beyond the image)

        • group similar lines together

        • continue aligned edges (even if dissimilar)

        • enclose edges to define contiguous regions

        • relatable edges are completed; unrelatable edges are not

        • X-junction → transparency and different depths

        • T-junction → occlusion and different depths

        • L-junction → adjacent at same depth

  • Object Recognition → processes by which visual representations are matched to amodal semantic representations in memory

  • Scene Processing → understanding objects and their relations in context

  • Interdependence of components → completion depends on assigned border ownership

  • [Input] Image-based representations (e.g., luminance over space) → Perceptual Organization → [Output] Scene-based representations (e.g., surfaces and their spatial relations)

2/19/26

Recall:

  • Perceptual Organization → Processes by which representations of image-based information (proximal stimulus) are transformed into representations that reflect scene structure (distal stimulus)

  • Object Recognition → processes by which visual representations are matched to amodal semantic representations in memory

    • Theories of object recognition try to explain how that matching process occurs

    • Template theories are intuitive

    • Template Models

      • Point-for-point matching of input against stored representation (“lock and key”)

Problems with this simple template theory? → We would need an infinite number of templates in memory to account for human object recognition capabilities.

  • The problem of Invariance

    • A successful object recognition system must be able to recognize an object across different points of view (and other variability in context)

    • Template-Matching processes are used for a lot of applications where viewpoint can be controlled, and the number of to-be-identified “objects” is limited.

    • Computer-vision systems (self-driving cars) are increasingly sophisticated template-matching systems…but still template matching.

    • Template models with extensive image-normalization processes and exposure to massive image sets (for defining the templates) are working for increasingly complex and dynamic applications

    • Point-for-point matching works (logically) because the visual representation and the memorial representation are the same format and can be compared…point-for-point

    • Characteristics of Human Vision that are not Well Explained by Template Theories

      • Viewpoint invariance

      • Robust against image degradation

      • Memory representations cannot depend on sensory modality (templates do)

      • Recognition is vast and fast

      • Incredibly reliable… so we (probably) don’t do template matching (as much)

  • Scene Processing → understanding objects and their relations in context

  • [Input] Image-based representations (e.g., luminance over space) → Perceptual Organization → [Output] Scene-based representations (e.g., surfaces and their spatial relations)

  • Components of Perceptual Organization:

    • Represent edges (image information)

    • Represent uniform regions bounded by edges (image information)

    • Border ownership/ figure vs. ground/ relative depths (beyond the image)

    • Completion → representing “inferred” parts of the scene (beyond the image)

Ambiguity and Best Guesses about Organization

  • It is more likely that two lines cross than that two angles happen to abut, but it’s not impossible that two angles abut.

  • So, perceptual inference based on likelihood is separate from cognitive inference.

Structural Description Models (alternative to template models)

  • Object representations are descriptions in terms of the nature of constituent parts and the spatial relations between those parts

  • Hands are represented as a specific set of parts and their spatial relations

  • Structural Description Models do three important things:

    1. Provide efficiency of representation (like alphabet to words) allowing us to represent many distinct objects

    2. Solve the comparison of representation problem (apples-to-apples instead of apples-to-oranges)

    3. Solve the problem of viewpoint invariance by defining parts on the basis of viewpoint invariant properties (this needs more explanation)

Parsing Image into Parts

  • The structural description process has to unfold based on image information

  • It cannot depend on knowing what the object parts are (4 fingers and a thumb)

  • The visual system uses matched concavities in the image to parse it (break it apart) and represent it as a set of component parts

    • Notice that parsing is perceptual organization

  • Why concavities?

    • When multiple 3D components join together, they often create concave boundaries in their 2D projection

    • Concavities in 2D images are therefore useful cues to 3D part boundaries

  • Parsing at concavities…Image-based process → does not depend on knowing what the parts are → allows us to establish structural descriptions of novel objects

Identifying the Parts

  • A relatively small set of parts provides efficiency of representation and recognition. Like letters in an alphabet.

    • 26 letters

    • More than 1,000,000 words

    • Infinite number of sentences

  • What are the parts?

    • Recognition by Components (RBC). A specific structural description model

    • Parts are represented based on viewpoint invariant properties

    • Visual properties that remain constant in the 2D retinal image across (most) viewpoints of the 3D object

    • A solution to the challenge of viewpoint invariance in object recognition

  • 3D curvature projects to 2D curvature (except from a single accidental point of view); 3D straight edges project to 2D straight edges

    • So if the image is curved, the visual system infers that the object is curved. If the image is straight, the visual system infers that the object is straight

  • Cotermination (a viewpoint-invariant property) in the image is perceived as cotermination in the world (even though sometimes it’s not)

    • The default assumption is that things are being viewed from a non-accidental viewpoint

2/24/26

Recognition by Components (RBC) → A specific structural description model

Objects are represented as sets of parts and their spatial relations

  • Addresses the perceptual ←→ memorial representation problem

Parts are defined based on viewpoint invariant properties

  • Addresses the challenge of viewpoint invariance

Each of these geons is an example of a different part that can be used to create different representations.

These different geons are reliably distinguishable from each other from different points of view.

Object representations consist of combinations of geons with specific spatial relations. (like words are combinations of letters in specific orders).

Cup → parts: {5, 3}. spatial relations: 5 is on the side of 3

Bucket → parts: {5, 3}. spatial relations: 5 is on top of 3

Structural descriptions (lists of parts and their spatial relations) serve as a common representational format for perception and memory

apples to apples
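
A hypothetical sketch of that common format (Python; the part labels 5 and 3 and the relation names follow the cup/bucket example above, while the dict encoding itself is illustrative): the same parts with different spatial relations describe different objects, and matching is a like-to-like comparison.

    # Same two geon parts; only the spatial relation differs (cup vs. bucket)
    cup    = {"parts": {5, 3}, "relations": {(5, "side-of", 3)}}
    bucket = {"parts": {5, 3}, "relations": {(5, "on-top-of", 3)}}

    def same_object(a, b):
        # Apples-to-apples: compare parts AND relations in the same format
        return a["parts"] == b["parts"] and a["relations"] == b["relations"]

    print(same_object(cup, cup))     # True
    print(same_object(cup, bucket))  # False: same parts, different relation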

Structural Description Models

do three important things

  1. Provide efficiency of representation allowing us to represent many distinct objects (like alphabet to words)

  2. Solve the comparison of representation problem (apples-to-apples instead of apples-to-oranges)

  3. Solve the problem of viewpoint invariance by defining parts on the basis of viewpoint invariant properties (e.g., in RBC)

Facial Recognition

Visual Agnosia → can’t identify non-face objects but can recognize faces

Prosopagnosia → can identify non-face objects but can’t recognize faces

Structural descriptions of faces do not help discriminate between individuals (all faces have roughly the same parts in the same arrangement)

Thatcher Effect → Why does the altered one look so much weirder when it is right-side up?

Faces are processed (more) holistically (than non-face objects).

“Holistically” means that recognition depends more on the representation of relations between parts or configurations than on parts.

Whole-object advantage for detecting difference (Tanaka & Farah 1993)

  • Only holds for faces: houses won't engage normal face-recognition processes, but faces will.

Upside-down faces should not engage normal face-recognition processes to the same extent that upright faces do

  • The whole-object advantage occurs only for upright faces - upside-down faces are not faces to the visual system

Inversion Effect → Evidence that faces are processed differently

Face recognition is impaired more by inversion than non-face object recognition is.

Fusiform Face Area (FFA) & Parahippocampal Place Area (PPA)

PPA → area that is selectively activated by images of places

FFA → area that is selectively activated by images of faces

Both areas are in the temporal cortex (ventral stream)

These areas reflect different types of processing (part-based versus holistic) rather than different categorical functionality (places versus faces)

The processing of upside-down faces is dominated by part-based processing

  • An inverted face is not treated by the visual system as a “face”… so “regular object” part-based processes dominate …each part is essentially fine here

The processing of right-side up faces is dominated by holistic processing

  • A right-side up face is processed as a face …so holistic processes dominate … the relations between parts are incongruous.

The “face” is detected when it is at the orientation of normal faces

  1. Alignment supports the perceptual completion of the two halves into a single object. The single object is a face and therefore processed holistically …making the individual component faces difficult to represent separately. For this (albeit weird) task, holistic processing is a problem.

  2. The difference in difficulty between aligned and misaligned stimuli should be significantly reduced, because turning them upside down makes them less likely to engage holistic processing, and it is holistic processing that causes the greater difficulty for aligned faces

Are faces special?

Sort of. Experts tend to show increased FFA activity when looking at examples of the category for which they are experts. (Fusiform Expertise Area)

Perceptual expertise often involves shifting from more part-based processing to holistic processing

Own-Race Effect is Real

It’s about experience

The inversion effect is greater for own-race faces than other-race faces

Differential experience leads to differential engagement of holistic processing (expertise)

Scene processing → understanding objects and their relations in context

We extract the “gist” of scenes extremely quickly (the “clap when you see water” example)

We do this by using global image (proximal stimulus) properties to coarsely categorize scenes (representation of distal stimulus) of different types.

All of this involves inference.

Depth and Size

The function of vision → to establish internal representations of the external world, such that we can successfully interact with it.

Retinal images are ambiguous with regard to size and depth

Retinal images are measured in terms of visual angle.

The same object projects a smaller retinal image at further distances.

Notice that retinal images are ambiguous with regard to shape too because of orientation in depth.
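
A quick numerical sketch of the projection (Python; the person's height and the viewing distances are made-up values): the same object subtends a smaller visual angle, and so a smaller retinal image, at a greater distance.

    import math

    def visual_angle_deg(object_size, distance):
        # Visual angle subtended at the eye (same units for size and distance)
        return math.degrees(2 * math.atan(object_size / (2 * distance)))

    # The same object projects a smaller retinal image at a greater distance:
    print(round(visual_angle_deg(1.7, 2.0), 1))   # 1.7 m person at 2 m: ~46 deg
    print(round(visual_angle_deg(1.7, 20.0), 1))  # same person at 20 m: ~4.9 deg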

2/26/26

Perceiving size

Oculomotor cues

  • Accommodation

    • The depth cue is that the brain can register the state of the muscles that control lens thickness.

  • Convergence

    • The depth cue is that the brain can register the state of the muscles that control the angle of the eyes.

Cues based on the retinal image (aka stereovision).

Monocular vision: you don’t need information from both eyes; one is sufficient.

Static cues:

  • Position-Based Cues

    • Partial occlusion:

      • When one thing is in front of another and blocks the object behind it (occlusion), that is a cue that the occluded thing is farther away than the occluding thing.

    • Relative height:

      • Natural images have multiple cues that the system must integrate in some way.

      • Relative height in the image carries depth information (for things below the horizon, higher in the image means farther away).

  • Size-Based Cues

    • Relative size

    • Familiar size

You are familiar with the sizes of coins, so you know their physical sizes differ, but in this image they project the same size

Texture gradients

Linear perspective

  • Lighting-Based Cues

    • Atmospheric perspective

    • Shading

      • You can convince yourself of a different lighting direction, and it will change the depth/shape perception

    • Cast Shadows

      • Many aspects of cast shadows carry information about depth

  • Dynamic Cues

    • Motion parallax

      • The magnitude of speed difference between two objects is metrically related to the distance between them.

    • Optic flow

      • Is the change in the optic array over time (We use it a lot for guiding our action)

      • The optic array is the pattern of light projected to the eye; optic flow is the dynamic (changing) optic array

      • Optic flow is separate from object perception

    • Deletion and accretion

  • Cues can work together (e.g., motion and cast shadows)

  • Binocular cues

    • Binocular disparity

      • Corresponding points are defined relative to the fovea

      • Horopter: The set of locations in the world that project to the corresponding points. It defines a surface of zero disparity.

    • Only exists in the relationship between the images in the two eyes

    • Direction of disparity indicates direction (nearer vs. farther) from the horopter

    • Uncrossed disparity: perceived as farther than horopter

      • Images move away from the fovea nasally (toward the nose)

      • You would have to uncross (diverge) your eyes to fixate the object

    • Crossed disparity: perceived as closer than the horopter

      • Images move away from the fovea temporally (toward the ear)

      • You would have to cross (converge) your eyes to fixate the object

  • Binocular disparity

    • The difference in the relative position of the image of a single object (or edge) on the two retinae

  • Stereopsis

    • Perceiving depth from binocular disparity

    • About 7% of the population are stereoblind

3/3/26

Binocular cue

  • Only exists in the relationship between the images in the two eyes

Binocular disparity

  • The difference in the relative position of the image of a single object (or edge) on the two retinae

Stereopsis

  • Perceiving depth from binocular disparity

Corresponding points are defined relative to the fovea

Horopter

  • The set of locations in the world that project to corresponding points on the two retinae. It defines a surface of zero disparity.

Direction of disparity indicates direction from the horopter.

Uncrossed disparity

  • Perceived as farther than horopter

Crossed disparity

  • Perceived as closer than the horopter
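
A small numerical sketch of the disparity sign convention (Python; uses the small-angle approximation vergence ≈ interocular distance / viewing distance, with an assumed 6.4 cm interocular distance): objects nearer than fixation give crossed disparity (positive here), farther objects give uncrossed, and points on the horopter give zero.

    import math

    IOD = 0.064  # interocular distance in meters (assumed typical value)

    def disparity_deg(obj_dist, fix_dist):
        # Small-angle approx: vergence to object minus vergence to fixation;
        # positive = crossed (nearer than fixation), negative = uncrossed (farther)
        return math.degrees(IOD / obj_dist - IOD / fix_dist)

    print(round(disparity_deg(0.8, 1.0), 2))  # nearer: crossed (+)
    print(round(disparity_deg(1.5, 1.0), 2))  # farther: uncrossed (-)
    print(round(disparity_deg(1.0, 1.0), 2))  # on the horopter: zero disparity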

Binocular Disparity → Neurophysiology

The Correspondence Problem → How does the visual system “know” which image in the right eye corresponds to which item in the left eye?

  • Feature (color, shape, and image size) matches?

  • Image features are definitely used, but cannot be the whole story

The Wallpaper illusion (“magic eye”)

  • Occurs when the correspondence problem is solved “incorrectly”.

Inference-like process: In order for an object to project the same size image as another object from a greater distance, it must be a larger object.
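
A worked version of that inference (Python; inverting the visual-angle relation from the Depth and Size notes, with made-up numbers): for a fixed image (visual angle), greater distance implies a larger object.

    import math

    def physical_size(visual_angle_deg, distance):
        # Invert the projection: the size needed to subtend a given visual angle
        return 2 * distance * math.tan(math.radians(visual_angle_deg) / 2)

    # Two objects projecting the SAME 5-degree image:
    print(round(physical_size(5.0, 2.0), 2))   # at 2 m: ~0.17 m across
    print(round(physical_size(5.0, 10.0), 2))  # at 10 m: ~0.87 m, a larger object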

Selective (Visual) Attention

  • Processing one source of visual information while ignoring others.

  • Compare identical stimulus conditions under different task conditions.

Visual attention is selective visual processing

  • Selection in space and time

  • What happens to selected information?

  • Neural basis of selective processing

  • Scene perception and the fate of the unattended

3/5/26

Attention can be captured

Initial eye-movements often go to the non-target additional singleton, but capture depends on control settings. If searching for a singleton is not the optimal strategy, singletons don’t capture attention.

Attentional guidance is determined by more than bottom-up (stimulus driven) and top-down (goal driven) factors.

Humans are social animals. Attention is guided by others’ (overt) attention.

Attentional guidance is understood in terms of an internal map that prioritizes locations for selection based on multiple sources of input.

  • Priority map integrates information based on salience (bottom-up), task relevance (top-down), and other attributes (e.g., search history and value).

  • Then attention is guided to the peaks in order of activation level (highest to lowest)
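
A toy priority-map sketch (assuming Python/numpy; the grid size, weights, and the cued/history locations are all illustrative): integrate the maps, then read out locations in order of priority.

    import numpy as np

    rng = np.random.default_rng(0)
    salience = rng.random((8, 8))                        # bottom-up conspicuity
    relevance = np.zeros((8, 8)); relevance[2, 5] = 1.0  # top-down task goal
    history = np.zeros((8, 8)); history[6, 1] = 0.4      # selection history / value

    priority = 0.4 * salience + 0.5 * relevance + 0.1 * history

    # Attention is guided to peaks in order of activation level (highest first):
    flat = np.argsort(priority.ravel())[::-1]
    print([divmod(int(i), 8) for i in flat[:3]])  # first three attended locations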

Selection in Time → Metaphor for understanding limitations of temporal selection.

  • Experience can improve temporal selection

  • Emotional stimuli (especially negative ones) capture our attention and induce an attentional blink

Enhanced activity (basically gain control)

  • Retinotopically organized enhanced activity in V1 corresponding to cued locations.

  • Cells with receptive fields at cued locations respond more strongly than cells with receptive fields at uncued locations.

  • Identical input yet different neural response under different cueing conditions.
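
A minimal gain-control sketch (assuming Python/numpy; the gain value is illustrative): identical stimulus drive, but the response at the cued receptive field is multiplicatively enhanced.

    import numpy as np

    drive = np.array([0.2, 1.0, 0.4])  # identical stimulus input to three cells
    gain  = np.array([1.0, 1.5, 1.0])  # attentional gain at the cued RF (cell 1)

    print(drive)         # responses when attention is elsewhere
    print(drive * gain)  # same input, enhanced response at the cued location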

You can see retinotopic response changes in visual cortex (V1) (attending to spatial locations).

Enhanced activity of specific types of processing

  • Recall that functional selectivity is an attribute of cortical processing.

  • Functionally-specific (objects not locations) changes in visual cortex.

Biased Competition → A theory of selection at a neural level

  • Stimuli compete for (neural) representation, and attention biases that competition in favor of one thing or another.

  • Attention changes (biases) population activity. MT/MST (human fMRI)

3/10/26

We have mechanisms to support selective processing because the processing capacity of the visual and cognitive system is limited.

Inattentional Blindness

  • Knowing that we are susceptible to missing things doesn’t prevent us from missing things.

    • This is because missing things is a consequence of selective processing… and selective processing is a necessary state of the system given limited processing capacity.

Global Image Information

  • Axes are defined by image features

  • Spatial frequency (openness) and edge orientation (expansion)

  • Scenes cluster by semantic type

Ensemble Perception (perceiving summary image-statistics)

  • There are established examples of summary statistics from natural images that people can reliably report, e.g.:

  • Gaze direction, family resemblance, size, orientation, hue, motion direction and speed, heading direction, facial expression

Visual Attention

  • Visual processing has to be selective because processing capacity is limited

  • Attentional guidance is influenced by both bottom-up (stimulus-driven) and top-down (goal-based) factors, as well as by aspects of an individual’s own selection history → we understand guidance through the construct of a priority map.

  • Different lab-based tasks that are used to study selective processing (e.g., cueing, RSVP, visual search) have revealed how selected information is processed differently from unselected information

  • The fate of unattended stimuli is significant… we miss more information than we think we do

  • There are many ways in which selective processing is embodied at a neural level

  • Some aspects of scene perception are “unselective” and contribute to guidance of selective processing

Part 3

3/24/26

The study of color vision is a microcosm of vision science

The function of vision

  • To establish internal representations of the external world, such that we can successfully interact with it.

Reflectance and spectral reflectance

Lightness = perceived reflectance (psychological) → shades of gray

  • Is a (perceptual) conclusion

Color = perceived spectral reflectance

Surface Reflectance = the proportion of light that a surface reflects (physical)

The inverse optics problem for lightness

Illumination → intensity of incident light (physical); Luminance → intensity of light at the eye, the retinal input (physical)

Reflectance → proportion of incident light that a surface reflects (physical)

The luminance at the eye depends on both the illumination and the surface reflectance, so lightness (perceived reflectance) has to be inferred from the retinal input
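
A tiny worked example of that ambiguity (Python; all numbers are made up, units arbitrary): luminance at the eye is the product of illumination and reflectance, so the retinal input alone cannot distinguish a light surface in dim light from a dark surface in bright light.

    # Luminance at the eye = illumination x reflectance (arbitrary units):
    paper_in_dim_light = 100.0 * 0.90     # light surface, dim illumination -> 90.0
    coal_in_bright_light = 1800.0 * 0.05  # dark surface, bright illumination -> 90.0
    print(paper_in_dim_light == coal_in_bright_light)  # True: identical retinal input

    # To recover reflectance, the system must (implicitly) estimate the illumination:
    estimated_illumination = 100.0
    print(90.0 / estimated_illumination)  # inferred reflectance 0.9 -> "white"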

Simultaneous Contrast

Lateral inhibition explanation of simultaneous contrast: Higher luminance surround suppresses more than lower luminance surround
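
A toy 1D sketch of the lateral-inhibition account (assuming Python/numpy; the inhibition weight and luminance values are arbitrary): the identical central patch yields a larger response on the dark surround than on the light surround.

    import numpy as np

    # Two identical gray patches (value 50) on a dark vs. a light surround:
    on_dark = np.array([20.0] * 5 + [50.0] * 3 + [20.0] * 5)
    on_light = np.array([80.0] * 5 + [50.0] * 3 + [80.0] * 5)

    def center_minus_surround(img, i, r=3):
        # Lateral inhibition: local value minus a weighted surround average
        surround = np.concatenate([img[max(0, i - r):i], img[i + 1:i + 1 + r]])
        return img[i] - 0.5 * surround.mean()

    mid = len(on_dark) // 2
    print(center_minus_surround(on_dark, mid))   # larger response: looks lighter
    print(center_minus_surround(on_light, mid))  # smaller response: looks darker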

What matters is to which regions/surfaces the gray squares seem to belong - lightness depends on perceptual organization

More perceptual organization

  • Edge types are important cues about the illumination conditions

  • Different edges come from different things

Scene cues about edge type provide information about the illumination conditions

  • Lightness difference is stronger when the edge is perceived as an illumination edge than when it is perceived as a reflectance edge

  • The lightness difference increases as the cues about edge type lean increasingly toward them being illumination edges (rather than reflectance edges)

Inference about the structure of the scene

Color → perceived spectral reflectance

  • The visual system’s conclusion as to what proportion of light a surface reflects as a function of wavelength

The nature of specific light (from a source or at the eye) can be described in terms of its power spectrum: intensity as a function of wavelength

White light (such as from the sun) is light that contains all wavelengths in more-or-less equal proportions (Flat power spectrum)

  • Flat Power Spectrum → describing white light

Measurements of intensity of specific wavelengths produced by different light sources (their power spectra)

Surfaces have different reflectance profiles

  • The proportion of light that a surface reflects as a function of wavelength

  • Spectral reflectance

When reflectance profiles are flat, we talk about lightness (perceived reflectance because it’s constant across wavelength)

When reflectance profiles are not flat, we talk about color (perceived spectral reflectance)

(Light at source): Power spectrum → intensity of incident light as a function of wavelength (physical)

(Surface reflectance): Spectral reflectance → proportion of incident light that a surface reflects as a function of wavelength (physical)

(Light at eye): Power spectrum → intensity of light at the eye as a function of wavelength at the eye - retinal input (physical)

Changes to the illuminant vs changes to the (surface) reflectance

Additive color mixing

  • mixing lights

  • changes the power spectrum of the light

Subtractive color mixing

  • mixing pigments

  • changes the spectral reflectance of the surface

Additive color mixing (lights): Red + Green = Yellow

  • More wavelengths are added to the signal that reaches the eye

Subtractive color mixing (pigments): the pigment absorbs more of the wavelengths, subtracting from the signal that reaches the eye

Mixing can still be additive with pigments when different wavelengths reflect off of different points of pigment and mix at the eye

3/26/26

Light → Power Spectrum (Spectral Power Function)

Surface → Spectral Reflectance Function

Light (with a given power spectrum) shines on surfaces (with a given spectral reflectance function) → Describes the light (power spectrum) that reaches the eye.

First Step to Perceiving Color

  • Encoding the wavelength information at the eye

Spectral content of light & Spectral reflectance of surfaces both determine the light at the eye
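
A coarse numerical sketch of that joint determination (assuming Python/numpy; just three sample wavelengths, with made-up illuminant and reflectance values): wavelength by wavelength, the light at the eye is the product of the illuminant's power spectrum and the surface's spectral reflectance.

    import numpy as np

    # Sample wavelengths: 450, 550, 650 nm (very coarse)
    illuminant = np.array([1.0, 1.0, 1.0])   # white light: flat power spectrum
    reflectance = np.array([0.1, 0.2, 0.8])  # a reddish surface

    # Wavelength by wavelength, the light at the eye is the product:
    print(illuminant * reflectance)                 # [0.1 0.2 0.8]

    # The SAME surface under a bluish illuminant sends a different spectrum:
    print(np.array([1.5, 1.0, 0.5]) * reflectance)  # [0.15 0.2 0.4]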

Law of Three Primaries (psychophysics)

  • Given control over the intensities of three different primary light sources, any visible spectral color can be matched  

  • Led to the hypothesis of 3 classes of photoreceptors, each with a different peak sensitivity.

Trichromacy (physiology)

  • Population coding → the pattern of activity across a population of cells.

Still ambiguity

  • Imagine a system with two classes of receptors with different spectral sensitivities

  • This system cannot tell the difference between certain physically different lights

Metamers

  • Pairs of stimuli that are perceptually identical but are physically different (have different power spectra)

    • 530 + 680 light does not create 580 light

    • This system just can’t represent the difference between 530 + 680 and 580
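
A sketch of why (assuming Python/numpy; the two receptor classes and their sensitivities at 530, 580, and 680 nm are hypothetical numbers): with two receptor types there are only two response values, so intensities of a 530 + 680 mixture can be solved for that reproduce exactly the response pattern of 580 nm light. That pair is a metamer for this system.

    import numpy as np

    # Hypothetical sensitivities of two receptor classes at three wavelengths:
    #                 530nm  580nm  680nm
    sens = np.array([[0.9,   0.6,   0.1],   # receptor class 1 ("M-like")
                     [0.3,   0.7,   0.8]])  # receptor class 2 ("L-like")

    target = sens[:, 1]  # responses produced by unit-intensity 580 nm light

    # Solve for intensities of a 530 + 680 mixture giving the SAME responses:
    A = sens[:, [0, 2]]
    weights = np.linalg.solve(A, target)
    print(weights)              # positive intensities -> physically realizable
    print(A @ weights, target)  # identical response patterns: a metamer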

The set of discriminable colors is determined by the number of cone types and their specific spectral sensitivities

Color blindness is an inability to discriminate between colors that “normal” (trichromatic) folks can because of fewer distinct cone classes

  • Color blind individuals simply have more metamers

Red-green colorblindness is often caused by the M and L cones having peak sensitivities that are too close

  • Blue-yellow colorblindness is much less prevalent

Mantis shrimp have 16 classes of photoreceptors!

Wavelength is physical

Color is psychological

  • Does not exist in the external world

  • Is the interaction between wavelength and our particular visual system

Color Space (versus spectrum)

  • electromagnetic spectrum is linear

  • 380 vs 780 nm are maximally different stimuli on the physical spectrum, yet violet and red look perceptually similar

Color space is circular

Color space is a perceptual (physiological) space, not a physical space

The color spindle - three dimensions

The Law of Complementarity (psychophysics)

  • For any spectral color there is a complementary spectral color such that when the two are combined, the result is white/gray

  • Worked out by Hering around the same time as the Law of Three Primaries

  • Hering hypothesized 3 classes of photoreceptor, each with an opponency relationship

Trichromacy (neurophysiology) → photoreceptors

  • three cone types were confirmed definitively by the 1960s

  • later (1970s or so), color-opponent cells were discovered

Opponency → ganglion cells and beyond

Trichromacy at the level of photoreceptors and opponency in higher-order cells

Both types of physiology exist: trichromacy at the receptors and opponency downstream, so both theories were right at different levels.
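
A minimal sketch of an opponent transform of cone signals (Python; the weights and cone values are illustrative, not a physiological model): trichromatic receptor outputs feed opponent channels downstream.

    def opponent(L, M, S):
        # Toy opponent transform of cone signals (illustrative weights)
        red_green = L - M
        blue_yellow = S - (L + M) / 2
        luminance = L + M
        return red_green, blue_yellow, luminance

    print(opponent(L=0.9, M=0.4, S=0.1))  # reddish input: positive red-green
    print(opponent(L=0.4, M=0.4, S=0.9))  # bluish input: positive blue-yellow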