The Development of Vision for Action
What is vision for action?
Vision for action refers to how visual information is used to guide movements, not just recognise objects. This includes reaching, grasping, intercepting moving objects, and judging what is within reach.
It is computationally difficult because the brain must:
Transform retinal information into body-centred motor commands (sensorimotor transformation)
Solve the degrees of freedom problem: there are many possible joint and movement solutions for the same goal
Take into account one’s own motor abilities and uncertainty
Vision for action is primarily supported by the dorsal visual stream (“where/how”), which is more developmentally vulnerable than the ventral stream.
Core neural framework: dorsal vs ventral streams
Ventral stream: vision for recognition (“what/who”)
Dorsal stream: vision for action (“where/how”)
Goodale & Milner’s model proposes that these streams are functionally distinct, with the dorsal stream linking visual input directly to motor output. Developmental disorders often disproportionately affect dorsal stream functions.
1. Localising touches in space (building body–world mappings)
Key problem
Infants must learn to link:
Tactile input (where was I touched?)
Visual input (where is my body in space?)
This requires translating between different frames of reference.
Frames of reference
Egocentric: retina-centred, head-centred, body-centred
Allocentric: object-to-object, world-centred
Adults flexibly switch between frames depending on the task (e.g. locating a cup relative to the body when reaching for it, versus locating the teapot relative to the cup when pouring). Infants must learn this flexibility.
Developmental timeline
6.5 months: infants orient to a touched hand only when hands are uncrossed
→ touch is coded in a body-centred frame
~10 months: infants correctly orient even when hands are crossed
→ touch has been remapped into external/visual coordinates
EEG evidence shows this remapping emerges in motor cortex around 10 months
Full adult-like remapping (e.g. temporal order judgements with crossed hands) continues developing until ~5–6 years and requires visual experience
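The difference between body-centred and external coding can be made concrete with a toy sketch. It assumes, purely for illustration, that a touch is labelled by anatomical hand ("left"/"right") and that posture is a single crossed/uncrossed flag; the function names are hypothetical and this is not a model taken from the studies above. A body-centred coder simply reports the touched hand, whereas a remapping coder combines the hand with posture to find the side of space the infant must actually turn towards.

```python
def body_centred_side(touched_hand: str) -> str:
    """Body-centred coding: report the anatomical hand and ignore posture."""
    return touched_hand


def external_side(touched_hand: str, hands_crossed: bool) -> str:
    """External coding: combine the anatomical hand with current posture to
    find the side of space where the touch actually is."""
    if not hands_crossed:
        return touched_hand
    # With crossed hands the left hand lies in right space and vice versa.
    return "right" if touched_hand == "left" else "left"


# Orienting succeeds only when the body-centred answer matches the external side.
for crossed in (False, True):
    for hand in ("left", "right"):
        looks = body_centred_side(hand)
        target = external_side(hand, crossed)
        outcome = "correct" if looks == target else "wrong"
        print(f"hands crossed={crossed}, touched {hand} hand: "
              f"body-centred orienting goes {looks} ({outcome})")
```

With uncrossed hands both codings agree, so a purely body-centred infant still orients correctly; only the crossed posture separates them, which is why crossed hands are the diagnostic test for remapping.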
Take-home
Mapping the body into visual space is slow, experience-dependent, and foundational for later visuomotor skills.
2. Learning which objects are within reach
Infants between 6 and 12 months show strong reaching behaviour, but must learn which objects are actually reachable.
Binocular depth cues
Stereo vision emerges rapidly between 2.5 and 5 months
By ~5 months, infants preferentially reach for the closer object when binocular disparity information is available
Binocular cues provide depth information that guides action, not just perception
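The geometry behind binocular distance information can be sketched for two objects straight ahead of the viewer. This is an illustration under stated assumptions (objects on the midline, illustrative interocular and reach values, hypothetical function names), not a claim about how infants compute depth: the nearer object subtends a larger vergence angle, the difference between the two angles is their relative binocular disparity, and the same geometry converts an angle back into a distance that can be compared with arm length.

```python
import math

# Illustrative values only (assumptions, not measured infant data).
INTEROCULAR_M = 0.045   # distance between the two eyes, in metres
ARM_REACH_M = 0.25      # maximum reaching distance, in metres


def vergence_angle(distance_m: float, interocular_m: float = INTEROCULAR_M) -> float:
    """Angle (radians) between the two lines of sight when fixating a point
    straight ahead at distance_m."""
    return 2.0 * math.atan((interocular_m / 2.0) / distance_m)


def distance_from_vergence(angle_rad: float, interocular_m: float = INTEROCULAR_M) -> float:
    """Invert vergence_angle: recover distance from the vergence angle."""
    return (interocular_m / 2.0) / math.tan(angle_rad / 2.0)


def pick_target(d1_m: float, d2_m: float) -> tuple:
    """Return (index of the nearer object, whether it is within reach).

    The relative disparity between the objects is the difference in their
    vergence angles; a positive value means object 1 is nearer."""
    a1, a2 = vergence_angle(d1_m), vergence_angle(d2_m)
    relative_disparity = a1 - a2
    nearer = 1 if relative_disparity > 0 else 2
    nearer_distance = distance_from_vergence(max(a1, a2))
    return nearer, nearer_distance <= ARM_REACH_M


print(pick_target(0.20, 0.40))  # (1, True)
print(pick_target(0.60, 0.35))  # (2, False)
```

Relative disparity on its own only says which object is closer; deciding whether that object is reachable additionally needs an absolute distance scaled to the body, here approximated by comparing with an assumed arm length.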
Perspective (pictorial) cues
Pictorial cues such as relative size and linear perspective (e.g. the Ames window) influence reaching only from about 6–7 months
This indicates that monocular depth cues are used for action later than binocular cues
Take-home
Depth information constrains action before the first year, but different depth cues become useful at different times.
3. Which objects should I grasp, and how?
This section covers how perception guides the choice of action.
Salience vs graspability
Young infants are biased toward visually salient objects (e.g. faces), even when another object is easier to grasp.
In younger infants, salience dominates
With development, graspability (motor relevance) increasingly guides action choice
Affordances (critical concept)
What are affordances?
From Gibson’s ecological theory:
Objects are perceived in terms of the actions they afford (e.g. handle affords grasping).
Affordances arise from:
Object properties
The observer’s goals
The observer’s motor capabilities
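Affordances are therefore not purely reflexive: the same object can afford different actions to observers with different goals or abilities.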
Evidence
Adults preferentially grasp a tool by its handle, even when another grasp is biomechanically easier
This preference weakens when semantic processing is disrupted → object knowledge matters
Brain area AIP (anterior intraparietal area) contains neurons that link:
Object shape
Grasp type
Motor execution
Simply seeing tools activates grasp-related motor areas in adults and children from ~6 years
Development
Infants show pre-shaping of the hand before they can execute refined grasps
Even 5-month-olds show sensitivity to affordances they cannot yet perform
Action potentiation develops early; inhibition of inappropriate affordances develops later
This may help explain scale errors in toddlers (e.g. trying to sit in a tiny toy chair)
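The "potentiate early, inhibit later" account of scale errors can be sketched as a two-step decision. This is a toy illustration only: the category-to-action table, the single size dimension, the `min_ratio` threshold and all function names are hypothetical assumptions, not a published model. The first step proposes an action from object identity alone; the second, later-developing step vetoes it when the object does not fit the child's body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thing:
    category: str   # what the object is, e.g. "chair"
    size_m: float   # one characteristic size, in metres (illustrative)


# Hypothetical category -> action links, as if learned from full-size objects.
CATEGORY_ACTION = {"chair": "sit on", "car": "get into", "slide": "slide down"}


def potentiate(thing: Thing) -> Optional[str]:
    """Early-developing step: object identity alone suggests an action."""
    return CATEGORY_ACTION.get(thing.category)


def fits_body(thing: Thing, body_size_m: float, min_ratio: float = 0.5) -> bool:
    """Later-developing check: is the object big enough relative to the body?
    min_ratio is an arbitrary illustrative threshold."""
    return thing.size_m >= min_ratio * body_size_m


def choose_action(thing: Thing, body_size_m: float, can_inhibit: bool) -> str:
    action = potentiate(thing)
    if action is None:
        return "no action"
    # Without inhibition, the potentiated action runs even if it cannot work.
    if can_inhibit and not fits_body(thing, body_size_m):
        return "inhibited"
    return action


toy_chair = Thing("chair", 0.10)
real_chair = Thing("chair", 0.45)
for can_inhibit in (False, True):
    print(f"can_inhibit={can_inhibit}: toy chair -> "
          f"{choose_action(toy_chair, 0.85, can_inhibit)}, real chair -> "
          f"{choose_action(real_chair, 0.85, can_inhibit)}")
```

Without the inhibition step the toy chair still triggers "sit on", which is the scale-error pattern; with it, only the body-scaled chair does.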
End-state planning (planning ahead)
End-state comfort means choosing an initially awkward grip to end in a comfortable position.
Development
Emerges around 3.5 years
Develops slowly and is hard to train
Younger children prioritise immediate comfort over future outcomes
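The trade-off between immediate and end-state comfort can be sketched as a weighted choice between grips. This is a toy model under explicit assumptions: the two grips, their comfort scores (0 to 1) and the `future_weight` parameter are hypothetical and chosen only to illustrate the developmental shift described above. A `future_weight` of 0 mimics a child who considers only how the grasp feels now; larger values mimic planning ahead to the end of the action.

```python
from dataclasses import dataclass


@dataclass
class Grip:
    name: str
    start_comfort: float  # comfort at the moment of grasping (0-1, illustrative)
    end_comfort: float    # comfort when the action is completed (0-1, illustrative)


def choose_grip(grips: list, future_weight: float) -> Grip:
    """Pick the grip with the highest weighted comfort.

    future_weight = 0 -> only immediate comfort counts;
    future_weight = 1 -> only end-state comfort counts."""
    def score(g: Grip) -> float:
        return (1 - future_weight) * g.start_comfort + future_weight * g.end_comfort
    return max(grips, key=score)


# Hypothetical task: grasp a bar that must then be turned over and placed.
grips = [
    Grip("overhand (comfortable now, awkward at the end)", 0.9, 0.2),
    Grip("underhand (awkward now, comfortable at the end)", 0.4, 0.9),
]

for w in (0.0, 0.3, 0.8):
    print(f"future_weight={w}: {choose_grip(grips, w).name}")
```

With these made-up numbers, the initially awkward underhand grip only wins once the end state is weighted heavily, which mirrors the observation that younger children favour immediate comfort.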
Take-home
Planning actions across time is a late-developing component of vision for action.
4. Building models of your body in action: visuomotor decision-making
Actions are noisy and probabilistic. Optimal behaviour requires:
Estimating uncertainty
Weighing costs vs rewards
Choosing actions that maximise expected outcomes
Key findings
Adults integrate uncertainty and cost near-optimally
Children (up to ~11 years) use suboptimal strategies
They aim closer to penalties → risk-taking
This persists even when it clearly harms performance
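The computation listed above (estimate uncertainty, weigh rewards against costs, choose the aim with the best expected outcome) can be sketched in one dimension. This is a minimal illustration assuming Gaussian motor noise and an invented layout in which a rewarded region overlaps a penalty region on its left; the layout, point values, noise levels and function names are all hypothetical, not parameters from any study. Expected gain is the sum over regions of payoff times the probability of landing there.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative probability of a Gaussian with mean mu and s.d. sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_between(lo: float, hi: float, aim: float, sigma: float) -> float:
    """Probability that a noisy endpoint (mean = aim) lands in [lo, hi]."""
    return normal_cdf(hi, aim, sigma) - normal_cdf(lo, aim, sigma)


# Hypothetical 1-D layout in mm: a reward region overlapping a penalty
# region on its left. Point values are illustrative, not from any study.
REGIONS = [
    (-5.0, 5.0, 100.0),     # reward region: +100 points
    (-15.0, -3.0, -500.0),  # penalty region: -500 points (overlaps reward)
]


def expected_gain(aim: float, sigma: float) -> float:
    """Sum of payoff x probability over all regions (they may overlap)."""
    return sum(points * prob_between(lo, hi, aim, sigma)
               for lo, hi, points in REGIONS)


def best_aim(sigma: float, step: float = 0.1) -> float:
    """Grid search over aim points from -5 to +5 mm for maximum expected gain."""
    n = round(5.0 / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: expected_gain(a, sigma))


for sigma in (2.0, 4.0):
    aim = best_aim(sigma)
    print(f"noise sd {sigma} mm: best aim {aim:+.1f} mm, "
          f"expected gain {expected_gain(aim, sigma):.1f}; "
          f"aiming at the target centre gives {expected_gain(0.0, sigma):.1f}")
```

The best aim shifts away from the penalty, and further so as motor noise grows; aiming at the target centre, or closer still to the penalty, lowers expected gain, which is the cost of the risk-taking strategy described above.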
Interpretation
Improved visuomotor performance in development reflects not just motor refinement, but maturation of decision-making systems.
Big picture take-home messages
Vision for action is not automatic at birth; it is built through experience
The dorsal stream supports specialised transformations from vision to action
Core components (body mapping, affordances, depth use) emerge early but refine slowly
Children often see affordances before they can control them
Mature visuomotor behaviour depends on perception, motor control and decision-making
This is a beautiful example of how perception, action, and cognition grow together: sometimes out of sync, sometimes in elegant coordination, and often messily, like most real development.