The Development of Vision for Action

What is vision for action?

Vision for action refers to how visual information is used to guide movements, not just recognise objects. This includes reaching, grasping, intercepting moving objects, and judging what is within reach.

It is computationally difficult because the brain must:

Transform retinal information into body-centred motor commands (sensorimotor transformation)
Solve the degrees of freedom problem: there are many possible joint and movement solutions for the same goal
Take into account one’s own motor abilities and uncertainty
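The degrees of freedom problem can be illustrated with the simplest possible case. The sketch below is not from the lecture material: it assumes a planar arm with only two segments (the segment lengths and function names are hypothetical), and shows that even this arm has two distinct joint configurations that put the hand on the same target. A real arm has many more joints, so the number of solutions is far larger.

```python
import math

# Assumed (illustrative) segment lengths in metres: upper arm and forearm.
UPPER_ARM = 0.30
FOREARM = 0.25

def two_link_ik(x, y, l1=UPPER_ARM, l2=FOREARM):
    """Return both (shoulder, elbow) angle pairs, in radians, that place the
    hand of a planar two-segment arm at the target (x, y)."""
    cos_elbow = (x * x + y * y - l1 ** 2 - l2 ** 2) / (2 * l1 * l2)
    if abs(cos_elbow) > 1:
        raise ValueError("target is out of reach")
    solutions = []
    for sign in (+1, -1):  # the two mirror-image elbow postures
        elbow = sign * math.acos(cos_elbow)
        shoulder = math.atan2(y, x) - math.atan2(
            l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
        )
        solutions.append((shoulder, elbow))
    return solutions

def hand_position(shoulder, elbow, l1=UPPER_ARM, l2=FOREARM):
    """Forward kinematics: where the hand ends up for given joint angles."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y

# Two different postures, one hand position.
for shoulder, elbow in two_link_ik(0.35, 0.20):
    print(round(shoulder, 3), round(elbow, 3), hand_position(shoulder, elbow))
```

Both printed postures reach the same point, so the goal alone does not determine the movement; the motor system has to choose among equivalent solutions.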

Vision for action is primarily supported by the dorsal visual stream (“where/how”), which is more developmentally vulnerable than the ventral stream.

Core neural framework: dorsal vs ventral streams

Ventral stream: vision for recognition (“what/who”)
Dorsal stream: vision for action (“where/how”)

Goodale & Milner’s model proposes that these streams are functionally distinct, with the dorsal stream directly linking perception to motor output. Developmental disorders often disproportionately affect dorsal stream functions.

1. Localising touches in space (building body–world mappings)

Key problem

Infants must learn to link:

Tactile input (where was I touched?)
Visual input (where is my body in space?)

This requires using different frames of reference.

Frames of reference

Egocentric: retina-centred, head-centred, body-centred
Allocentric: object-to-object, world-centred

Adults flexibly switch between frames depending on the task (e.g. coding a cup relative to the body is egocentric; coding a teapot relative to the cup is allocentric). Infants must learn this flexibility.
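As a rough illustration of the two kinds of frame, the sketch below expresses the same cup and teapot in body-centred and object-centred coordinates. The positions, the variable names and the `relative_to` helper are all made up for the example, and body rotation is ignored for simplicity.

```python
def relative_to(target, origin):
    """Position of `target` expressed in a frame centred on `origin`."""
    return (target[0] - origin[0], target[1] - origin[1])

# Hypothetical positions on a 2D floor plan (arbitrary units).
cup, teapot = (2, 3), (4, 3)
body_before, body_after = (0, 0), (1, 1)  # the observer takes a step

# Egocentric (body-centred): changes whenever the observer moves.
cup_ego_before = relative_to(cup, body_before)  # (2, 3)
cup_ego_after = relative_to(cup, body_after)    # (1, 2)

# Allocentric (object-to-object): unaffected by the observer's movement.
teapot_from_cup = relative_to(teapot, cup)      # (2, 0)
```

The egocentric code is what a reach needs, but it must be updated with every movement; the allocentric code is stable but has to be converted before it can drive the arm.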

Developmental timeline

6.5 months: infants orient to a touched hand only when hands are uncrossed
→ touch is coded in a body-centred frame
~10 months: infants correctly orient even when hands are crossed
→ touch has been remapped into external/visual coordinates
EEG evidence shows this remapping emerges in motor cortex around 10 months
Full adult-like remapping (e.g. temporal order judgements with crossed hands) continues developing until ~5–6 years and requires visual experience
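The logic of the crossed-hands test can be sketched as follows. This is a toy model of the orienting response only, not of any specific experiment; the function name and the "body" / "external" labels are hypothetical.

```python
def orient_side(touched_hand, hands_crossed, coding):
    """Side of space ('left' or 'right') to orient toward after a touch.

    coding='body':     assumes each hand lies on its usual side of the body.
    coding='external': takes the current posture of the hands into account.
    """
    if touched_hand not in ("left", "right"):
        raise ValueError("touched_hand must be 'left' or 'right'")
    if coding == "body":
        return touched_hand
    if coding == "external":
        if hands_crossed:
            return "left" if touched_hand == "right" else "right"
        return touched_hand
    raise ValueError("coding must be 'body' or 'external'")

def orients_correctly(touched_hand, hands_crossed, coding):
    """True if the chosen side is where the touched hand actually is."""
    actual = orient_side(touched_hand, hands_crossed, "external")
    return orient_side(touched_hand, hands_crossed, coding) == actual

for crossed in (False, True):
    for coding in ("body", "external"):
        print(crossed, coding, orients_correctly("right", crossed, coding))
```

A purely body-centred code succeeds with uncrossed hands and fails with crossed hands, which is the pattern described at 6.5 months; the external code succeeds in both postures, as described at around 10 months.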

Take-home

Mapping the body into visual space is slow, experience-dependent, and foundational for later visuomotor skills.

2. Learning which objects are within reach

Infants aged 6–12 months show strong reaching behaviour, but must learn which objects are actually reachable.

Binocular depth cues

Stereo vision emerges rapidly between 2.5 and 5 months
By ~5 months, infants preferentially reach for the closer object when binocular disparity information is available
Binocular vision provides a sense of depth that guides action, not just perception
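Why binocular disparity is so useful for reaching can be seen from simple geometry. The sketch below assumes two objects straight ahead of the observer and an interocular distance of 4.5 cm; that value and the function names are illustrative assumptions, not figures from the notes.

```python
import math

# Assumed interocular distance in metres (illustrative value only).
INTEROCULAR = 0.045

def vergence_angle(distance_m, interocular_m=INTEROCULAR):
    """Angle (radians) between the two lines of sight when both eyes
    fixate a point straight ahead at the given distance."""
    return 2 * math.atan(interocular_m / (2 * distance_m))

def relative_disparity(dist_a, dist_b, interocular_m=INTEROCULAR):
    """Difference in vergence angle between objects a and b.
    Positive means a is nearer than b."""
    return vergence_angle(dist_a, interocular_m) - vergence_angle(dist_b, interocular_m)

def closer_object(dist_a, dist_b):
    """Pick the nearer object using only the sign of the disparity."""
    disparity = relative_disparity(dist_a, dist_b)
    if disparity > 0:
        return "a"
    if disparity < 0:
        return "b"
    return "same"

# Same 15 cm depth gap, within arm's reach vs. across the room.
near = relative_disparity(0.15, 0.30)
far = relative_disparity(1.15, 1.30)
print(closer_object(0.15, 0.30), near > far)
```

The same physical depth gap produces a much larger disparity in near space than further away, so the cue is most informative exactly where reaching happens.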

Perspective (pictorial) cues

Cues such as relative size and linear perspective (e.g. the Ames window) begin to influence reaching only at around 6–7 months
This indicates that monocular depth cues come to guide action later than binocular cues

Take-home

Depth information constrains action within the first year, but different depth cues become useful at different times.

3. Which objects should I grasp, and how?

This section explains how perception selects actions.

Salience vs graspability

Young infants are biased toward visually salient objects (e.g. faces), even when another object is easier to grasp.

In younger infants, salience dominates
With development, graspability (motor relevance) increasingly guides action choice

Affordances (critical concept)

What are affordances?

From Gibson’s ecological theory:
Objects are perceived in terms of the actions they afford (e.g. handle affords grasping).

Affordances arise from:

Object properties
The observer’s goals
The observer’s motor capabilities

Affordances are therefore not purely reflexive responses to object properties.

Evidence

Adults preferentially grasp a tool by its handle, even when another grasp is biomechanically easier
This preference weakens when semantic processing is disrupted → object knowledge matters
The anterior intraparietal area (AIP), in the anterior intraparietal sulcus, contains neurons that link:
- Object shape
- Grasp type
- Motor execution
Simply seeing tools activates grasp-related motor areas in adults and children from ~6 years

Development

Infants show pre-shaping of the hand before they can execute refined grasps
Even 5-month-olds show sensitivity to affordances they cannot yet perform
Action potentiation develops early; inhibition of inappropriate affordances develops later
This late-developing inhibition helps explain scale errors in toddlers (e.g. trying to sit in tiny chairs)

End-state planning (planning ahead)

End-state comfort means choosing an initially awkward grip to end in a comfortable position.

Development

Emerges around 3.5 years
Develops slowly and is hard to train
Younger children prioritise immediate comfort over future outcomes

Take-home

Planning actions across time is a late-developing component of vision for action.

4. Building models of your body in action: visuomotor decision-making

Actions are noisy and probabilistic. Optimal behaviour requires:

Estimating uncertainty
Weighing costs vs rewards
Choosing actions that maximise expected outcomes
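The three requirements above can be combined in a small expected-gain calculation. The sketch below is a one-dimensional toy version of a reach-under-risk task: a reward region sits directly next to a penalty region, and reach endpoints scatter around the aim point with Gaussian noise. The region boundaries, payoff values and function names are all assumptions chosen for illustration.

```python
import math

# Assumed task layout (cm) and payoffs (points); illustrative values only.
PENALTY_REGION = (-2.0, 0.0)
REWARD_REGION = (0.0, 2.0)
PENALTY, REWARD = -500.0, 100.0

def normal_cdf(x, mean, sd):
    """Cumulative probability of a Gaussian with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def hit_probability(region, aim, motor_sd):
    """Probability that a noisy reach aimed at `aim` lands inside `region`."""
    low, high = region
    return normal_cdf(high, aim, motor_sd) - normal_cdf(low, aim, motor_sd)

def expected_gain(aim, motor_sd):
    """Average payoff of aiming at `aim`, given motor noise `motor_sd`."""
    return (REWARD * hit_probability(REWARD_REGION, aim, motor_sd)
            + PENALTY * hit_probability(PENALTY_REGION, aim, motor_sd))

def optimal_aim(motor_sd):
    """Grid search (0.01 cm steps) for the aim point with the highest gain."""
    candidates = [i / 100 for i in range(0, 401)]
    return max(candidates, key=lambda aim: expected_gain(aim, motor_sd))

for sd in (0.3, 0.8):
    best = optimal_aim(sd)
    print(sd, best, round(expected_gain(best, sd), 1))
```

In this toy task the best aim point lies beyond the centre of the reward region, away from the penalty, and it shifts further away as motor noise grows. An observer who underestimates their own variability, or underweights the penalty, will aim too close to the penalty region and lose points.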

Key findings

Adults integrate uncertainty and cost optimally
Children (up to ~11 years) use suboptimal strategies
They aim closer to penalty regions than is optimal → excessive risk-taking
This persists even when it clearly harms performance

Interpretation

Improved visuomotor performance in development reflects not just motor refinement, but maturation of decision-making systems.

Big picture take-home messages

Vision for action is not automatic at birth; it is built through experience
The dorsal stream supports specialised transformations from vision to action
Core components (body mapping, affordances, depth use) emerge early but refine slowly
Children often see affordances before they can control them
Mature visuomotor behaviour depends on perception, motor control and decision-making

This is a clear example of how perception, action and cognition develop together: sometimes out of sync, sometimes in close coordination, and often messily, like most real development.