The Development of Vision for Action

What is vision for action?

Vision for action refers to how visual information is used to guide movements, not just to recognise objects. This includes reaching, grasping, intercepting moving objects, and judging what is within reach.

It is computationally difficult because the brain must:

  • Transform retinal information into body-centred motor commands (sensorimotor transformation)

  • Solve the degrees of freedom problem: there are many possible joint and movement solutions for the same goal

  • Take into account one’s own motor abilities and uncertainty
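To make the first two points concrete, here is a minimal sketch of a hypothetical two-joint planar arm (shoulder and elbow). The segment lengths (0.30 m and 0.25 m) and the function names `forward` and `inverse` are illustrative assumptions, not a model from the literature. `forward` maps joint angles to a hand position in shoulder-centred coordinates, and `inverse` does the reverse transformation. Even this toy arm has two joint configurations ("elbow up" and "elbow down") for the same target, and it returns no solution for targets beyond its reach. Real arms have more joints and therefore infinitely many solutions.

```python
import math

def forward(shoulder, elbow, l1=0.30, l2=0.25):
    """Hand position (x, y) in shoulder-centred coordinates for joint angles in radians.
    Segment lengths l1, l2 (metres) are hypothetical."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y

def inverse(x, y, l1=0.30, l2=0.25):
    """Return every (shoulder, elbow) angle pair that places the hand at (x, y).
    An empty list means the target is out of reach."""
    d2 = x * x + y * y
    # Law of cosines: d^2 = l1^2 + l2^2 + 2*l1*l2*cos(elbow)
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(cos_elbow) > 1:
        return []  # no joint configuration reaches this point
    solutions = []
    for sign in (1, -1):  # elbow-down and elbow-up solutions
        elbow = sign * math.acos(cos_elbow)
        shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                                 l1 + l2 * math.cos(elbow))
        solutions.append((shoulder, elbow))
    return solutions

target = (0.35, 0.20)
for shoulder, elbow in inverse(*target):
    hx, hy = forward(shoulder, elbow)
    print(f"shoulder={math.degrees(shoulder):6.1f} deg, "
          f"elbow={math.degrees(elbow):6.1f} deg -> hand=({hx:.3f}, {hy:.3f})")

print(inverse(1.0, 0.0))  # [] because the target lies beyond the 0.55 m reach
```

Both printed configurations land the hand on the same point, which is the degrees of freedom problem in miniature: the goal alone does not fix the movement.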

This is primarily supported by the dorsal visual stream (“where/how”), which is more developmentally vulnerable than the ventral stream.


Core neural framework: dorsal vs ventral streams

  • Ventral stream: vision for recognition (“what/who”)

  • Dorsal stream: vision for action (“where/how”)

Goodale & Milner’s model proposes that these streams are functionally distinct, with the dorsal stream transforming visual input directly into motor output. Developmental disorders often disproportionately affect dorsal stream functions.


1. Localising touches in space (building body–world mappings)

Key problem

Infants must learn to link:

  • Tactile input (where was I touched?)

  • Visual input (where is my body in space?)

This requires relating information coded in different frames of reference.

Frames of reference

  • Egocentric: retina-centred, head-centred, body-centred

  • Allocentric: object-to-object, world-centred

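Adults flexibly switch between frames depending on the task (e.g. coding a cup’s position relative to the body when reaching for it, but the teapot’s position relative to the cup when pouring). Infants must learn this flexibility.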

Developmental timeline

  • 6.5 months: infants orient correctly to a touched hand only when the hands are uncrossed
    → touch is coded in a body-centred frame

  • ~10 months: infants correctly orient even when hands are crossed
    → touch has been remapped into external/visual coordinates

  • EEG evidence shows this remapping emerges in motor cortex around 10 months

  • Full adult-like remapping (e.g. temporal order judgements with crossed hands) continues developing until ~5–6 years and requires visual experience
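The crossed-hands logic can be sketched as a toy model. The two "observers" below (`orient_body_centred` and `orient_remapped`) are hypothetical illustrations of the two coding schemes described above, not models taken from the studies. A body-centred coder assumes the touched hand sits in its usual place; a remapped coder combines the touch with current arm posture. The two only disagree when the hands are crossed, which is why crossing the hands reveals whether remapping has occurred.

```python
from enum import Enum

class Side(Enum):
    LEFT = "left"
    RIGHT = "right"

def external_side(hand: Side, crossed: bool) -> Side:
    """Side of external (visual) space that a hand occupies, given arm posture."""
    if not crossed:
        return hand
    # Crossing the arms puts each hand on the opposite side of the body midline.
    return Side.RIGHT if hand is Side.LEFT else Side.LEFT

def orient_body_centred(touched_hand: Side, crossed: bool) -> Side:
    """Hypothetical unremapped observer: ignores posture and assumes the
    touched hand is in its usual (uncrossed) place."""
    return touched_hand  # `crossed` is deliberately unused

def orient_remapped(touched_hand: Side, crossed: bool) -> Side:
    """Hypothetical remapped observer: combines touch with current posture."""
    return external_side(touched_hand, crossed)

for crossed in (False, True):
    for hand in Side:
        actual = external_side(hand, crossed)
        naive = orient_body_centred(hand, crossed)
        remapped = orient_remapped(hand, crossed)
        print(f"crossed={crossed!s:5} touch on {hand.value:5} hand: "
              f"hand is on the {actual.value}; body-centred looks {naive.value} "
              f"({'correct' if naive is actual else 'wrong'}), "
              f"remapped looks {remapped.value}")
```

In the uncrossed rows both observers succeed; in the crossed rows only the remapped observer orients to the correct side, mirroring the contrast between 6.5- and 10-month-olds.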

Take-home

Mapping the body into visual space is slow, experience-dependent, and foundational for later visuomotor skills.


2. Learning which objects are within reach

Infants aged 6–12 months show strong reaching behaviour, but must learn which objects are actually reachable.

Binocular depth cues

  • Stereo vision emerges rapidly between 2.5 and 5 months

  • By ~5 months, infants preferentially reach for the closer object when binocular disparity information is available

  • Binocular vision provides an action-relevant sense of depth, not just a perceptual one

Perspective (pictorial) cues

  • Cues such as size and linear perspective (e.g. the Ames window) influence reaching only from around 6–7 months

  • This indicates that monocular depth cues come to guide action later than binocular ones

Take-home

Depth information constrains action before the first year, but different depth cues become useful at different times.


3. Which objects should I grasp, and how?

This section explains how perception selects actions.

Salience vs graspability

Young infants are biased toward visually salient objects (e.g. faces), even when another object is easier to grasp.

  • In younger infants, salience dominates

  • With development, graspability (motor relevance) increasingly guides action choice


Affordances (critical concept)

What are affordances?

From Gibson’s ecological theory:
Objects are perceived in terms of the actions they afford (e.g. a handle affords grasping).

Affordances arise from:

  • Object properties

  • The observer’s goals

  • The observer’s motor capabilities

They are not purely reflexive.

Evidence

  • Adults preferentially grasp a tool by its handle, even when another grasp is biomechanically easier

  • This preference weakens when semantic processing is disrupted → object knowledge matters

  • Brain area AIP (anterior intraparietal area, part of the dorsal stream) contains neurons that link:

    • Object shape

    • Grasp type

    • Motor execution

  • Simply seeing tools activates grasp-related motor areas in adults and children from ~6 years

Development

  • Infants show pre-shaping of the hand before they can execute refined grasps

  • Even 5-month-olds show sensitivity to affordances they cannot yet perform

  • Action potentiation develops early; inhibition of inappropriate affordances develops later

  • This helps explain scale errors in toddlers (e.g. trying to sit in a miniature chair)


End-state planning (planning ahead)

End-state comfort means choosing an initially awkward grip to end in a comfortable position.

Development

  • Emerges around 3.5 years

  • Develops slowly and is hard to train

  • Younger children prioritise immediate comfort over future outcomes

Take-home

Planning actions across time is a late-developing component of vision for action.


4. Building models of your body in action: visuomotor decision-making

Actions are noisy and probabilistic. Optimal behaviour requires:

  • Estimating uncertainty

  • Weighing costs vs rewards

  • Choosing actions that maximise expected gain

Key findings

  • Adults integrate uncertainty and cost near-optimally

  • Children (up to ~11 years) use suboptimal strategies

  • They aim too close to penalty regions → risk-seeking behaviour

  • This persists even when it clearly harms performance
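The expected-gain idea can be sketched numerically. Everything here is a hypothetical setup, loosely in the spirit of rapid pointing tasks with adjacent reward and penalty regions: a one-dimensional layout with a reward zone from -1 to 1 worth 100 points, a penalty zone from -3 to -1 costing 500 points, and Gaussian endpoint noise with standard deviation `sigma`. The function names are illustrative. The optimal aim point is found by a simple grid search; it shifts away from the penalty as motor noise grows, and aiming closer to the penalty lowers expected gain.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a Gaussian endpoint with mean mu and sd sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_in(lo, hi, aim, sigma):
    """Probability that a movement aimed at `aim` lands in [lo, hi]."""
    return normal_cdf(hi, aim, sigma) - normal_cdf(lo, aim, sigma)

def expected_gain(aim, sigma, reward=100, penalty=-500,
                  target=(-1.0, 1.0), penalty_zone=(-3.0, -1.0)):
    """Reward times P(hit target) plus penalty times P(hit penalty zone)."""
    return (reward * prob_in(*target, aim, sigma)
            + penalty * prob_in(*penalty_zone, aim, sigma))

def best_aim(sigma, **kwargs):
    """Grid search over aim points from -2.0 to 3.0 in steps of 0.01."""
    candidates = [i / 100 for i in range(-200, 301)]
    return max(candidates, key=lambda a: expected_gain(a, sigma, **kwargs))

for sigma in (0.3, 0.6, 1.0):
    opt = best_aim(sigma)
    print(f"sigma={sigma}: optimal aim={opt:+.2f}, "
          f"gain there={expected_gain(opt, sigma):.1f}, "
          f"gain at target centre={expected_gain(0.0, sigma):.1f}, "
          f"gain aiming towards penalty (-0.3)={expected_gain(-0.3, sigma):.1f}")
```

An observer with more motor noise should shift further from the penalty; failing to do so, or shifting towards it, is the kind of suboptimal risk-taking described for children.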

Interpretation

Improved visuomotor performance in development reflects not just motor refinement, but maturation of decision-making systems.


Big picture take-home messages

  • Vision for action is not automatic at birth; it is built through experience

  • The dorsal stream supports specialised transformations from vision to action

  • Core components (body mapping, affordances, depth use) emerge early but refine slowly

  • Children often see affordances before they can control them

  • Mature visuomotor behaviour depends on perception, motor control and decision-making

This is a clear example of how perception, action, and cognition develop together: sometimes out of sync, sometimes in elegant coordination, and often messily, like most real development.