Development of Vision for Action
1. Introduction to Vision for Action
Importance and Complexity:
Vision for action involves sensorimotor transformation: integrating object positions in retinal space with limb positions in bodily space.
Degrees of Freedom Problem:
There are infinitely many motor solutions (joint positions, trajectories) for any given goal.
Success requires evaluating one's own visuomotor capabilities accurately.
Neural Substrates:
Primary Driver: Dorsal Stream (“where?” and “how?” pathway), located in the posterior parietal cortex.
Contrast: Ventral Stream (“what?” and “who?” pathway for recognition), located in the inferior temporal cortex.
The dorsal stream is often considered developmentally vulnerable.
2. Frames of Reference (FOR) and Localizing Touches
2.1 Coordination Frames
Egocentric (Internal):
Retina-centered: Location relative to the eye.
Head-centered: Retina-centered + proprioceptive signals of eye-in-head.
Body-centered: Head-centered + signals of head-on-trunk.
Allocentric (World/Object-centered):
Objects relative to each other or the environment (e.g., teapot vs. cup).
Requires taking oneself “out of the equation.”
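The egocentric chain above can be sketched as successive additions of posture signals. This is a deliberately simplified model and an assumption on our part: each position is a single horizontal (azimuth) angle in degrees, the transforms are purely additive, and torsion, elevation and distance are ignored. The function names and angles are hypothetical.

```python
def retina_to_head(target_on_retina_deg: float, eye_in_head_deg: float) -> float:
    """Head-centered azimuth: retinal position plus the eye's rotation in the head."""
    return target_on_retina_deg + eye_in_head_deg


def head_to_body(target_in_head_deg: float, head_on_trunk_deg: float) -> float:
    """Body-centered azimuth: head-centered position plus the head's rotation on the trunk."""
    return target_in_head_deg + head_on_trunk_deg


def allocentric_offset(obj_a_body_deg: float, obj_b_body_deg: float) -> float:
    """Position of object A relative to object B; the observer's posture cancels out."""
    return obj_a_body_deg - obj_b_body_deg


# Hypothetical scene (negative = left): cup 10 deg left of the fovea, teapot 15 deg right,
# eyes rotated 20 deg right in the head, head turned 15 deg left on the trunk.
cup_body = head_to_body(retina_to_head(-10.0, 20.0), -15.0)
teapot_body = head_to_body(retina_to_head(15.0, 20.0), -15.0)
print(cup_body, teapot_body, allocentric_offset(cup_body, teapot_body))
```

Because both objects share the same eye-in-head and head-on-trunk signals, those terms cancel in the allocentric difference, which is one way to read “taking oneself out of the equation.”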
2.2 Developmental Progression of Touch Remapping
In Utero/Infancy:
Infants perceive tactile and auditory stimuli but cannot initially correlate them with visual input.
Buzzing Hands Study:
6.5 months: Infants can only orient toward a stimulated hand if the arms are uncrossed.
10 months: Infants can orient to the correct hand even if the arms are crossed. This reflects the recoding of touch into an external frame of reference.
Neural Correlate: Differences in somatosensory cortex EEG signals between crossed and uncrossed postures appear at 10 months but not at 6.5 months, indicating enhanced modulation and remapping of S1.
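The recoding probed by the crossed-arms condition can be sketched as a mapping from the anatomical hand that was touched to the side of external space where that hand currently lies. This is a simplified illustration: the names are hypothetical, and the “without remapping” strategy (orienting by anatomical side while ignoring posture) is only one reading of the younger infants' behavior.

```python
from enum import Enum


class Side(Enum):
    LEFT = "left"
    RIGHT = "right"


def opposite(side: Side) -> Side:
    return Side.RIGHT if side is Side.LEFT else Side.LEFT


def external_location(touched_hand: Side, arms_crossed: bool) -> Side:
    """Remap a touch from anatomical (body) coordinates to external space.

    With crossed arms, the left hand lies in right space and vice versa.
    """
    return opposite(touched_hand) if arms_crossed else touched_hand


def orient_without_remapping(touched_hand: Side, arms_crossed: bool) -> Side:
    """Orient toward the anatomical side; current posture is deliberately ignored."""
    return touched_hand


for crossed in (False, True):
    target = external_location(Side.LEFT, crossed)
    naive = orient_without_remapping(Side.LEFT, crossed)
    outcome = "hits" if naive is target else "misses"
    print(f"arms crossed={crossed}: left hand is in {target.value} space; "
          f"orienting without remapping {outcome}")
```

Only the crossed posture separates the two strategies, which is why crossing the arms is the informative test.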
2.3 Temporal Order Judgment Task
Task Procedure:
Hands are tapped (crossed vs. uncrossed) almost simultaneously with a slight difference in timing. The participant must determine which hand was tapped first.
It is consistently easier to judge temporal order when hands are uncrossed; this effect persists even when eyes are closed, indicating the process of bringing body-centered tactile input into visual space is highly automated.
Developmental Milestones:
Children only begin to consistently show the “crossed hands effect” around age 5.
This suggests a protracted development for remapping touch from body-centered to visual coordinates, potentially limited by the slow maturation of cross-hemisphere communication (e.g., the corpus callosum).
The Role of Visual Experience:
Congenitally Blind: These individuals do not show the “crossed hands effect,” suggesting sight is necessary for the initial development of remapping tactile inputs to external space.
Late Restored Sight (Case LM): Unlike sighted controls, LM is not impaired when hands are crossed, performing equally well in both conditions.
Data Sensitivity: A steeper psychometric curve indicates greater sensitivity to timing differences. At zero delay (simultaneous touches), participants should not be able to distinguish the order, so their responses fall at chance (50%).
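TOJ data are commonly summarized by fitting a cumulative Gaussian psychometric function. The sketch below assumes that parameterization (our choice; the notes do not specify one), with the stimulus onset asynchrony (SOA) in milliseconds, positive meaning the right hand was tapped first, and sigma as the spread. The sigma values are illustrative only, not data from the studies above.

```python
from math import erf, pi, sqrt


def p_right_first(soa_ms: float, pss_ms: float = 0.0, sigma_ms: float = 50.0) -> float:
    """Probability of reporting 'right hand first' as a cumulative Gaussian of SOA.

    soa_ms > 0 means the right hand was actually tapped first.
    pss_ms is the point of subjective simultaneity; sigma_ms sets the spread.
    """
    return 0.5 * (1.0 + erf((soa_ms - pss_ms) / (sigma_ms * sqrt(2.0))))


def slope_at_pss(sigma_ms: float) -> float:
    """Steepness of the curve at its midpoint: larger means more sensitive."""
    return 1.0 / (sigma_ms * sqrt(2.0 * pi))


# Illustrative spreads only: a steep curve (uncrossed) and a shallow one (crossed).
for label, sigma in (("uncrossed", 40.0), ("crossed", 120.0)):
    curve = [round(p_right_first(soa, sigma_ms=sigma), 2) for soa in (-100, -50, 0, 50, 100)]
    print(label, curve, f"slope={slope_at_pss(sigma):.4f} per ms")
```

Both curves pass through 0.5 at zero delay; a smaller sigma produces the steeper curve and therefore the better temporal discrimination.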

Summary of touch remapping development:
Only at 10 months do infants orient to the correct side when they feel a buzz on a hand while their arms are crossed.
This suggests that at 6.5 months, body posture information is not yet recoded into external (visual) coordinates as it is in adults.
Between 6.5 and 10 months, infants learn to keep better track of where their body parts are in visual space.
This ability may continue to develop in childhood until at least about age 5 and may require visual experience.
3. Learning Which Objects are Within Reach
Binocular Depth Cues:
Stereo vision develops between 2.5 and 5 months.
months: Infants begin reaching for the nearer toy based on binocular disparity, even if retinal size is equated (closer toy smaller).
Perspective (Pictorial) Cues:
Infants reach preferentially toward the “near” side of a pictorial illusion (e.g., the Ames window) only from 6 to 7 months.
Although disparity-based depth perception is available from around 14 weeks, infants do not yet seem to incorporate it into their reaching behavior (the perceptual and motor systems may not be linked up yet).
So 3D vision from disparity and perspective constrains infants’ reaching interactions with the world during the first half year of life.
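Why a size-equated display isolates binocular information can be checked with basic viewing geometry. A minimal sketch, assuming an interocular distance of 4 cm (a hypothetical infant value) and toy sizes and distances chosen for illustration: a near toy half the size of a far toy at half the distance subtends the same visual angle, so retinal size cannot reveal which is nearer, whereas the vergence angle needed to fixate each toy (which co-varies with binocular disparity) still differs.

```python
from math import atan, degrees


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Angle subtended at the eye by an object of a given size at a given distance."""
    return degrees(2.0 * atan(size_cm / (2.0 * distance_cm)))


def vergence_angle_deg(distance_cm: float, interocular_cm: float = 4.0) -> float:
    """Angle between the two lines of sight when both eyes fixate a point straight ahead."""
    return degrees(2.0 * atan(interocular_cm / (2.0 * distance_cm)))


# Hypothetical display: the closer toy is smaller, so retinal sizes are equated.
toys = {"near": (3.0, 15.0), "far": (6.0, 30.0)}  # (size_cm, distance_cm)

for name, (size, dist) in toys.items():
    print(f"{name}: visual angle {visual_angle_deg(size, dist):.2f} deg, "
          f"vergence {vergence_angle_deg(dist):.2f} deg")
```

Only the vergence column distinguishes the toys, so an infant who reliably reaches for the near one must be using binocular information.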
4. Graspability, Affordances, and Planning
4.1 Salience vs. Graspability
Faces: Highly salient for infants. Visual saliency often dominates over graspability in younger groups.
4.2 Affordances (Ecological Approach)
Definition (Gibson’s ecological approach, 1979): The world is perceived in terms of the possible actions it offers the individual.
Direct Recognition: The graspable element of an object is directly recognized without deliberate information processing.
Examples: A chair “affords” sitting; a handle “affords” holding.
Action Compatibility: Affordance is the part of the object that allows you to perform an action. It depends on the interaction between visual attributes and the observer's:
Body (physical capabilities).
Experience (learned associations).
Goals (e.g., if we are tired, many surfaces start looking like a chair).
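The point that an affordance is a relation between an object and the observer (body, experience, goals), rather than a property of the object alone, can be sketched as a simple predicate. Everything here is hypothetical: the `max_ratio` threshold, the use of hand span as the relevant body measure, and the numbers are illustrative assumptions, not values from the studies in these notes.

```python
from dataclasses import dataclass


@dataclass
class Observer:
    hand_span_cm: float  # body: physical capability (hypothetical measure)
    knows_handles: bool  # experience: learned association with handles
    goal: str            # goal: what the observer currently wants to do


@dataclass
class Handle:
    width_cm: float


def affords_holding(handle: Handle, who: Observer, max_ratio: float = 0.8) -> bool:
    """True if the handle fits the hand, is recognized as a handle, and serves the goal.

    max_ratio is a hypothetical body-scaled threshold (object width / hand span).
    """
    fits_body = handle.width_cm / who.hand_span_cm <= max_ratio
    return fits_body and who.knows_handles and who.goal == "hold"


mug_handle = Handle(width_cm=6.0)
adult = Observer(hand_span_cm=20.0, knows_handles=True, goal="hold")
small_child = Observer(hand_span_cm=6.0, knows_handles=True, goal="hold")
print(affords_holding(mug_handle, adult), affords_holding(mug_handle, small_child))
```

The same handle yields different answers for different observers, and changing only the goal also switches the result off.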
Affordance Processing (Sensory-Motor Theory of Conceptual Formation):
Adults activate grasping regions, specifically the Anterior Intraparietal Sulcus (AIP) and ventral Premotor Cortex (vPM), upon simply viewing a familiar utensil.
This activation occurs passively, even without the intent to act.
This implies that viewing objects automatically triggers a motor plan to grasp them, aligning with the recognition of Gibsonian affordances.
Modulation by Knowledge: Affordances are not just automatic but can be modulated by object knowledge.
Spatula Study: Subjects were asked to pick up a spatula (head toward them, handle away).
Even though the head is easier to grab, most subjects grabbed the handle (typical use).
Individuals performing a semantic distractor task were much less likely to grab the handle than those doing a spatial distractor task. This suggests that semantic tasks interfere with access to the object knowledge required to recognize functional affordances.
Neural Circuitry:
Anterior Intraparietal Sulcus (AIP): Converts object shape into motor grasp responses. Contains motor, visual, and visuomotor neurons.
Premotor Cortex Area F5 (described in monkeys; considered the homologue of Broca’s area in humans): Receives projections from AIP for selecting action sequences and motor planning.


4.3 Development of Affordance Processing
months: Infants show “pre-pincer” hand shaping for small objects despite lacking the coordination to execute the grasp (they only execute the pincer grasp itself at 8 to 9 months).
years+: Passive viewing of tools activates AIP.
The Cup Task and Inhibition: Children struggle to inhibit the “potentiated grasp” when handles are task-irrelevant, requiring more frontal cortex recruitment for suppression.
Scale Errors: Toddlers attempt to perform correct actions on objects that are the wrong size (e.g., trying to sit in a tiny chair).

4.4 End-State Planning
Definition: Planning a movement by starting in an uncomfortable position (awkward grip) to ensure the movement ends in a comfortable terminal state (end-state comfort).
Developmental Milestones:
Children under age 3 struggle significantly with this, usually opting for immediate comfort.
The ability begins to emerge around age 3.5, though it is not fully developed and remains difficult for young children.
This specific motor planning capability is not considered very trainable.
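The trade-off between immediate and end-state comfort can be illustrated with a toy planner. A minimal sketch under stated assumptions: the task is rotating a held object by 180 degrees, each grip is summarized by a single forearm rotation angle, and discomfort grows with how far that angle exceeds a hypothetical comfortable range of ±45 degrees. None of these numbers come from the studies; they only show why scoring the final posture leads to an awkward initial grip.

```python
def discomfort(forearm_deg: float, comfortable_limit_deg: float = 45.0) -> float:
    """Hypothetical cost: degrees by which forearm rotation exceeds a comfortable range."""
    return max(0.0, abs(forearm_deg) - comfortable_limit_deg)


def choose_grip(candidates: dict[str, float], rotation_deg: float, plan_for_end_state: bool) -> str:
    """Pick the initial grip minimizing either initial or final (end-state) discomfort."""
    def cost(name: str) -> float:
        start = candidates[name]
        end = start + rotation_deg
        return discomfort(end) if plan_for_end_state else discomfort(start)

    return min(candidates, key=cost)


grips = {"comfortable start": 0.0, "awkward start": -180.0}
print(choose_grip(grips, 180.0, plan_for_end_state=False))  # immediate-comfort strategy
print(choose_grip(grips, 180.0, plan_for_end_state=True))   # end-state-comfort strategy
```

The immediate-comfort planner mirrors the typical choice of children under 3; the end-state planner accepts a costly start to finish comfortably.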
4.5 Which objects should I reach for and how?
Dorsal Stream Circuitry: The dorsal stream contains dedicated neural circuitry specifically designed to transform visual inputs into appropriate motor actions.
Developmental Activation: Grasp-related and category-selective brain activation during the passive viewing of tool pictures reaches adult-sized levels from age 6 onwards.
Vision-to-Grasp "Blue Path":
Correct pre-shaping of the hand in young infants solely based on visual information suggests that the transformation pathway (the "blue path") is present early.
While this neural path may exist, it might not always manifest consistently in functional behavior.
Conflict and Resource Recruitment:
Children get easily distracted by attractive affordances when they conflict with the task at hand.
Children under 3 years old struggle to adopt awkward initial grips to achieve end-state comfort.
Older children (under 7 years) still need to recruit additional neural resources to ignore the graspability of familiar objects when necessary.

5. Building Models of the Body in Action
5.1 Noise and Uncertainty
Sensory and motor systems are subject to “noise” (e.g., judging the speed of a car in fog). Optimal performance requires minimizing variance by combining sensory sources and taking risk into account.
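A standard way to formalize “minimizing variance by combining sensory sources” is maximum-likelihood (inverse-variance-weighted) cue combination. The sketch assumes independent Gaussian noise on each cue; the cues and numbers (a visual and an auditory estimate of a car’s speed in fog) are illustrative assumptions. Each cue is weighted by its reliability (1/variance), and the combined variance is never larger than that of the best single cue.

```python
def combine_cues(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted average of independent Gaussian cues.

    Returns (combined estimate, combined variance).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    combined = sum(r * e for r, e in zip(reliabilities, estimates)) / total
    return combined, 1.0 / total


# Hypothetical speed estimates (km/h): vision is noisy in fog, engine sound less so.
speed, var = combine_cues([60.0, 50.0], [100.0, 25.0])
print(round(speed, 1), round(var, 1))
```

The combined estimate sits closer to the more reliable cue, and its variance falls below either cue’s variance alone.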
5.2 Visuomotor Decision-Making
Adults: Optimize performance by accounting for their own system uncertainty.
Children (under years): Choose suboptimal strategies.
They are not necessarily just imprecise; they fail to compensate or optimize for their own noise.
Children tend to aim too close to penalty regions (“risk-taking”), playing for high stakes at a greater risk of losing.
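This kind of aiming behavior is often modeled with statistical decision theory: hitting a target region earns points, hitting an overlapping penalty region loses points, and the actual endpoint scatters around the aim point because of motor noise. The one-dimensional sketch below is an illustration under assumptions; the region bounds, reward and penalty sizes, and noise levels are all hypothetical. It searches for the aim point with the highest expected gain: the noisier the observer, the further the best aim moves from the penalty region. An observer who ignores their own noise has no reason to make that shift, consistent with the risk-taking pattern described above.

```python
from math import erf, sqrt


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def p_in(lo: float, hi: float, aim: float, sd: float) -> float:
    """Probability that a Gaussian endpoint centered on `aim` lands in [lo, hi]."""
    return normal_cdf(hi, aim, sd) - normal_cdf(lo, aim, sd)


TARGET = (0.0, 10.0)    # hypothetical reward region (mm)
PENALTY = (-10.0, 2.0)  # hypothetical penalty region overlapping the target's left edge
REWARD, COST = 100.0, -500.0


def expected_gain(aim: float, sd: float) -> float:
    return REWARD * p_in(*TARGET, aim, sd) + COST * p_in(*PENALTY, aim, sd)


def best_aim(sd: float) -> float:
    candidates = [x / 10.0 for x in range(-50, 151)]  # -5.0 to 15.0 mm in 0.1 mm steps
    return max(candidates, key=lambda a: expected_gain(a, sd))


for sd in (1.0, 3.0):
    aim = best_aim(sd)
    print(f"sd={sd}: best aim {aim:.1f} mm, gain {expected_gain(aim, sd):.1f}; "
          f"aiming at target center gains {expected_gain(5.0, sd):.1f}")
```

A grid search keeps the sketch transparent; with more noise the optimal aim shifts rightward, away from the penalty edge, and beats aiming at the target center.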
1. Developmental Milestones
2.5 to 5 months: Development of binocular depth cues (stereo vision).
months: Infants begin reaching for objects based on binocular disparity; appearance of “pre-pincer” hand shaping for small objects.
6.5 months: Infants can only orient toward a tactile stimulus (buzzing) if their arms are uncrossed.
6 to 7 months: Infants begin to reach preferentially toward the “near” side of pictorial illusions (e.g., Ames window).
8 to 9 months: Infants develop the coordination required to execute a pincer grasp.
10 months: Infants successfully orient to the correct limb even when arms are crossed, reflecting the remapping of touch into an external frame of reference.
Age 3.5: The ability to plan movements for end-state comfort (awkward initial grip for a comfortable finish) begins to emerge.
Around age 5: Children consistently demonstrate the “crossed hands effect” in temporal order judgments.
Age 6 onwards: Adult-sized neural activation during passive viewing of tool pictures.
Under years: Children utilize suboptimal visuomotor strategies, failing to account for their own system noise or uncertainty.
2. Assessment Paradigms
Buzzing Hands Study: Used to test when infants can recode tactile input into an external frame of reference by crossing their limbs.
Temporal Order Judgment (TOJ) Task: Participants judge which of two hands was tapped first in crossed and uncrossed conditions to evaluate tactile-to-visual remapping.
Pictorial Illusion Reaching (Ames Window): Assessment of whether infants use perspective cues to guide their reaching behavior.
Spatula Study: Investigates functional affordances by asking subjects to pick up a tool with semantic or spatial distractors.
The Cup Task: Measures the ability to inhibit “potentiated grasps” when an object handle is task-irrelevant.
Scale Error Observations: Monitoring instances where toddlers attempt to perform actions on objects that are the incorrect size (e.g., sitting in a tiny chair).
End-State Planning Task: Evaluating if a child will adopt an uncomfortable initial posture to ensure a comfortable final position.
Visuomotor Decision-Making Tasks: Analyzing aiming strategies relative to penalty regions to determine if an individual accounts for sensory and motor noise.
3. Neural Substrates
Dorsal Stream: Located in the posterior parietal cortex; the “where/how” pathway that drives vision for action.
Ventral Stream: Located in the inferior temporal cortex; the “what/who” pathway for object recognition.
Somatosensory Cortex (S1): Shows signals of enhanced modulation and remapping at 10 months during tactile location tasks.
Anterior Intraparietal Sulcus (AIP): Converts object shape into motor grasp responses; contains visuomotor neurons.
Ventral Premotor Cortex (vPM / Area F5): Receives input from the AIP to select action sequences and facilitate motor planning.
Frontal Cortex: Recruited for the inhibitory suppression of automatic motor plans (affordances).
Corpus Callosum: Slow maturation limits the cross-hemisphere communication necessary for remapping body-centered input into visual space.