William James (1842-1910):
“… the taking possession by the mind, in clear and vivid form, of one of what seem several simultaneously possible objects or trains of thought. Focalisation, concentration of consciousness are of its essence.” (p. 403-4)
Focused attention: attention to only one source of information while ignoring all others (also known as selective attention)
Divided attention: performing two tasks at the same time; also known (in common parlance) as multi-tasking.
Cherry (1953): the cocktail party phenomenon.
Cherry (1953): the dichotic listening task.
Participants were played concurrent messages, one in each ear, and were asked to shadow (repeat aloud) one of them.
Participants have a poor representation of the unattended channel, reporting only the physical characteristics of the speech (male vs. female voice, volume, etc.).
Later research (Moray, 1959; Treisman, 1964) showed an inability to report semantic information, and that other characteristics (e.g., language) were reported only occasionally.
All attention models assume limited capacity.
The optic nerve carries ~10^8 bits/sec (Itti & Koch, 2000).
Even with ¼ of the brain dedicated to vision, it is not possible to process all information at any given moment.
How can we test this?
Load Theory (Lavie, 2005, 2010; see also Kahneman, 1973)
Assesses awareness during easy (low load) and difficult (high load) tasks.
Over the years, attention research has moved from the auditory to the visual modality.
The key finding: a limited-capacity system requires attention to be selective. The resultant trade-off is illustrated in the ‘invisible gorilla’ task.
Now we move on to examine selective attention in greater detail.
Posner & Cohen (1984) manipulated the temporal relationship between the cue and target (displayed on the x-axis), as well as the spatial relationship…
Targets could either appear at the same location as the cue (‘valid’ or ‘cued’) or the opposite (‘invalid’, ‘uncued’) location.
Participants were faster to detect cued targets than uncued targets (filled vs empty circles on the figure). This effect is called facilitation.
As the cue-target interval extends beyond ~300ms, responses to cued targets become markedly slower than to uncued targets. This effect is known as inhibition of return (IOR).
IOR is a key component of selective attention.
It is the biasing of limited processing resources away from previously attended locations (hence inhibition of return).
As such, these resources can be directed to novel parts of the visual scene.
Thus, many theories of attention hold that attention traverses the visual world like a spotlight: anything falling within the illuminated boundary is selected for further processing; everything else is relatively ignored.
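The facilitation-then-inhibition timecourse above can be sketched in a few lines of Python. The RT values below are made up for illustration; only the qualitative pattern (cueing effect positive at short intervals, negative beyond ~300ms) follows Posner & Cohen (1984).

```python
# Hypothetical RTs (ms) illustrating the Posner & Cohen (1984) pattern:
# the cueing effect (uncued - cued) is positive early (facilitation)
# and negative beyond ~300 ms (inhibition of return).

rts = {  # cue-target interval (ms): (cued RT, uncued RT) -- made-up values
    100: (280, 310),
    200: (290, 305),
    400: (330, 300),
    800: (335, 305),
}

for interval, (cued, uncued) in sorted(rts.items()):
    effect = uncued - cued
    label = "facilitation" if effect > 0 else "inhibition of return"
    print(f"{interval} ms: effect = {effect:+d} ms ({label})")
```

The sign of the cueing effect, not its exact size, is what distinguishes facilitation from IOR.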
Object-based theory of attention
Some researchers argue that attention selects objects, not space
Relative to A, locations B and C are equidistant. Thus, if A is cued, participants should detect targets at B and C equally quickly.
Egly, Driver & Rafal (1994): however, when A is cued, RTs are shorter to targets at B (same object) than at C (different object).
Once attention is oriented to an object, processing is automatically afforded to the entire object surface (B > C).
Objects distort attentional processing outside them. If attention selects regions of space, then both target positions ought to be detected equally rapidly. However, the appearance of an object distorts processing of the space around its perimeter.
When both target positions are encompassed within an object, what happens? They are now processed equally rapidly.
Selective attention is a necessary requirement of a limited-capacity system.
The early facilitation and later inhibition allows for efficient (albeit imperfect) use of limited resources.
Attention acts like a “spotlight” in some cases, but findings suggest objects have primacy.
Top-down = starting with a mental template of what is being looked for or thought about (e.g., controlling attention within an object).
Top-down control: you are in charge of what you pay attention to.
Bottom-up = driven by sensory information; registering incoming information causes cognitive change.
Stimulus-driven attentional control: attention is drawn to part of the image.
If you were to program Asimo, you’d be wise to strike a balance between top-down and bottom-up control.
Too much top-down = miss important information
Too much bottom-up = no voluntary control
You’d also have to ensure that certain visual cues capture its attention automatically.
How the retina ‘sees’ the world:
View from a CCD camera
The evolution of vision is not driven by a need to create an aesthetic representation of the environment; it serves to aid navigation, to locate food sources and mates, and to rapidly detect (and thus avoid) potential sources of threat.
Fovea - central part of vision - high resolution
Further from fovea, vision becomes blurry
Two blind spots, one in the left and one in the right periphery, which we do not normally notice.
Rich representations are seldom required in nature
Tinbergen and Perdeck (1950), Herring gulls:
When a gull chick is hungry, it will peck at its parent’s bill. The parent responds by regurgitating food which the chick then eats.
Why should the chicks peck at the parent’s bill?
Why doesn’t the mere sight of each bird elicit the regurgitation response?
Presented chicks with a variety of stimuli, reducing complexity.
Results show that chicks would only peck at stimuli containing a red spot.
Of these stimuli, there was no difference between pecking rates of natural versus artificial stimuli.
As the figure shows, a red stick elicited the greatest pecking response, suggesting that chicks do not 'recognise' their parents per se; they simply respond to red-yellow contrast.
Experimental methods: how can we determine which visual cues are considered important by the visual system and whether they attract automatically or not?
Method 1: Validity manipulations
The precueing paradigm
‘Validity’
50% = cue does not predict
80% = cue does predict
20% = cue counterpredicts target location
Two Possibilities:
Is attentional cueing modulated by predictive validity?
Posner & Cohen (1984)
Method 2: Search Slope Functions
Basic Principle
searching for red dot among green dots
with normal trichromatic vision (i.e., not red-green colour blind), this is an easy task
fast to detect the red dot
it doesn't matter how many green dots there are; the task remains easy and equally fast
when given a slightly harder task (a grey target amongst green dots), inherently a bit harder, responses are slower
adding more distractors is costly; it becomes harder to detect the grey item
Something easy to find yields a flat search function; something harder to find yields a steep search function.
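The flat-versus-steep contrast can be quantified as the slope of RT against set size. A minimal sketch, using hypothetical RT values (the function and data below are illustrative, not from any study):

```python
# Illustrative sketch: estimating search slopes from (set size, RT) data.
# Slope = extra time per added item; near-zero = "pop-out", steep = inefficient search.

def search_slope(set_sizes, rts):
    """Least-squares slope of RT (ms) against display set size."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(rts) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(set_sizes, rts))
    var = sum((x - mx) ** 2 for x in set_sizes)
    return cov / var

set_sizes = [3, 6, 12]
easy_rts = [450, 452, 455]   # red among green: near-flat function
hard_rts = [500, 560, 680]   # grey among green: cost per extra item

print(search_slope(set_sizes, easy_rts))  # ~0.5 ms/item (flat)
print(search_slope(set_sizes, hard_rts))  # 20 ms/item (steep)
```

The slope, in ms per item, is the standard index of search efficiency.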
Irrelevant singleton paradigm (e.g., Egeth & Yantis, 1997)
Here, the task is to find a target letter (S) amongst distractors (E).
There is also a task-irrelevant feature (a red irrelevant singleton). This can be any item in the display, meaning it coincides with the target on 1/n of trials.
1/3 of set-size three trials, will be red (unique colour item)
1/6 of set-size six trials will be red (unique colour item)
If the singleton captures attention, it has a dual effect on performance:
As a distractor, attention is drawn away from the target, so target detection is slowed.
As the target, attention is summoned to the target, so detection is speeded.
Capture is inferred on the same basis as before, that is, on the basis of divergent search slopes (e.g., Simons, 2000; Treisman, 1986; Wolfe, 1997, 1998; Yantis & Jonides, 1984).
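The dual effect of capture on mean RT can be sketched arithmetically. All values below (base RT, benefit, cost) are made-up parameters for illustration; only the 1/n coincidence probability comes from the paradigm itself:

```python
# Hypothetical sketch of the irrelevant-singleton logic: if the singleton
# captures attention, it helps on the 1/n of trials where it IS the target
# and hurts on the (n-1)/n of trials where it is a distractor.

def expected_rt(n, base=500, benefit=60, cost=40):
    """Mean RT (ms) over trials for set size n, assuming capture.
    base: RT with no singleton effect; benefit: saving when the singleton
    coincides with the target; cost: penalty when it does not."""
    p_target = 1 / n                      # singleton coincides with target
    return p_target * (base - benefit) + (1 - p_target) * (base + cost)

for n in (3, 6):
    print(n, round(expected_rt(n), 1))
```

Because the benefit applies on ever fewer trials as set size grows, capture predicts a net cost that increases with n; no capture predicts no such divergence.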
Franconeri and Simons (2003)
Which visual events capture attention automatically?
Do different types of motion attract attention independently, or is there a more parsimonious stimulus for capture?
All types of motion attract attention; the common factor appears to be onset (of an object or of motion).
Which visuals don’t capture automatically?
Old objects (onsets lose attention after ~500ms; Gellatly et al., 1999)
Object offsets
Task irrelevant colour singletons/changes
Attention can be controlled in a top-down or a bottom-up manner.
The attention system is highly sensitive to certain visual cues (abrupt luminance changes, object onset, motion onset), such that they attract attention automatically.
Why might this sensitivity have evolved? Objects that appear or move suddenly are best associated with threat.
Visual search is the process through which we locate an item of importance within an array of distractor items.
Two Types
Parallel Search (AKA Pop-Out Search)
Serial search
Theories of Visual Search
Bottom-up / parallel
Top-down/ serial
Itti & Koch (2000): Saliency map model
Visual scenes are composed of various features (colour, luminosity, line orientation). All differ in terms of their relative salience (both within and between dimensions).
Each has its own feature map, which are then combined and averaged on a 2-D topographical map. Search is then driven in a bottom-up manner with attention visiting the most → least salient parts in rank order.
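The feature-map-averaging idea can be sketched with toy numbers. This is an assumed simplification, not the actual Itti & Koch implementation (which involves centre-surround filtering and normalisation across scales): each location gets one conspicuity value per feature, the maps are averaged, and attention visits locations in descending salience.

```python
# Toy 2x3 "scene": one made-up conspicuity value per location per feature.
colour      = [[0.9, 0.1, 0.1],
               [0.1, 0.1, 0.1]]
intensity   = [[0.2, 0.2, 0.8],
               [0.2, 0.2, 0.2]]
orientation = [[0.1, 0.1, 0.1],
               [0.1, 0.6, 0.1]]

rows, cols = 2, 3

# Combine the feature maps into a single 2-D saliency map by averaging.
saliency = [[(colour[r][c] + intensity[r][c] + orientation[r][c]) / 3
             for c in range(cols)] for r in range(rows)]

# Bottom-up scan order: visit locations from most to least salient.
order = sorted(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: saliency[rc[0]][rc[1]], reverse=True)
print(order[:3])  # the three most conspicuous locations, in rank order
```

Note that nothing in this scheme knows what the observer is looking for; that is exactly the criticism raised below.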
Evidence:
Coded a number of natural scenes in terms of salience. Model predicts human search patterns well (humans = 2.6s; model = 2.2s)
Criticisms:
People seldom view scenes passively and without any top-down guidance. In some cases the model performed badly (humans < 20 s; model > 15 minutes!).
Treisman & Gelade (1980): Feature Integration Theory (FIT)
Two stages: 1. Parallel; 2. Serial.
The scene is preattentively represented as composite parts.
Attention binds these into meaningful information like objects.
Evidence:
Much visual search data can be interpreted in favour of FIT.
Feature: target defined by unique attribute
Conjunction: target defined by combination of two features
Criticisms:
Some conjunctions yield flat slopes (e.g., Nakayama & Silverman, 1986). Also, how flat or steep do slopes have to be?
As social animals humans have evolved and developed in groups. Other people are behaviourally important to us and we often perform cognitive tasks with them.
Has nature programmed us to
preferentially attend to others
represent attention to others
Looking up: this offers a good example of a social cue, whereby our attention is directed to different areas of space by the gaze direction of other people.
Infant gaze following
Meltzoff & Brooks (2007)
Suggests: 1) Infants as early as 12 months follow gaze; 2) Have a precursor to Theory of Mind (ToM), allowing them to differentiate between observers who can and cannot see.
Ristic & Kingstone (2005)
When the stimulus is interpreted as a social cue, it cues attention. When it is interpreted as something else, it does not.
Brennan et al. (2008)
1P: solitary search
SV: shared voice
SG+V: shared gaze and voice
SG: shared gaze
NC: No communication (but dual-participation)
NOM: nominal pseudo-pairs.
Joint search is twice as efficient as solitary search. People make use of their partner’s search abilities without explicit instruction to do so
Does this ability reflect natural or nurtured abilities?
Not all visual transients capture attention automatically. The visual system has evolved sensitivity to those more readily associated with threat.
Theories of visual search attempt to identify the sequential stages of processing and the nature of processing within each one. No one theory suffices, and this is still a work in progress.
Research is now focusing on how attention is directed by other people (i.e., social cues). Does this promise new insights into human cognition? (See Skarratt et al., 2012.)