Review of FIT, GS, and Itti-Koch Saliency Model
Feature Integration Theory (FIT)
- Explains how we perceive coherent objects by binding simple features together at attended locations.
- Visual search is easy if the target differs from non-targets in a simple feature.
- Visual search is difficult if the target differs from non-targets in a conjunction of features.
- At attended locations, features are correctly bound together.
- At unattended locations, features are often incorrectly bound together.
- Segmenting arrangements of colored shapes into textures is much easier if different parts of a scene differ in a simple feature than if they differ in a conjunction of features.
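The feature-vs-conjunction prediction above can be sketched as a toy simulation. This is a minimal illustration, not FIT itself: the function names, set sizes, and trial counts are all assumptions. Feature search is modeled as pre-attentive pop-out (one step regardless of set size); conjunction search as serial inspection in random order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature search: the target pops out pre-attentively, so the number of
# attentional steps is ~1 regardless of how many items are in the display.
def feature_search_steps(set_size):
    return 1

# Conjunction search: no single feature identifies the target, so FIT
# predicts serial inspection of items in random order until the target
# is found. (Target placement is arbitrary since the order is random.)
def conjunction_search_steps(set_size):
    order = rng.permutation(set_size)
    target_index = 0
    return int(np.where(order == target_index)[0][0]) + 1

# Averaged over trials, conjunction search grows with set size
# (about (n + 1) / 2 steps on average), while feature search stays flat.
mean_steps = {n: np.mean([conjunction_search_steps(n) for _ in range(2000)])
              for n in (4, 8, 16)}
```

The growing `mean_steps` values mirror the classic set-size effect for conjunction search.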
- Stages of processing:
- Pre-attentive orientation representations (e.g., vertical lines).
- Pre-attentive color representations (e.g., red).
- Master Map of Locations.
- Post-attentive object representations (e.g., Red-Vert).
- Priority map:
- A spatial representation where each location in the visual field is assigned an activation value that represents its relative priority for attentional selection.
- Integrates sensory input from multiple feature dimensions (like color and edge orientation).
- Higher activations indicate locations more likely to be attended.
- In FIT, all stimulus locations have the same activation (i.e., the map merely indicates where the stimuli are).
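FIT's flat priority map can be sketched in a few lines. This is an illustrative toy (a hypothetical 1-D visual field with made-up stimulus positions), not code from any model: every stimulus location gets equal activation, so attention must choose among candidates at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D visual field: 1 where a stimulus is present, 0 elsewhere.
stimuli = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# FIT-style priority map: every stimulus location has the same activation,
# so the map says *where* stimuli are, not which one matters most.
priority = stimuli.astype(float)

# With no activation differences, the candidate locations are tied,
# and attention picks among them at random (serial, unguided search).
candidates = np.flatnonzero(priority == priority.max())
first_attended = rng.choice(candidates)
```

Because every peak is the same height, the selection order carries no guidance; this is exactly the "flatland with plateaus" picture contrasted with Guided Search later in these notes.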
Guided Search (GS)
- Improves upon FIT by adding top-down influences.
- Basic components:
- Stimulus: The stimulus is filtered through broadly-tuned "categorical" channels.
- Input Channels: feature dimensions such as orientation (and color).
- Feature Maps: The output produces feature maps with activation based on local difference (bottom-up) and task demands (top-down).
- Top-down commands to feature maps activate locations possessing specific categorical attributes (e.g., activate "black" lines).
- Priority Map: A weighted sum of these activations forms the Activation Map. In visual search, attention deploys limited-capacity resources in order of decreasing activation.
- The priority map codes relative levels of activation based on stimulus-driven information (as in FIT) but also goal-directed information (what I’m looking for).
- Search is accomplished by going to the highest peak, then the next highest, etc. until the target is found.
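The weighted-sum-then-serial-search idea can be sketched as follows. All the numbers and the equal weighting are illustrative assumptions, not parameters from the Guided Search model.

```python
import numpy as np

# Hypothetical feature-map activations for a four-item display.
# Bottom-up: local feature contrast at each location.
bottom_up = np.array([0.2, 0.9, 0.3, 0.4])
# Top-down: how well each location matches the sought-for attributes
# (e.g., "black" and "vertical").
top_down = np.array([0.1, 0.2, 0.9, 0.6])

# Guided Search: the activation (priority) map is a weighted sum of
# stimulus-driven and goal-directed activations. Weights are assumed here.
w_bu, w_td = 0.5, 0.5
activation = w_bu * bottom_up + w_td * top_down

# Attention visits locations in order of decreasing activation,
# going to the highest peak, then the next highest, until the target is found.
visit_order = np.argsort(activation)[::-1]
```

Note how location 2, which is unremarkable bottom-up but matches the goal well, outranks location 1, the strongest bottom-up item: this is the top-down guidance that FIT lacks.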
Illustration of Visual Search Difficulty
- Visual Scene A is easy, Visual Scene B is hard, Visual Scene C is medium.
- The target in scene A is strongly differentiated from non-targets.
- Modern Priority Map for Scene A shows distinct height differences for Uniform Circles.
- The target in scene C is moderately differentiated from non-targets.
- Modern Priority Map for Scene C shows moderate height differences for Uniform Circles.
- The target in scene B is weakly differentiated from non-targets; due to noise, a non-target can sometimes even have higher priority than the target.
- Modern Priority Map for Scene B shows minimal height differences for Uniform Circles.
Recap: Feature Integration Theory vs. Guided Search
- Feature Integration Theory (FIT) demonstrated that the role of attention is to bind features together (from across different dimensions/channels) at attended locations.
- Explains phenomena such as illusory conjunctions and accounts for some search behavior (but not all).
- Guided Search (GS) improved on FIT by refining how the attentional priority of locations is computed.
- In FIT, all locations with stimuli are equally likely to be candidates for attentional processing. (The topography is a flatland, with plateaus of equal height wherever there’s a stimulus.)
- In GS, locations can differ in their “peakiness” – some locations can have higher peaks than others, depending on the extent to which a location matches what someone is looking for.
- In both models, attention plays the same role of binding features together at attended locations.
- The key difference is the extent to which the spotlight of attention is controlled.
- In FIT, spotlight movement is random.
- In GS, spotlight movement is guided.
Fundamental Difference Between FIT and GS
- In FIT, some searches are "pop-out" (parallel) and some are serial with random selection of locations, while in GS, all searches are serial but with varying degrees of attentional guidance to potential targets.
Itti-Koch Saliency Model
- Instead of improving attentional priority by adding top-down information, the Itti-Koch model makes the stimulus-driven processing more intelligent.
- Processing stages:
- Pre-attentive orientation representations.
- Pre-attentive color representations.
- Orientation Center-Surround Difference Map.
- Color Center-Surround Difference Map.
- Saliency map: codes for regions of the visual input that have local feature discontinuities (more white -> more salient) based on center-surround differences.
- Attentional spotlight is directed toward the region with the highest saliency, then the next highest, etc., until the target is found (serial attentional deployment, as in GS).
- Inhibition of return: when an object is selected and it is not the target, its activation is set to zero (so that the attentional spotlight can move on to the next object).
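The center-surround and inhibition-of-return mechanisms above can be sketched on a toy 1-D "image". This is an illustration, not the Itti-Koch implementation: the signal values are made up, and the surround is simplified to the two immediate neighbors (with wrap-around at the edges).

```python
import numpy as np

# Toy 1-D intensity signal: uniform except for one local discontinuity.
intensity = np.array([1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0])

# Center-surround difference: each location's value minus the mean of its
# neighbors; large differences mark local feature discontinuities.
# (np.roll wraps at the edges -- a simplification for this sketch.)
def center_surround(x):
    surround = (np.roll(x, 1) + np.roll(x, -1)) / 2.0
    return np.abs(x - surround)

saliency = center_surround(intensity)

# Serial deployment with inhibition of return: attend the most salient
# location; if it is not the target, zero its activation and move on.
def attend_sequence(saliency_map, n_fixations):
    s = saliency_map.copy()
    order = []
    for _ in range(n_fixations):
        winner = int(np.argmax(s))
        order.append(winner)
        s[winner] = 0.0  # inhibition of return
    return order
```

Here the discontinuity at index 3 wins first; zeroing its activation lets the spotlight move to the next-most-salient locations flanking it.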
Modern Models
- Integrate the mechanisms of both the GS and Itti-Koch models: goal-directed attention and improved stimulus-driven attention.
- Different regions of the brain code for these different types of attentional priority.