Review of FIT, GS, and Itti-Koch Saliency Model

Feature Integration Theory (FIT)

  • Explains how we perceive coherent objects by binding simple features together at attended locations.
  • Visual search is easy if the target differs from non-targets in a simple feature.
  • Visual search is difficult if the target differs from non-targets in a conjunction of features.
  • At attended locations, features are correctly bound together.
  • At unattended locations, features are often incorrectly bound together.
  • Segmenting a scene into textures (e.g., arrangements of colored shapes) is much easier when the parts of the scene differ in simple features than when they differ only in a conjunction of features.
  • Stages of processing:
    • Pre-attentive orientation representations (e.g., vertical lines).
    • Pre-attentive color representations (e.g., red).
    • Master Map of Locations.
    • Post-attentive object representations (e.g., Red-Vert).
  • Priority map:
    • A spatial representation where each location in the visual field is assigned an activation value that represents its relative priority for attentional selection.
    • Integrates sensory input from multiple feature dimensions (like color and edge orientation).
    • Higher activations indicate locations more likely to be attended.
    • In FIT, all locations with stimuli have the same activation (i.e., the map just says where the stimuli are).
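The FIT-style priority map described above can be sketched in a few lines. This is an illustrative simplification (the grid, values, and function name are invented, not from the theory's original formulation): every occupied location gets the same activation, so the map cannot rank locations.

```python
import numpy as np

def fit_priority_map(stimulus_present):
    """FIT-style priority map: 1 wherever a stimulus exists, else 0.
    All occupied locations share the same activation, so selection
    among them is effectively random."""
    return (np.asarray(stimulus_present) > 0).astype(float)

# A toy 3x3 scene: 1 marks a stimulus, 0 marks empty space.
scene = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
priority = fit_priority_map(scene)
```

Because the map is binary, it says *where* items are but gives attention no guidance about *which* item to visit first.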

Guided Search (GS)

  • Improves upon FIT by adding top-down influences.
  • Basic components:
    • Stimulus: The stimulus is filtered through broadly-tuned "categorical" channels.
    • Input Channels: feature dimensions such as orientation.
    • Feature Maps: The output produces feature maps with activation based on local difference (bottom-up) and task demands (top-down).
    • Top-down commands to feature maps activate locations possessing specific categorical attributes (e.g., activate "black" lines).
    • Priority Map: A weighted sum of these activations forms the Activation Map. In visual search, attention deploys limited capacity resources in order of decreasing activation.
  • The priority map codes relative levels of activation based on stimulus-driven information (as in FIT) but also goal-directed information (what I’m looking for).
  • Search is accomplished by going to the highest peak, then the next highest, etc. until the target is found.
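The weighted-sum activation map and the serial, highest-peak-first search can be sketched as follows. This is a minimal illustration, not the model's actual implementation: the item values, the 0.5/0.5 weights, and the target index are all invented.

```python
import numpy as np

# Per-item signals for a toy display of four items.
bottom_up = np.array([0.2, 0.9, 0.3, 0.4])   # local feature contrast (stimulus-driven)
top_down  = np.array([0.1, 0.8, 0.7, 0.2])   # match to "what I'm looking for" (goal-directed)

# The activation (priority) map is a weighted sum of the two sources.
activation = 0.5 * bottom_up + 0.5 * top_down

# Serial search: visit items in order of decreasing activation
# until the target (here, item 1 by construction) is found.
visit_order = np.argsort(activation)[::-1]
```

Because the target scores highly on both signals, it sits at the highest peak and is visited first; with flatter activations, more items would be checked before it.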

Illustration of Visual Search Difficulty

  • Visual Scene A is easy, Visual Scene C is medium, and Visual Scene B is hard.
  • The target in Scene A is strongly differentiated from the non-targets.
    • A modern priority map for Scene A shows a distinctly higher peak at the target than at the uniform circles (the non-targets).
  • The target in Scene C is moderately differentiated from the non-targets.
    • A modern priority map for Scene C shows only a moderate height difference between the target and the uniform circles.
  • The target in Scene B is weakly differentiated from the non-targets, so much so that, due to noise, a non-target may sometimes have higher priority than the target.
    • A modern priority map for Scene B shows minimal height differences between the target and the uniform circles.
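The noise point can be made concrete with a small simulation. Everything here is invented for illustration (signal levels, noise standard deviation, item counts): when the target-distractor difference is large relative to the noise, the target almost always has the highest priority; when the difference is tiny, a distractor frequently wins.

```python
import numpy as np

rng = np.random.default_rng(42)

def highest_priority_item(target_signal, distractor_signal, noise_sd, n_items=8):
    """Index of the item with the highest noisy priority.
    Item 0 is the target; the rest are distractors."""
    priorities = np.full(n_items, distractor_signal, dtype=float)
    priorities[0] = target_signal
    priorities += rng.normal(0.0, noise_sd, n_items)  # priority noise
    return int(np.argmax(priorities))

# Easy scene (Scene A-like): large target-distractor difference.
easy_wins = sum(highest_priority_item(1.0, 0.2, 0.1) == 0 for _ in range(200))
# Hard scene (Scene B-like): tiny difference, so noise often decides.
hard_wins = sum(highest_priority_item(1.0, 0.95, 0.1) == 0 for _ in range(200))
```

In the hard scene the target is selected first on only a fraction of trials, which is why search there proceeds slowly and serially.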

Recap: Feature Integration Theory vs. Guided Search

  • Feature Integration Theory (FIT) demonstrated that the role of attention is to bind features together (from across different dimensions/channels) at attended locations.
    • Explains phenomena such as illusory conjunctions and accounts for some (but not all) search behavior.
  • Guided Search (GS) refined FIT by improving how attentional priority (of locations) is computed.
    • In FIT, all locations with stimuli are equally likely to be candidates for attentional processing. (The topography is a flatland, with plateaus of equal height wherever there’s a stimulus.)
    • In GS, locations can differ in their “peakiness” – some locations can have higher peaks than others, depending on the extent to which a location matches what someone is looking for.
  • In both models, attention plays the same role of binding features together at attended locations.
  • The key difference is the extent to which the spotlight of attention is controlled.
    • In FIT, spotlight movement is random.
    • In GS, spotlight movement is guided.

Fundamental Difference Between FIT and GS

  • In FIT, some searches are "pop-out" (parallel) and some are serial with random selection of locations, while in GS, all searches are serial but with varying degrees of attentional guidance to potential targets.

Itti-Koch Saliency Model

  • Instead of improving attentional priority by adding top-down information, the Itti-Koch model makes the stimulus-driven (bottom-up) processing more intelligent.
  • Processing stages:
    • Pre-attentive orientation representations.
    • Pre-attentive color representations.
    • Orientation Center-Surround Difference Map.
    • Color Center-Surround Difference Map.
    • Saliency map: codes for regions of the visual input that have local feature discontinuities (more white -> more salient) based on center-surround differences.
  • Attentional spotlight is directed toward the region with the highest saliency, then the next highest, etc. until target found (serial attentional deployment, like in GS).
  • Inhibition of return: when an object is selected and turns out not to be the target, its activation is set to zero (so that the attentional spotlight can move on to the next object).
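The center-surround stage and serial deployment with inhibition of return can be sketched like this. This is an assumed simplification of the Itti-Koch pipeline (one feature map, a plain box-neighborhood surround, no multi-scale pyramids); the array values and function names are invented.

```python
import numpy as np

def center_surround(feature_map, surround=1):
    """Center value minus the mean of its local neighborhood; positive
    values mark local feature discontinuities ("more white -> more salient")."""
    f = np.asarray(feature_map, dtype=float)
    h, w = f.shape
    out = np.zeros_like(f)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - surround), min(h, i + surround + 1)
            j0, j1 = max(0, j - surround), min(w, j + surround + 1)
            out[i, j] = f[i, j] - f[i0:i1, j0:j1].mean()
    return np.maximum(out, 0)  # keep only positive discontinuities

def serial_deployment(saliency, n_fixations):
    """Visit the most salient locations one at a time, zeroing each
    visited location afterward (inhibition of return)."""
    s = saliency.copy()
    visited = []
    for _ in range(n_fixations):
        idx = np.unravel_index(np.argmax(s), s.shape)
        visited.append(idx)
        s[idx] = 0.0  # inhibition of return
    return visited

# One bright item among dim ones: it should be fixated first.
intensity = np.full((5, 5), 0.2)
intensity[2, 3] = 1.0
sal = center_surround(intensity)
fixations = serial_deployment(sal, 2)
```

The bright item stands out from its surround, so it dominates the saliency map and draws the first fixation; zeroing it then lets the spotlight move on.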

Modern Models

  • Integrate the mechanisms of both the GS and Itti-Koch (IK) models: goal-directed attention and improved stimulus-driven attention.
  • Different regions of the brain code for these different types of attentional priority.
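A modern combined priority map can be sketched as a weighted mix of the two signals. All maps and weights below are invented for illustration; the point is only that goal relevance can override raw saliency when its weight is high enough.

```python
import numpy as np

saliency  = np.array([[0.1, 0.9],
                      [0.2, 0.1]])  # stimulus-driven (IK-style) signal
relevance = np.array([[0.8, 0.1],
                      [0.1, 0.1]])  # goal-directed (GS-style) signal

# Combined priority map: each source contributes with its own weight.
w_bu, w_td = 0.4, 0.6
priority = w_bu * saliency + w_td * relevance

# Attention goes to the highest combined peak.
attended = np.unravel_index(np.argmax(priority), priority.shape)
```

Here the most salient location (top right) loses to the goal-relevant one (top left) because the top-down weight is larger, illustrating how the two priority signals trade off.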