Disentangling topdown from bottom up influences on attentional allocation in dynamic scenes

Carmi, Ran and Itti, Lawrence (2004) Disentangling topdown from bottom up influences on attentional allocation in dynamic scenes. In: 11th Joint Symposium on Neural Computation, May 15 2004, University Of Southern California. (Unpublished) https://resolver.caltech.edu/CaltechJSNC:2004.poster002

Full text not available from this repository.

Abstract

Motivation: Attentional allocation is determined by the interplay between bottom-up and top-down influences. Here we quantify the relative contributions of different influences on attentional allocation in dynamic scenes and examine how they change over time.
Methods: To manipulate the availability of top-down influences on attentional allocation, heterogeneous video clips were cut into clippets (M=2s), which were scrambled and re-assembled into MTV-style clips. Two groups of 8 subjects each were instructed to "follow the main actors and actions". One group viewed the original stimuli while the other group viewed the MTV-style clips. Eye positions were recorded using an ISCAN eye-tracker (240 Hz, yielding a total of more than a million samples for each group) and segmented into saccades, blinks, and fixation/smooth pursuit periods. A saliency-based model of attention capture (Itti & Koch 2000) was used to probe the relative contribution of bottom-up influences on attentional allocation, based on a novel performance metric, the Chance-Adjusted Saliency Accumometric (CASA). CASA values were computed as the weighted sum of differences between normalized saliency at human vs. random saccade targets.
Results: Total CASA based on the full saliency model was 6% higher in the MTV group than in the original group. In both the original and MTV groups, CASA based on either motion or flicker features alone was ~95% of the CASA based on the full saliency model. CASA based on color, intensity, or orientation features alone was ~66% of the full-model CASA. Generally, CASA values were higher for earlier saccades after stimulus onset (clip or clippet start) than for later saccades, but they tapered off and fluctuated around a fairly high value after the first several saccades.
Conclusions: The 6% CASA difference between the original and MTV groups shows that eliminating visual context beyond the first ~2s of viewing barely increased the overall relative weight of bottom-up influences on attentional allocation. Our results imply that the relative weight of top-down influences on attentional allocation in dynamic scenes does not increase with viewing time (beyond the first ~2s). We also found that motion and flicker are each ~150% stronger than color, intensity, or orientation as bottom-up attractors of attention.
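
The abstract describes CASA only as a weighted sum of differences between normalized saliency at human and at random saccade targets. The Python sketch below illustrates that one sentence and is not the authors' implementation. The function name `casa_sketch`, its parameter names, and the default of equal weights summing to 1 are all assumptions, since the abstract does not state the weighting scheme or how saliency was normalized.

```python
from typing import Optional, Sequence


def casa_sketch(
    human_saliency: Sequence[float],
    random_saliency: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of (human - random) normalized saliency differences.

    Hypothetical illustration: each index pairs the normalized saliency at
    one human saccade target with the saliency at one random target.
    """
    n = len(human_saliency)
    if n == 0 or n != len(random_saliency):
        raise ValueError("need two non-empty sequences of equal length")
    if weights is None:
        # Assumption: equal weights that sum to 1 (not stated in the abstract).
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights must match the number of saccades")
    return sum(
        w * (h - r) for w, h, r in zip(weights, human_saliency, random_saliency)
    )


# Made-up saliency values, for illustration only.
score = casa_sketch([0.8, 0.6], [0.3, 0.1])
print(f"CASA sketch: {score:.2f}")
```

Under these assumptions, a score of zero means human saccade targets were no more salient than randomly chosen ones, and a positive score means they were more salient than chance.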

Item Type:	Conference or Workshop Item (Poster)
Additional Information:	Copy of Poster will be included
Record Number:	CaltechJSNC:2004.poster002
Persistent URL:	https://resolver.caltech.edu/CaltechJSNC:2004.poster002
Usage Policy:	You are granted permission for individual, educational, research and non-commercial reproduction, distribution, display and performance of this work in any format
ID Code:	2
Collection:	CaltechCONF
Deposited By:	Imported from CaltechJSNC
Deposited On:	07 Jun 2004
Last Modified:	03 Oct 2019 22:49
