Speaker: Aysylu Gabdulkhakova
The high resolution and rich content of images and videos make processing them costly in terms of time. To complete a particular task, however, only the information related to the object of interest is needed. Visual attention is a concept adopted from biology that explains the selection mechanisms of vision. We propose to use top-down spatio-temporal visual attention to predict the next positions of the target balls instead of tracking them in every frame. The idea has two ingredients. First, the functionality of the objects of interest is used to predict their next state in upcoming time slots. Second, the spatio-temporal visual attention component emphasizes semantically significant parts of the scene and milestone time frames. Together, prediction and concentration enable efficient and reasonable analysis of the video without losing meaningful information. The utility of the approach is demonstrated on the example of tracking snooker game footage.
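To make the idea of predicting a ball's next state instead of observing it in every frame more concrete, here is a minimal sketch. It is not the speaker's method: it assumes a rectangular table, constant velocity with no friction or spin, perfectly elastic cushion bounces, and no ball-to-ball collisions. The names Ball, predict, and next_milestone_time are hypothetical and chosen only for illustration. A "milestone" is taken here to mean the moment the ball's motion changes, that is, the next cushion hit.

```python
import math
from dataclasses import dataclass


@dataclass
class Ball:
    x: float   # position along the table width
    y: float   # position along the table height
    vx: float  # velocity, table units per time slot
    vy: float


def _axis_time(pos: float, vel: float, limit: float) -> float:
    """Time until a coordinate reaches 0 or `limit` at constant velocity."""
    if vel > 0:
        return (limit - pos) / vel
    if vel < 0:
        return -pos / vel
    return math.inf


def next_milestone_time(ball: Ball, width: float, height: float) -> float:
    """Time until the next cushion hit (assumed milestone event)."""
    return min(_axis_time(ball.x, ball.vx, width),
               _axis_time(ball.y, ball.vy, height))


def predict(ball: Ball, dt: float, width: float, height: float) -> Ball:
    """Predict the ball state after dt time slots, reflecting at cushions."""
    b = Ball(ball.x, ball.y, ball.vx, ball.vy)
    remaining = dt
    while remaining > 0:
        tx = _axis_time(b.x, b.vx, width)
        ty = _axis_time(b.y, b.vy, height)
        t = min(tx, ty)
        if t >= remaining:
            # No cushion hit within the remaining time: move freely.
            b.x += b.vx * remaining
            b.y += b.vy * remaining
            break
        # Move to the cushion, snap onto it, and mirror the velocity.
        b.x += b.vx * t
        b.y += b.vy * t
        if tx == t:
            b.x = width if b.vx > 0 else 0.0
            b.vx = -b.vx
        if ty == t:
            b.y = height if b.vy > 0 else 0.0
            b.vy = -b.vy
        remaining -= t
    return b


ball = Ball(x=1.0, y=1.0, vx=2.0, vy=0.0)
milestone = next_milestone_time(ball, width=4.0, height=2.0)
later = predict(ball, dt=2.0, width=4.0, height=2.0)
```

Under these assumptions, a tracker would only need to inspect the video near the predicted milestone time to confirm or correct the prediction, rather than processing every frame in between.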