Object detection in videos has drawn increasing attention because it is more practical in real-world scenarios. Most deep learning methods use CNNs to process each decoded frame of a video stream individually. However, the valuable motion information that is already embedded, free of charge, in the video compression format is usually overlooked. In this paper, we propose a fast object detection method that exploits this information with a novel Motion aided Memory Network (MMNet). MMNet has two major advantages: 1) It significantly accelerates feature extraction for compressed videos. It only needs to run a complete recognition network on I-frames, i.e., a few reference frames in a video, and it produces the features for the following P-frames (predictive frames) with a lightweight memory network, which runs fast; 2) Unlike existing methods that establish an additional network to model the motion between frames, it takes full advantage of both the motion vectors and the residual errors that are freely available in video streams. To the best of our knowledge, MMNet is the first work that investigates a deep convolutional detector on compressed videos. Our method is evaluated on the large-scale ImageNet VID dataset, and the results show that it is 3× faster than the single-image detector R-FCN and 10× faster than the high-performance detector MANet, at a minor loss in accuracy.
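The I-frame/P-frame split described above can be illustrated with a minimal NumPy sketch. It is not the MMNet memory network: as a stand-in, it assumes the P-frame features are approximated by warping the previous features with integer motion vectors and adding a residual term. The names `warp_features`, `stream_features`, and `heavy_net`, as well as the tuple layout of `frames`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def warp_features(feat, motion):
    """Warp features (C, H, W) with integer motion vectors (2, H, W).

    motion[0] and motion[1] hold the (dy, dx) offset from each location
    to the reference location it is predicted from (an assumed convention).
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(ys + motion[0], 0, h - 1)
    src_x = np.clip(xs + motion[1], 0, w - 1)
    return feat[:, src_y, src_x]

def stream_features(frames, heavy_net):
    """Yield one feature map per frame of a compressed stream.

    frames: iterable of ("I", image) or ("P", motion, residual) tuples.
    heavy_net: hypothetical callable mapping an image to (C, H, W) features.
    """
    memory = None
    for frame in frames:
        if frame[0] == "I":
            # Full recognition network, run only on reference frames.
            memory = heavy_net(frame[1])
        else:
            # Cheap update: warp the remembered features, add the residual.
            # Assumes the stream starts with an I-frame, so memory is set.
            _, motion, residual = frame
            memory = warp_features(memory, motion) + residual
        yield memory
```

In this simplification the expensive call happens once per I-frame, while each P-frame costs only an indexing operation and an addition; the actual method replaces that cheap update with a learned lightweight memory network.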
Despite great progress in few-shot semantic segmentation, existing works still suffer from incomplete and inconsistent segmentation. In this paper, a novel attention-aided LSTM optimization network called LONet is proposed, which optimizes predictions without forgetting useful inner cues. In particular, we calculate an attention map to align and match possible locations with query features to deal with incomplete segmentation. Then, an LSTM-based module is designed to overcome the segmentation inconsistency by memorizing and updating useful cues iteratively. Extensive experiments are conducted on two popular few-shot segmentation datasets, PASCAL-5i and FSS-1000. The experimental results on the FSS-1000 dataset demonstrate that our LONet exceeds the state-of-the-art results by 2.1% and 2.3%, respectively.
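As a rough illustration of what an attention map over query features can look like, the sketch below scores every query location by cosine similarity against a foreground prototype pooled from a support image. This is a common few-shot segmentation baseline and only an assumption here; the abstract does not specify how LONet computes its attention map, and the function name `attention_map` and its arguments are hypothetical.

```python
import numpy as np

def attention_map(query_feat, support_feat, support_mask, eps=1e-8):
    """Cosine-similarity attention between a support prototype and a query.

    query_feat, support_feat: (C, H, W) feature maps.
    support_mask: (H, W) binary foreground mask for the support image.
    Returns an (H, W) map with values in [-1, 1].
    """
    # Masked average pooling: mean support feature over the foreground.
    weights = support_mask / (support_mask.sum() + eps)
    prototype = (support_feat * weights).sum(axis=(1, 2))  # shape (C,)
    # Normalize the prototype and each query location to unit length.
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + eps)
    p = prototype / (np.linalg.norm(prototype) + eps)
    # Dot product over the channel axis gives one score per location.
    return np.tensordot(p, q, axes=([0], [0]))
```

High values mark query locations that resemble the support foreground. An iterative refinement module, such as the LSTM-based one described above, would then take a map like this as one of its inputs.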