Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use. In this work, we propose FEELVOS as a simple and fast method which does not rely on fine-tuning. In order to segment a video, for each frame FEELVOS uses a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame. In contrast to previous work, our embedding is only used as an internal guidance of a convolutional network. Our novel dynamic segmentation head allows us to train the network, including the embedding, end-to-end for the multiple object segmentation task with a cross entropy loss. We achieve a new state of the art in video object segmentation without fine-tuning with a J &F measure of 71.5% on the DAVIS 2017 validation set. We make our code and models available at https://github.com/tensorflow/ models/tree/master/research/feelvos. * Work done during an internship at Google Inc.† Now at Waymo LLC. Simple Fast End-to-end Strong PML [6] OSMN [40] FAVOS [7] VideoMatch [17] RGMP [37] FEELVOS (ours) PReMVOS [26] OnAVOS [35]
Predicting human behavior is a difficult and crucial task required for motion planning. It is challenging in large part due to the highly uncertain and multimodal set of possible outcomes in real-world domains such as autonomous driving. Beyond single MAP trajectory prediction [1,2], obtaining an accurate probability distribution of the future is an area of active interest [3,4]. We present MultiPath, which leverages a fixed set of future state-sequence anchors that correspond to modes of the trajectory distribution. At inference, our model predicts a discrete distribution over the anchors and, for each anchor, regresses offsets from anchor waypoints along with uncertainties, yielding a Gaussian mixture at each time step. Our model is efficient, requiring only one forward inference pass to obtain multi-modal future distributions, and the output is parametric, allowing compact communication and analytical probabilistic queries. We show on several datasets that our model achieves more accurate predictions, and compared to sampling baselines, does so with an order of magnitude fewer trajectories.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.