Recently, vision-based Advanced Driver Assist Systems have gained broad interest. In this work, we investigate free-space detection, for which we propose to employ a Fully Convolutional Network (FCN). We show that this FCN can be trained in a self-supervised manner and achieve results similar to those obtained by training on manually annotated data, thereby reducing the need for large manually annotated training sets. To this end, our self-supervised training relies on a stereo-vision disparity system to automatically generate (weak) training labels for the color-based FCN. Additionally, our self-supervised training facilitates online training of the FCN instead of offline training. Consequently, given that the applied FCN is relatively small, the free-space analysis becomes highly adaptive to any traffic scene that the vehicle encounters. We have validated our algorithm on publicly available data and on a new challenging benchmark dataset that is released with this paper. Experiments show that online training boosts performance by 5% compared to offline training, both for F_max and AP.
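To make the idea of disparity-derived weak labels concrete, the following is a minimal sketch, not the method of the paper. The abstract does not specify how the disparity signal is turned into labels, so the sketch swaps in a simple flat-ground assumption: on a planar road, disparity grows roughly linearly with the image row, and pixels that agree with that line are marked as free space. The function name `weak_free_space_labels`, the tolerance `tol`, the convention that disparity 0 means "invalid", and the label encoding (1 = free space, 0 = obstacle, -1 = unknown) are all hypothetical choices made for illustration.

```python
import numpy as np


def weak_free_space_labels(disparity, tol=1.0):
    """Return weak per-pixel labels from a disparity map.

    Labels: 1 = free space, 0 = obstacle, -1 = unknown.
    Assumptions (illustrative only): disparity <= 0 marks invalid pixels,
    and the lower half of the image is dominated by a planar road.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    h, _ = disparity.shape
    valid = disparity > 0
    labels = np.full(disparity.shape, -1, dtype=np.int8)

    # Rows in the lower image half that contain at least one valid pixel.
    rows = np.arange(h)
    fit_rows = rows[(rows >= h // 2) & valid.any(axis=1)]
    if fit_rows.size < 2:
        return labels  # not enough data to fit a ground line

    # The per-row median disparity is a robust estimate of the ground disparity.
    medians = np.array([np.median(disparity[v, valid[v]]) for v in fit_rows])

    # Flat-ground model: expected disparity is linear in the row index.
    a, b = np.polyfit(fit_rows, medians, 1)
    expected = a * rows[:, None] + b  # shape (h, 1), broadcasts over columns

    # Pixels close to the ground line (and below the horizon) count as free space.
    on_ground = (np.abs(disparity - expected) <= tol) & (expected > 0)
    labels[valid & on_ground] = 1
    labels[valid & ~on_ground] = 0
    return labels
```

Labels of this kind are noisy by construction, which is why they are called weak; pixels marked -1 would typically be ignored in the training loss of the color-based network.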
Smart surveillance systems become more meaningful if they grow in both reliability and robustness while simultaneously offering a higher semantic level of understanding. To achieve a higher level of semantic scene understanding, objects and their actions have to be interpreted in the given context, which requires the extraction of contextual information. This chapter explores several techniques for extracting contextual information, such as spatial, motion, depth and co-occurrence information, depending on the application. Afterwards, the chapter provides specific case studies to evaluate the usefulness of context information, based on: (1) region labeling of the surroundings of objects, (2) motion analysis of the water for moving ships, (3) traffic sign recognition for safety event evaluation and (4) the use of depth signals for obstacle detection. The chapter shows that these cases can be solved in an improved way with respect to robustness and semantic understanding. The case studies indicate up to 6.8% improvement in reliable, correct object understanding and the novel possibility of labeling scene events as safe or unsafe, depending on the object behavior and the detected surrounding context. The chapter thus shows that using contextual information improves automated video surveillance analysis: it not only improves the reliability of moving object detection, but also enables scene understanding that goes far beyond object understanding.