Classical semantic segmentation methods, including recent deep-learning ones, assume that all classes observed at test time have been seen during training. In this paper, we tackle the more realistic scenario in which unexpected objects of unknown classes can appear at test time. The main trends in this area either leverage the notion of prediction uncertainty to flag low-confidence regions as unknown, or rely on autoencoders and highlight poorly-decoded regions. Having observed that, in both cases, the detected regions typically do not correspond to unexpected objects, we introduce a drastically different strategy: it relies on the intuition that the network will produce spurious labels in regions depicting unexpected objects. Therefore, resynthesizing the image from the resulting semantic map will yield significant appearance differences with respect to the input image. In other words, we translate the problem of detecting unknown classes into one of identifying poorly-resynthesized image regions. We show that this outperforms both uncertainty- and autoencoder-based methods.
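The core idea above — comparing the input image against its resynthesis from the predicted semantic map — can be illustrated with a minimal NumPy sketch. This is not the paper's method (which learns a discrepancy network); here the discrepancy is just a naive per-pixel L1 difference, and the function name and threshold are my own illustrative choices.

```python
import numpy as np

def discrepancy_map(image, resynth):
    """Naive per-pixel anomaly score: mean absolute difference
    across color channels between the input image and its
    resynthesis from the predicted semantic map.
    image, resynth: (H, W, C) arrays."""
    return np.abs(image.astype(float) - resynth.astype(float)).mean(axis=-1)

def flag_unexpected(image, resynth, threshold=0.5):
    # Pixels whose resynthesis differs strongly from the input are
    # flagged as potentially belonging to an unknown class.
    return discrepancy_map(image, resynth) > threshold
```

In the paper, regions with spurious labels resynthesize poorly, so they light up in this map; a learned comparison module replaces the raw difference to be robust to benign appearance variation.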
Conventional approaches to image de-fencing use multiple adjacent frames for segmentation of fences in the reference image and are limited to restoring images of static scenes only. In this paper, we propose a de-fencing algorithm for images of dynamic scenes using an occlusion-aware optical flow method. We divide the problem of image de-fencing into the tasks of automated fence segmentation from a single image, motion estimation under known occlusions, and fusion of data from multiple frames of a captured video of the scene. Specifically, we use a pre-trained convolutional neural network to segment fence pixels from a single image. The knowledge of the spatial locations of fences is then used to estimate optical flow in the occluded frames of the video for the final data-fusion step. We cast the fence-removal problem in an optimization framework by modeling the formation of the degraded observations. The inverse problem is solved using the fast iterative shrinkage-thresholding algorithm (FISTA). Experimental results show the effectiveness of the proposed algorithm.
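The abstract's final step solves an inverse problem with FISTA. A minimal sketch of FISTA on a generic sparse inverse problem (min over x of 0.5·||Ax − b||² + λ||x||₁) is shown below; the paper's actual operator models the degraded fenced observations, whereas here A, b, and λ are placeholder toy quantities.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm (shrinkage).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista(A, b, lam, n_iter=200):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)           # gradient of the smooth term at y
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)  # momentum step
        x, t = x_new, t_new
    return x
```

The momentum sequence t is what distinguishes FISTA from plain ISTA, accelerating convergence from O(1/k) to O(1/k²) in objective value.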
In recent times, the availability of inexpensive image-capturing devices such as smartphones/tablets has led to an exponential increase in the number of images/videos captured. However, sometimes the amateur photographer is hindered by fences in the scene, which have to be removed after the image has been captured. Conventional approaches to image de-fencing suffer from inaccurate and non-robust fence detection, apart from being limited to processing images of only static occluded scenes. In this paper, we propose a semi-automated de-fencing algorithm using a video of the dynamic scene. We use convolutional neural networks for detecting fence pixels. We provide qualitative as well as quantitative comparison results with existing lattice detection algorithms on the existing PSU NRT dataset [1] and a proposed challenging fenced-image dataset. The inverse problem of fence removal is solved using the split Bregman technique, assuming total variation of the de-fenced image as the regularization constraint.
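This abstract solves a total-variation-regularized inverse problem with split Bregman. Below is a minimal 1D sketch of split Bregman for anisotropic TV denoising (min over u of 0.5·||u − f||² + μ·||Du||₁); the paper's problem additionally models fence occlusion and multi-frame data, and the parameter values here are illustrative only.

```python
import numpy as np

def split_bregman_tv1d(f, mu, lam=1.0, n_iter=50):
    """Split Bregman for 1D anisotropic TV denoising:
    min_u 0.5*||u - f||^2 + mu*||D u||_1,
    with the splitting d = D u enforced via Bregman variable b."""
    n = len(f)
    D = np.diff(np.eye(n), axis=0)         # forward-difference operator, (n-1, n)
    u = f.copy()
    d = np.zeros(n - 1)
    b = np.zeros(n - 1)
    M = np.eye(n) + lam * D.T @ D          # system matrix for the u-subproblem
    for _ in range(n_iter):
        # u-subproblem: quadratic, solved exactly as a linear system.
        u = np.linalg.solve(M, f + lam * D.T @ (d - b))
        Du = D @ u
        # d-subproblem: shrinkage (proximal operator of the l1 norm).
        d = np.sign(Du + b) * np.maximum(np.abs(Du + b) - mu / lam, 0.0)
        # Bregman update.
        b = b + Du - d
    return u
```

The splitting turns the non-smooth TV term into a cheap shrinkage step, leaving only a well-conditioned linear solve per iteration.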
The advent of inexpensive smartphones/tablets/phablets equipped with cameras has resulted in the average person capturing cherished moments as images/videos and sharing them on the internet. However, at several locations, an amateur photographer may be frustrated with the captured images. For example, the object of interest to the photographer might be occluded or fenced. Currently available image de-fencing methods in the literature are limited by non-robust fence detection and can handle only static occluded scenes whose video is captured under constrained camera motion. In this work, we propose an algorithm to obtain a de-fenced image using a few frames from a video of the occluded static or dynamic scene. We also present a new fenced-image database captured under challenging scenarios such as clutter, poor lighting, viewpoint distortion, etc. Initially, we propose a supervised learning-based approach to detect fence pixels and validate its performance with qualitative as well as quantitative results. We rely on the idea that freehand panning of the fenced scene is likely to render visible hidden pixels of the reference frame in other frames of the captured video. Our approach necessitates the solution of three problems: (i) detection of spatial locations of fences/occlusions in the frames of the video, (ii) estimation of relative motion between the observations, and (iii) data fusion to fill in occluded pixels in the reference image. We model the de-fenced image as a Markov random field and obtain its maximum a posteriori estimate by solving the corresponding inverse problem. Several experiments on synthetic and real-world data demonstrate the effectiveness of the proposed approach.
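Step (iii) of the pipeline above, data fusion, can be illustrated with a minimal sketch: given frames already warped into the reference coordinate system and per-frame visibility masks from steps (i) and (ii), fill each occluded reference pixel from the frames in which it is visible. This toy version simply averages visible observations; the paper instead solves an MRF-regularized MAP estimation problem, so the function below is my own simplification.

```python
import numpy as np

def fuse_frames(frames, masks):
    """Fill occluded pixels by averaging unoccluded observations.
    frames: list of (H, W) arrays, already warped to reference coordinates.
    masks:  list of (H, W) boolean arrays, True where the pixel is visible
            (i.e., not covered by the fence in that frame)."""
    acc = np.zeros_like(frames[0], dtype=float)   # sum of visible observations
    cnt = np.zeros_like(frames[0], dtype=float)   # number of visible observations
    for img, vis in zip(frames, masks):
        acc[vis] += img[vis]
        cnt[vis] += 1.0
    # Average where at least one frame saw the pixel; leave zeros elsewhere.
    return np.where(cnt > 0, acc / np.maximum(cnt, 1.0), 0.0)
```

Freehand panning makes it likely that every reference pixel is visible in at least one frame, which is exactly what this fusion step exploits; the MRF prior in the paper additionally regularizes pixels no frame observed.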