Abstract-A crucial feature of a good scene recognition algorithm is its ability to generalize. Scene categories, especially those related to human-made indoor places or to human activities such as sports, present a high degree of intra-class variability, which in turn demands strong robustness and generalization properties. These are among the distinctive characteristics of the Naive Bayes Nearest Neighbor (NBNN) approach [1], an image classification framework that has been gaining momentum in the visual recognition community since its introduction in 2008. In this paper we show how, with a straightforward modification of the original NBNN scoring function, it is possible to use a recently introduced latent locally linear SVM algorithm to discriminatively learn a set of prototype local features for each class. The resulting classification algorithm, which we call Naive Bayes Nonlinear Learning (NBNL), preserves the generality and robustness of the original approach while greatly reducing its memory requirements at test time and significantly improving its performance. To the best of our knowledge, this is the first work to exploit the structure of the local features through a latent locally linear discriminative learning method. Experiments on three public scene recognition datasets show the effectiveness of the proposed algorithm, which outperforms several existing NBNN-based methods and is competitive with standard Bag-of-Words plus SVM approaches.
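For context, the baseline NBNN decision rule that the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' implementation: the brute-force nearest-neighbor search, the squared Euclidean distance, and the function name `nbnn_classify` are our assumptions; the paper's contribution (discriminatively learned prototypes via a latent locally linear SVM) is not shown here.

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Naive Bayes Nearest Neighbor rule: for each class, sum over the
    query image's local descriptors the squared distance to that class's
    nearest stored descriptor; return the class with the smallest total.

    query_descriptors: array of shape (n_query, d)
    class_descriptors: dict mapping class label -> array of shape (n_c, d)
    """
    scores = {}
    for label, descs in class_descriptors.items():
        # Pairwise squared Euclidean distances, shape (n_query, n_c).
        d2 = ((query_descriptors[:, None, :] - descs[None, :, :]) ** 2).sum(axis=-1)
        # Image-to-class distance: nearest neighbor per descriptor, summed.
        scores[label] = d2.min(axis=1).sum()
    return min(scores, key=scores.get)
```

Replacing the full descriptor pool of each class with a small set of learned prototypes, as NBNL does, shrinks `class_descriptors` and hence the memory footprint at test time.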
Indoor scenes are characterized by high intra-class variability, mainly due to the intrinsic variety of the objects they contain and to the drastic image variations caused by even small viewpoint changes. One of the main trends in the literature has been to employ representations that couple statistical characterizations of the image with a description of their spatial distribution. This is usually done by combining multiple representations of different image regions, most often using a fixed 4 × 4 or pyramidal image-partitioning scheme. While these encodings capture the spatial regularities of the problem, they are ill-suited to handle its spatial variability. In this work we propose to complement a traditional spatial-encoding scheme with a bottom-up approach designed to discover visual structures regardless of their exact position in the scene. To this end we use saliency maps to segment each image into two regions: the most and the least salient 50%. This segmentation yields a description of images that is related to the relative semantics of the discovered regions, complementing the canonical spatial description. We evaluated the proposed technique on three public scene recognition datasets. Our results show that the approach is effective in the indoor scenario, while also being meaningful for other scene categorization tasks.
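The 50/50 saliency split described above can be sketched as a simple median threshold on the saliency map. This is an illustrative sketch only: the median-based threshold and the function name `split_by_saliency` are our assumptions, and the abstract does not specify how the saliency maps themselves are computed.

```python
import numpy as np

def split_by_saliency(saliency):
    """Partition an image's pixels into its most- and least-salient
    halves by thresholding the saliency map at its median, so that
    each region covers roughly 50% of the pixels.

    saliency: 2-D array of per-pixel saliency values.
    Returns two boolean masks (most_salient, least_salient).
    """
    threshold = np.median(saliency)
    most_salient = saliency >= threshold
    return most_salient, ~most_salient
```

Each region can then be described separately, for instance by pooling local features inside its mask, and the two region descriptors combined with the canonical spatial-pyramid representation.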
Abstract-This article describes the Robot Vision challenge, a competition that evaluates solutions to the visual place classification problem. Since its origin, the challenge has served as a common benchmark where proposals from around the world are measured with a shared overall score. Each new edition of the competition introduced novelties, both in the type of input data and in the sub-objectives of the challenge. The techniques used by the participants have been gathered and published to make them accessible for future developments. The legacy of the Robot Vision challenge includes datasets, benchmarking techniques, and extensive experience in place classification research, all of which are reflected in this article.