Alignment of objects is a predominant problem in part-based methods for visual object categorisation (VOC). These methods should learn the parts and their spatial variation, which is difficult for objects in arbitrary poses. A straightforward solution is to annotate images with a set of "object landmarks", but because manual annotation is laborious, semi-supervised methods requiring only a set of images and class labels are preferred. Recent state-of-the-art VOC methods utilise various approaches to align objects or otherwise compensate for their geometric variation, but no explicit solution to the alignment problem with quantitative results can be found. The problem has been studied in recent works related to "image congealing". The congealing methods, however, are based on image-based processing, and thus require moderate initial alignment and are sensitive to intra-class variation and background clutter. In this work, we define a local-feature-based algorithm to rigidly align object class images. Our algorithm is based on the standard VOC tools: local feature detectors and descriptors, correspondence-based homography estimation, and random sample consensus (RANSAC) based spatial validation of local features. We first demonstrate how an intuitive feature matching approach works for simple classes but fails for more complex ones. This is solved by a spatial scoring procedure, which is the core element of the proposed method. Our method is compared to a state-of-the-art congealing method on realistic and difficult Caltech-101 and randomised Caltech-101 (r-Caltech-101) categories, for which our method achieves clearly superior performance.
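The RANSAC-based spatial validation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: for brevity it fits a 2D similarity transform (rotation, uniform scale, translation) from two correspondences instead of a full homography, and it omits the spatial scoring procedure. The function names, the inlier tolerance, the iteration count and the synthetic data are all assumptions made for illustration.

```python
import cmath
import random

def fit_similarity(p1, p2, q1, q2):
    """Solve q = a*p + b for complex a, b from two point correspondences.

    Points are complex numbers (x + iy); a encodes rotation and scale,
    b encodes translation. This stands in for homography estimation.
    """
    a = (q1 - q2) / (p1 - p2)
    b = q1 - a * p1
    return a, b

def ransac_similarity(src, dst, iters=200, tol=1.0, seed=0):
    """Return (a, b, inlier_indices) for the model with the most inliers."""
    sampler = random.Random(seed)
    best = (1 + 0j, 0j, [])
    n = len(src)
    for _ in range(iters):
        i, j = sampler.sample(range(n), 2)
        if src[i] == src[j]:
            continue  # degenerate pair, cannot define a transform
        a, b = fit_similarity(src[i], src[j], dst[i], dst[j])
        # A correspondence is an inlier if the model maps it within tol.
        inliers = [k for k in range(n) if abs(a * src[k] + b - dst[k]) < tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best

# Synthetic data (hypothetical): 14 correct matches plus 6 gross mismatches.
rng = random.Random(1)
true_a, true_b = cmath.rect(1.2, 0.5), complex(10, 5)
src = [complex(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20)]
dst = [true_a * p + true_b for p in src[:14]]
dst += [complex(1000 + 37 * k, -500 - 91 * k) for k in range(6)]

a, b, inliers = ransac_similarity(src, dst)
```

In this sketch the mismatched correspondences are rejected because no transform fitted to a random pair agrees with more matches than the true one does. A real pipeline would obtain `src` and `dst` from local feature detectors and descriptor matching.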
Visual object categorization is one of the most active research topics in computer vision, and the Caltech-101 data set is one of the standard benchmarks for evaluating method performance. Despite its wide use, the data set has certain weaknesses: i) the objects are practically in a standard pose and scale in the middle of the images, and ii) the background varies too little in certain categories, making it more discriminative than the foreground objects. In this work, we demonstrate how these weaknesses bias the evaluation results in an undesired manner. In addition, we reduce the bias by replacing the backgrounds with random landscape images from Google and by applying random Euclidean transformations to the foreground objects. We demonstrate how the proposed randomization process makes visual object categorization more challenging, improving the relative results of methods that categorize objects by their visual appearance and are invariant to pose changes. The new data set is made publicly available for other researchers.
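As a minimal sketch of the random Euclidean transformation step, the code below draws a rotation angle and a translation and applies them to the corner points of a foreground object. The sampling ranges, the canvas size and the function name `random_euclidean` are assumptions for illustration; the abstract does not specify how the transformations were sampled, and background replacement is not shown.

```python
import math
import random

def random_euclidean(width, height, rng):
    """Draw a random rotation and translation; return a point-mapping function.

    A Euclidean transformation preserves distances and angles, so the
    object keeps its shape and size while its pose changes.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)          # rotation angle (assumed range)
    tx, ty = rng.uniform(0, width), rng.uniform(0, height)  # translation (assumed range)
    c, s = math.cos(theta), math.sin(theta)

    def apply(pt):
        x, y = pt
        return (c * x - s * y + tx, s * x + c * y + ty)

    return apply

# Hypothetical example: corners of a 100 x 50 foreground bounding box.
rng = random.Random(42)
transform = random_euclidean(640, 480, rng)
corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
moved = [transform(p) for p in corners]
```

Because only rotation and translation are applied, side lengths and diagonals of the box are unchanged after the transformation.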
Colour is an important cue in many applications of computer vision and image processing, but robust usage often requires estimation of the unknown illuminant colour. Usually, colour normalisation is used to obtain images that are invariant to the illumination conditions under which they were taken. In this work, we develop a colour normalisation technique in which true colours are not important per se but examples of the same class have a photometrically consistent appearance. This is achieved by supervised estimation of a class-specific canonical colour space in which the examples have minimal variation in their colours. We demonstrate the effectiveness of our method with qualitative and quantitative examples from the Caltech-101 data set and a real application of 3D pose estimation for robot grasping.