Evaluating the performance of computer vision algorithms is classically done by reporting classification error or accuracy, whether the problem at hand is the classification of an object in an image, the recognition of an activity in a video, or the categorization and labeling of an image or video. If, in addition, the detection of an item in an image or a video and/or its localization are required, frequently used metrics are Recall and Precision, as well as ROC curves. These metrics give quantitative performance values which are easy to understand and interpret, even by non-experts. However, an inherent problem is the dependency of quantitative performance measures on the quality constraints that we need to impose on the detection algorithm. In particular, an important quality parameter of these measures is the spatial or spatio-temporal overlap between a ground-truth item and a detected item, and this needs to be taken into account when interpreting the results. We propose a new performance metric addressing and unifying the qualitative and quantitative aspects of performance measures. The performance of a detection and recognition algorithm is illustrated intuitively by performance graphs which present quantitative performance values, such as Recall, Precision and F-Score, as functions of the quality constraints of the detection. To compare the performance of different computer vision algorithms, a representative single performance measure is computed from the graphs by integrating out all quality parameters. The evaluation method can be applied to different types of activity detection and recognition algorithms. The performance metric has been tested on several activity recognition algorithms participating in the ICPR 2012 HARL competition.
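The idea of computing Precision, Recall and F-Score as functions of a quality constraint, and then integrating the constraint out to obtain a single representative measure, can be sketched as follows. This is a minimal illustration under assumptions not stated in the abstract: axis-aligned 2D bounding boxes, intersection-over-union (IoU) as the overlap measure, and greedy one-to-one matching; the actual HARL metric operates on spatio-temporal overlap.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def prf_at_threshold(gt, det, t):
    """Precision, Recall, F-Score when a detection counts as correct
    only if its overlap with an unmatched ground-truth item is >= t."""
    matched, tp = set(), 0
    for d in det:
        best, best_i = 0.0, None
        for i, g in enumerate(gt):
            o = iou(d, g)
            if i not in matched and o >= t and o > best:
                best, best_i = o, i
        if best_i is not None:
            matched.add(best_i)
            tp += 1
    prec = tp / len(det) if det else 1.0
    rec = tp / len(gt) if gt else 1.0
    f = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f

def integrated_fscore(gt, det, thresholds):
    """Single representative measure: average the F-Score over all
    quality thresholds, i.e. integrate out the quality parameter."""
    return sum(prf_at_threshold(gt, det, t)[2] for t in thresholds) / len(thresholds)
```

A detection with 50% overlap then scores perfectly at a lenient threshold, fails at a strict one, and the integrated score reflects both regimes at once.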
Today, with quality becoming increasingly important, many products require three-dimensional in-line quality control. However, the 3D reconstruction of transparent objects is a very difficult problem in computer vision due to the transparency and specularity of the surface. This paper proposes a new method, called Scanning From Heating (SFH), to determine the surface shape of transparent objects using laser surface heating and thermal imaging. Furthermore, the application to transparent glass is discussed and results on different surface shapes are presented.
For the past two decades, the need for three-dimensional (3-D) scanning of industrial objects has increased significantly, and many experimental techniques and commercial solutions have been proposed. However, difficulties remain for the acquisition of optically non-cooperative surfaces, such as transparent or specular surfaces. To address highly reflective metallic surfaces, we propose the extension of a technique that was originally dedicated to glass objects. In contrast to conventional active triangulation techniques that measure the reflection of visible radiation, we measure the thermal emission of a surface that is locally heated by a laser source. Considering the thermophysical properties of metals, we present a simulation model of the heat exchanges induced by the process, which helps to demonstrate its feasibility on specular metallic surfaces and to predict the settings of the system. With our experimental device, we have validated the theoretical modeling and computed 3-D point clouds from specular surfaces of various geometries. Furthermore, a comparison of our results with those of a conventional system on specular and diffuse parts highlights that the accuracy of the measurement no longer depends on the roughness of the surface. © 2012 Society of Photo-Optical Instrumentation Engineers (SPIE).
Abstract: We propose a new method for human pose estimation which leverages information from multiple views to impose a strong prior on articulated pose. The novelty of the method concerns the types of coherence modelled. Consistency is maximised over the different views through terms modelling classical geometric information (coherence of the resulting poses) as well as appearance information, which is modelled as latent variables in the global energy function. Moreover, the adequacy of each view is assessed and their contributions are adjusted accordingly. Experiments on the HumanEva and UMPM datasets show that the proposed method significantly decreases the estimation error compared to single-view results.

This paper is a preprint of a paper submitted to IET Computer Vision. If accepted, the copy of record will be available at the IET Digital Library.

Introduction
Human pose estimation is a building block in many industrial applications such as human-computer interaction, motion capture systems, etc. Whereas the problem has been almost solved for easy instances, such as cooperative settings at close distance and depth data without occlusions, other realistic configurations still present a significant challenge. In particular, pose estimation from RGB input in non-cooperative settings remains a difficult problem. Methods range from unstructured and purely discriminative approaches for simple tasks on depth data, which allow real-time performance on low-cost hardware, up to complex methods imposing strong priors on pose. The latter are dominant on the more difficult RGB data but are also increasingly popular on depth data.
These priors are often modelled as kinematic trees (as in the proposed method) or, using inverse rendering, as geometric parametric models (see Section 2 for related work).

In this paper, we leverage the information from multiple (RGB) views to impose a strong prior on articulated pose, targeting applications such as video surveillance from multiple cameras. Activity recognition in this context is frequently preceded by articulated pose estimation, which, in a non-cooperative environment such as surveillance, can strongly depend on the optimal viewpoint. Multi-view methods can often increase robustness with respect to occlusions.

In the proposed approach, kinematic trees model independent pose priors for each individual viewpoint, and additional terms favour consistency across views. The novelty of our contribution lies in the fact that consistency is enforced not only geometrically on the solution, but also in the space of latent variables across views.

More precisely, a pose is modelled as mixtures of parts, each of which is assigned to a position. As in classical kinematic trees, deformation terms model the relative positions of parts with respect to their neighbours in the tree. Following [1], the deformation and appearance terms depend on latent variables which switch between mixture components. This creates a powerful and expressive model with low-variance mixture components which are able to model precise relation...
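The structure of such an energy function, with per-view kinematic-tree terms plus a cross-view coherence term, can be sketched as follows. This is a simplified illustration, not the authors' formulation: all function names are hypothetical, the latent mixture variables and camera geometry are omitted, and cross-view consistency is reduced to a plain distance penalty between corresponding parts.

```python
import math

def single_view_energy(parts, appearance, deform, tree_edges):
    """Energy of one view's pose: unary appearance score per part
    plus pairwise deformation terms over kinematic-tree edges."""
    e = sum(appearance(i, p) for i, p in enumerate(parts))
    e += sum(deform(parts[i], parts[j]) for i, j in tree_edges)
    return e

def multi_view_energy(poses, appearance, deform, tree_edges, lam):
    """Sum of per-view tree energies plus a cross-view consistency
    term (hypothetical: here simply a distance between corresponding
    parts in consecutive views, weighted by lam)."""
    e = sum(single_view_energy(p, appearance, deform, tree_edges)
            for p in poses)
    for v in range(len(poses) - 1):
        for a, b in zip(poses[v], poses[v + 1]):
            e += lam * math.dist(a, b)
    return e
```

In the actual model, the consistency constraint would act on latent variables and geometrically projected positions rather than on raw image coordinates, but the additive structure (per-view trees plus coupling terms) is the same.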
Abstract. Many practical tasks in industry, such as automatic inspection or robot vision, often require the scanning of three-dimensional shapes by use of non-contact techniques. However, few methods have been proposed to measure three-dimensional shapes of transparent objects because of the difficulty of dealing with transparency and specularity of the surface. This paper presents a 3D scanner for transparent glass objects based on Scanning From Heating (SFH), a new method that makes use of local surface heating and thermal imaging.