In this work we propose the use of a modified version of the correlation coefficient as a performance criterion for the image alignment problem. The proposed modification has the desirable characteristic of being invariant with respect to photometric distortions. Since the resulting similarity measure is a nonlinear function of the warp parameters, we develop two iterative schemes for its maximization, one based on the forward additive approach and the second on the inverse compositional method. As is customary in iterative optimization, in each iteration the nonlinear objective function is approximated by an alternative expression for which the corresponding optimization is simple. In our case we propose an efficient approximation that leads to a closed-form solution (per iteration) of low computational complexity, the latter property being particularly strong in our inverse version. The proposed schemes are tested against the Forward Additive Lucas-Kanade and the Simultaneous Inverse Compositional algorithms through simulations. Under noisy conditions and photometric distortions, our forward version achieves more accurate alignments and exhibits faster convergence, whereas our inverse version performs similarly to the Simultaneous Inverse Compositional algorithm but at a lower computational complexity.
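The photometric invariance mentioned in this abstract can be illustrated with a small sketch. It assumes that the distortion is a global gain and bias change (the abstract does not specify the distortion model) and uses the plain zero-mean, unit-norm correlation coefficient on flattened pixel lists; the function names and sample values are hypothetical and are not taken from the paper.

```python
import math

def zero_mean_normalize(pixels):
    """Subtract the mean, then scale the result to unit Euclidean norm."""
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant image cannot be normalized")
    return [c / norm for c in centered]

def correlation_coefficient(a, b):
    """Inner product of the two zero-mean, unit-norm intensity vectors."""
    za = zero_mean_normalize(a)
    zb = zero_mean_normalize(b)
    return sum(x * y for x, y in zip(za, zb))

# Hypothetical flattened image patch.
template = [10.0, 20.0, 35.0, 50.0, 40.0, 15.0]

# Assumed photometric distortion: positive gain (contrast) plus bias (brightness).
distorted = [1.7 * p + 12.0 for p in template]

# A patch with different content, for comparison.
other = [50.0, 10.0, 40.0, 20.0, 15.0, 35.0]

same_content_score = correlation_coefficient(template, distorted)
other_content_score = correlation_coefficient(template, other)
print(same_content_score, other_content_score)
```

Under this assumption the score of the distorted patch stays at its maximum (up to floating-point rounding), while the patch with different content scores lower. A sum-of-squared-differences criterion would not behave this way.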
Recent advances on human motion analysis have made the extraction of human skeleton structure feasible, even from single depth images. This structure has been proven quite informative for discriminating actions in a recognition scenario. In this context, we propose a local skeleton descriptor that encodes the relative position of joint quadruples. Such a coding implies a similarity normalisation transform that leads to a compact (6D) view-invariant skeletal feature, referred to as skeletal quad. Further, the use of a Fisher kernel representation is suggested to describe the skeletal quads contained in a (sub)action. A Gaussian mixture model is learnt from training data, so that the generation of any set of quads is encoded by its Fisher vector. Finally, a multi-level representation of Fisher vectors leads to an action description that roughly carries the order of sub-action within each action sequence. Efficient classification is here achieved by linear SVMs. The proposed action representation is tested on widely used datasets, MSRAction3D and HDM05. The experimental evaluation shows that the proposed method outperforms state-of-the-art algorithms that rely only on joints, while it competes with methods that combine joints with extra cues.
Time-of-flight (TOF) cameras are sensors that can measure the depths of scene points, by illuminating the scene with a controlled laser or LED source, and then analyzing the reflected light. In this paper we will first describe the underlying measurement principles of time-of-flight cameras, including: (i) pulsed-light cameras, which measure directly the time taken for a light pulse to travel from the device to the object and back again, and (ii) continuous-wave modulated-light cameras, which measure the phase difference between the emitted and received signals, and hence obtain the travel time indirectly. We review the main existing designs, including prototypes as well as commercially available devices. We also review the relevant camera calibration principles, and how they are applied to TOF devices. Finally, we discuss the benefits and challenges of combined TOF and color camera systems.
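The two measurement principles named in this abstract can be sketched with the standard textbook relations: depth equals half the round-trip distance for a pulsed-light camera, and depth is proportional to the phase shift for a continuous-wave camera. The function names and the sample modulation frequency below are illustrative assumptions, not values from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in vacuum

def depth_from_round_trip_time(round_trip_seconds):
    """Pulsed light: the pulse travels to the object and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def depth_from_phase(phase_radians, modulation_hz):
    """Continuous wave: a phase shift of 2*pi corresponds to one full
    modulation wavelength of round-trip travel."""
    return SPEED_OF_LIGHT * phase_radians / (4.0 * math.pi * modulation_hz)

def unambiguous_range(modulation_hz):
    """Largest depth measurable before the phase wraps around."""
    return SPEED_OF_LIGHT / (2.0 * modulation_hz)

# Hypothetical readings: a 20 ns round trip, and a half-cycle phase shift at 30 MHz.
pulsed_depth = depth_from_round_trip_time(20e-9)
cw_depth = depth_from_phase(math.pi, 30e6)
max_depth = unambiguous_range(30e6)
print(pulsed_depth, cw_depth, max_depth)
```

The phase-based depth is only defined up to multiples of the unambiguous range, which is one reason the choice of modulation frequency trades range against depth resolution.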
This paper describes a probabilistic generative model and its associated algorithm to jointly register multiple point sets. The vast majority of state-of-the-art registration techniques select one of the sets as the "model" and perform pairwise alignments between the other sets and this set. The main drawback of this mode of operation is that there is no guarantee that the model set is free of noise and outliers, which contaminate the estimation of the registration parameters. Unlike previous work, the proposed method treats all the point sets on an equal footing: they are realizations of a Gaussian mixture model (GMM) and the registration is cast into a clustering problem. We formally derive an EM algorithm that estimates both the GMM parameters and the rotations and translations that map each individual set onto the "central" model. The mixture means play the role of the registered set of points while the variances provide rich information about the quality of the registration. We thoroughly validate the proposed method with challenging datasets, we compare it with several state-of-the-art methods, and we show its potential for fusing real depth data.
This paper addresses the problem of registering multiple point sets. Solutions to this problem are often approximated by repeatedly solving for pairwise registration, which results in an uneven treatment of the sets forming a pair: a model set and a data set. The main drawback of this strategy is that the model set may contain noise and outliers, which negatively affects the estimation of the registration parameters. In contrast, the proposed formulation treats all the point sets on an equal footing. Indeed, all the points are drawn from a central Gaussian mixture, hence the registration is cast into a clustering problem. We formally derive batch and incremental EM algorithms that robustly estimate both the GMM parameters and the rotations and translations that optimally align the sets. Moreover, the mixture's means play the role of the registered set of points while the variances provide rich information about the contribution of each component to the alignment. We thoroughly test the proposed algorithms on simulated data and on challenging real data collected with range sensors. We compare them with several state-of-the-art algorithms, and we show their potential for surface reconstruction from depth data.
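To make the "registration as clustering" idea more concrete, the following is a minimal sketch of one ingredient of such an EM scheme: the posterior responsibilities of the mixture components for a single observed point, once that point has been mapped by its set's rigid transform. It is a simplification under stated assumptions (2D points, isotropic components, no outlier class) and is not the authors' algorithm; all names and numbers are hypothetical.

```python
import math

def transform_2d(point, angle, translation):
    """Apply a planar rotation followed by a translation."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = point
    return (c * x - s * y + translation[0], s * x + c * y + translation[1])

def responsibilities(point, means, variances, weights):
    """Posterior probability of each isotropic 2D Gaussian component
    given one (already transformed) point."""
    weighted = []
    for mean, var, weight in zip(means, variances, weights):
        sq_dist = (point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2
        density = math.exp(-sq_dist / (2.0 * var)) / (2.0 * math.pi * var)
        weighted.append(weight * density)
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical central mixture with two components.
means = [(0.0, 0.0), (4.0, 0.0)]
variances = [0.5, 0.5]
weights = [0.5, 0.5]

# A point observed in its own sensor frame, and the current estimate of the
# rigid transform that maps that frame onto the central model.
observed = (0.0, 1.0)
aligned = transform_2d(observed, -math.pi / 2.0, (0.0, 0.0))

posteriors = responsibilities(aligned, means, variances, weights)
print(aligned, posteriors)
```

In a full EM loop these posteriors would weight the update of each set's rotation and translation as well as the mixture means and variances; that M-step is omitted here.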
Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available data sets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.
Invariance of the similarity measure to photometric distortions and the capability of producing sub-pixel accuracy are two desired and often required features in most stereo vision applications. In this paper we propose a new correlation-based measure that satisfies both requirements. Specifically, by using an appropriate interpolation scheme in the candidate windows of the matching image, together with the classical zero-mean normalized cross-correlation function, we introduce a suitable measure. Although the proposed measure is a nonlinear function of the sub-pixel displacement parameter, its maximization has a closed-form solution, which reduces the complexity of its use in matching techniques. Application of the proposed measure to a number of benchmark stereo image pairs reveals its superiority over existing correlation-based techniques used for sub-pixel accuracy.
This study scrutinizes the existing literature regarding the use of augmented reality and gamification in education to establish its theoretical basis. A systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was conducted. To provide complete and valid information, all types of related studies for all educational stages and subjects throughout the years were investigated. In total, 670 articles from 5 databases (Scopus, Web of Science, Google Scholar, IEEE, and ERIC) were examined. Based on the results, using augmented reality and gamification in education can yield several benefits for students, assist educators, improve the educational process, and facilitate the transition toward technology-enhanced learning when used in a student-centered manner, following proper educational approaches and strategies and taking students’ knowledge, interests, unique characteristics, and personality traits into consideration. Students demonstrated positive behavioral, attitudinal, and psychological changes and increased engagement, motivation, active participation, knowledge acquisition, focus, curiosity, interest, enjoyment, academic performance, and learning outcomes. Teachers also assessed them positively. Virtual rewards were crucial for improving learning motivation. The need to develop appropriate validation tools, design techniques, and theories was apparent. Finally, their potential to create collaborative and personalized learning experiences and to promote and enhance students’ cognitive and social–emotional development was evident.