The human hand moves in complex and high-dimensional ways, making estimation of 3D hand pose configurations from images alone a challenging task. In this work we propose a method to learn a statistical hand model represented by a cross-modally trained latent space via a generative deep neural network. We derive an objective function from the variational lower bound of the VAE framework and jointly optimize the resulting cross-modal KL-divergence and the posterior reconstruction objective, naturally admitting a training regime that leads to a coherent latent space across multiple modalities such as RGB images, 2D keypoint detections, or 3D hand configurations. Additionally, it provides a straightforward way of using semi-supervision. This latent space can be directly used to estimate 3D hand poses from RGB images, outperforming the state of the art in different settings. Furthermore, we show that our proposed method can be used without changes on depth images and performs comparably to specialized methods. Finally, the model is fully generative and can synthesize consistent pairs of hand configurations across modalities. We evaluate our method on both RGB and depth datasets and analyze the latent space qualitatively.
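A minimal PyTorch sketch of this idea follows. All layer sizes, feature dimensions, and the weighting `beta` are illustrative assumptions, not the paper's architecture: one encoder per modality maps into a shared latent space, and a decoder reconstructs the target modality under the usual VAE objective.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, z_dim)
        self.logvar = nn.Linear(128, z_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

def kl_to_prior(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, batch-averaged
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=1).mean()

z_dim = 32
enc_rgb = Encoder(512, z_dim)  # hypothetical: precomputed RGB image features
enc_3d = Encoder(63, z_dim)    # 21 hand joints x 3 coordinates; trained symmetrically
dec_3d = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, 63))

def cross_modal_elbo(x_rgb, y_3d, beta=1e-3):
    # Encode one modality and decode another so that all modalities
    # are pushed toward a single coherent latent space.
    mu, logvar = enc_rgb(x_rgb)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    rec = ((dec_3d(z) - y_3d) ** 2).sum(dim=1).mean()     # posterior reconstruction term
    return rec + beta * kl_to_prior(mu, logvar)           # plus KL regularizer
```

With paired data, the same objective can be applied per modality pair (e.g. 3D-to-3D, RGB-to-3D), which is what aligns the encoders in one latent space; unpaired samples can contribute through the within-modality terms only, which is one way the semi-supervised setting can be exploited.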
Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks. Yet there is a need to lower gaze errors further to enable applications requiring higher-quality estimates. Further gains can be achieved by personalizing gaze networks, ideally with few calibration samples. However, over-parameterized neural networks are not amenable to learning from few examples, as they can quickly over-fit. We embrace these challenges and propose a novel framework for Few-shot Adaptive GaZE Estimation (FAZE) for learning person-specific gaze networks with very few (≤ 9) calibration samples. FAZE learns a rotation-aware latent representation of gaze via a disentangling encoder-decoder architecture, along with a highly adaptable gaze estimator trained using meta-learning. It can adapt to any new person with as few as 3 samples, yielding significant performance gains and a state-of-the-art error of 3.18° on GazeCapture, a 19% improvement over prior art.
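A hypothetical sketch of the few-shot personalization step, in the spirit of MAML-style meta-learning: a meta-learned gaze head is copied and fine-tuned on a handful of calibration pairs. Dimensions and hyperparameters are assumptions; FAZE's disentangling encoder-decoder and its exact meta-learning procedure are not reproduced here.

```python
import copy
import torch
import torch.nn as nn

# Meta-learned gaze head: maps a (rotation-aware) latent code to (pitch, yaw).
gaze_head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

def personalize(head, z_calib, g_calib, steps=5, lr=1e-2):
    # Fine-tune a copy of the meta-learned head on k calibration pairs,
    # analogous to a meta-learning inner loop at test time.
    adapted = copy.deepcopy(head)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        loss = ((adapted(z_calib) - g_calib) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted

# e.g. 3 calibration samples: latent codes z and ground-truth gaze angles g
z_calib, g_calib = torch.randn(3, 64), torch.randn(3, 2)
personal_head = personalize(gaze_head, z_calib, g_calib)
```

The point of meta-training is that the head's initial weights are chosen so that these few gradient steps on 3-9 samples generalize to the new person rather than over-fitting.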
PURPOSE Biomarkers based on tumor-infiltrating lymphocytes (TIL) are potentially valuable in predicting the effectiveness of immune checkpoint inhibitors (ICI). However, clinical application remains challenging because of methodologic limitations and the laborious process involved in spatial analysis of TIL distribution in whole-slide images (WSI). METHODS We developed an artificial intelligence (AI)–powered WSI analyzer of TIL in the tumor microenvironment that can define three immune phenotypes (IPs): inflamed, immune-excluded, and immune-desert. These IPs were correlated with tumor response to ICI and survival in two independent cohorts of patients with advanced non–small-cell lung cancer (NSCLC). RESULTS The inflamed IP correlated with enrichment in local immune cytolytic activity, a higher response rate, and prolonged progression-free survival compared with the immune-excluded and immune-desert phenotypes. At the WSI level, there was a significant positive correlation between the tumor proportion score (TPS) determined by the AI model and the control TPS analyzed by pathologists (P < .001). Overall, 44.0% of tumors were inflamed, 37.1% were immune-excluded, and 18.9% were immune-desert. The incidence of inflamed IP in patients with programmed death ligand-1 TPS < 1%, 1%-49%, and ≥ 50% was 31.7%, 42.5%, and 56.8%, respectively. Median progression-free survival and overall survival were, respectively, 4.1 months and 24.8 months with inflamed IP, 2.2 months and 14.0 months with immune-excluded IP, and 2.4 months and 10.6 months with immune-desert IP. CONCLUSION AI-powered spatial analysis of TIL correlated with tumor response to ICI and progression-free survival in advanced NSCLC, and is potentially a supplementary biomarker to TPS as determined by a pathologist.
Estimating human gaze from natural eye images alone is a challenging task. Gaze direction can be defined by the pupil center and the eyeball center, where the latter is unobservable in 2D images; hence, achieving highly accurate gaze estimates is an ill-posed problem. In this paper, we introduce a novel deep neural network architecture specifically designed for the task of gaze estimation from single-eye input. Instead of directly regressing two angles for the pitch and yaw of the eyeball, we regress to an intermediate pictorial representation, which in turn simplifies the task of 3D gaze direction estimation. Our quantitative and qualitative results show that our approach achieves higher accuracy than the state of the art and is robust to variation in gaze, head pose, and image quality.
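A minimal sketch of the two-stage idea: a first stage predicts an intermediate pictorial representation (a "gazemap"-like image), and a second stage regresses pitch and yaw from it. Layer sizes, map channels, and the input resolution are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PictorialGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 1: eye image -> intermediate pictorial representation
        self.to_gazemap = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),  # 2 map channels at input resolution
        )
        # Stage 2: pictorial representation -> (pitch, yaw)
        self.to_angles = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(2 * 8 * 8, 64), nn.ReLU(), nn.Linear(64, 2),
        )

    def forward(self, eye_image):
        gazemap = self.to_gazemap(eye_image)
        return gazemap, self.to_angles(gazemap)

net = PictorialGazeNet()
gazemap, angles = net(torch.randn(1, 1, 36, 60))  # single grayscale eye patch
```

Supervising the intermediate map (e.g. with a rendered target image of the eyeball and iris) gives the network a dense, geometrically meaningful training signal, which is what makes the final angle regression easier than regressing pitch and yaw directly.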