Janine Thoma scite author profile

Sliced Wasserstein Generative Models

In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks: Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate state-of-the-art performance on challenging high-resolution image and video generation in an unsupervised manner.
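
For readers unfamiliar with the conventional approach this abstract contrasts against, the following is a minimal sketch of a Monte Carlo SWD estimate using random projections. It is not the authors' method (which learns a small number of parameterized orthogonal projections); the function names, the default of 64 projections, and the assumption of equally sized sample sets are all illustrative choices, not taken from the paper.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere via normalized Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def wasserstein_1d(a, b, p=2):
    """p-th power of the 1D Wasserstein-p distance between equal-size samples.

    In one dimension the optimal coupling matches sorted samples in order.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    return sum(abs(x - y) ** p for x, y in zip(a_sorted, b_sorted)) / len(a_sorted)


def sliced_wasserstein(xs, ys, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-p distance.

    xs, ys: lists of points (each a list of floats) of the same length.
    This is an illustrative sketch, not the approximation proposed in the paper.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized sample sets")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        # Project every sample onto the random direction (a 1D marginal).
        proj_x = [sum(t * c for t, c in zip(theta, x)) for x in xs]
        proj_y = [sum(t * c for t, c in zip(theta, y)) for y in ys]
        total += wasserstein_1d(proj_x, proj_y, p)
    return (total / n_projections) ** (1.0 / p)
```

Calling `sliced_wasserstein(real_samples, generated_samples)` on two equally sized lists of feature vectors returns a single non-negative number; the estimate typically needs many projections in high dimensions, which is the cost the abstract refers to.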

Wasserstein Divergence for GANs

Huang

Thoma

et al. 2018

109

In many domains of computer vision, generative adversarial networks (GANs) have achieved great success, among which the family of Wasserstein GANs (WGANs) is considered to be state-of-the-art due to its theoretical contributions and competitive qualitative performance. However, it is very challenging to approximate the k-Lipschitz constraint required by the Wasserstein-1 metric (W-met). In this paper, we propose a novel Wasserstein divergence (W-div), which is a relaxed version of W-met and does not require the k-Lipschitz constraint. As a concrete application, we introduce a Wasserstein divergence objective for GANs (WGAN-div), which can faithfully approximate W-div through optimization. Under various settings, including progressive growing training, we demonstrate the stability of the proposed WGAN-div owing to its theoretical and practical advantages over WGANs. Also, we study the quantitative and visual performance of WGAN-div on standard image synthesis benchmarks, showing the superior performance of WGAN-div compared to the state-of-the-art methods.

Mapping, Localization and Path Planning for Image-Based Navigation Using Visual Features and Map

Thoma

Paudel

Chhatkuli

et al. 2019

Building on progress in feature representations for image retrieval, image-based localization has seen a surge of research interest. Image-based localization has the advantage of being inexpensive and efficient, often avoiding the use of 3D metric maps altogether. That said, the need to maintain a large number of reference images as an effective support of localization in a scene nonetheless calls for them to be organized in a map structure of some kind. The problem of localization often arises as part of a navigation process. We are, therefore, interested in summarizing the reference images as a set of landmarks, which meet the requirements for image-based navigation. A contribution of this paper is to formulate such a set of requirements for the two sub-tasks involved: map construction and self-localization. These requirements are then exploited for compact map representation and accurate self-localization, using the framework of a network flow problem. During this process, we formulate the map construction and self-localization problems as convex quadratic and second-order cone programs, respectively. We evaluate our methods on publicly available indoor and outdoor datasets, where they outperform existing methods significantly.

[POSTER] Augmented Reality for User-Friendly Intra-Oral Scanning

Thoma

Havlena

Stalder

et al. 2017

Digital impressions of teeth, obtained through intra-oral scanning, allow for more efficient and cost-effective treatments of many dental indications. Current state-of-the-art intra-oral impression acquisition systems make use of a separate monitor to show the scanning progress, forcing the dentist to divert attention away from the scanner and the patient. In this paper, we present an augmented-reality-based solution to this problem. During the scanning process, an optical see-through head-mounted display is used to show an online overlay of the dynamic dental model onto the patient's teeth. The dentist can then fully focus on the patient and the scanner, while still being able to keep track of the current state of the model. This type of novel application, which fundamentally changes the human-computer interaction of intra-oral scanning systems, requires a fast and accurate registration of a dynamically growing model onto a glossy, partially occluded surface at a very small scale. To meet this demand, we propose application-tailored algorithms for indirect high-accuracy online 3D teeth tracking and optical see-through head-mounted display calibration. Experimental results indicate that our system has the potential to noticeably facilitate intra-oral scanning in the future.

Geometrically Mappable Image Features

Thoma

Paudel

Chhatkuli

et al. 2020

IEEE Robot. Autom. Lett.

Vision-based localization of an agent in a map is an important problem in robotics and computer vision. In that context, localization by learning matchable image features is gaining popularity due to recent advances in machine learning. Features that uniquely describe the visual contents of images have a wide range of applications, including image retrieval and understanding. In this work, we propose a method that learns image features targeted for image-retrieval-based localization. Retrieval-based localization has several benefits, such as easy maintenance and quick computation. However, the state-of-the-art features only provide visual similarity scores, which do not explicitly reveal the geometric distance between query and retrieved images. Knowing this distance is highly desirable for accurate localization, especially when the reference images are sparsely distributed in the scene. Therefore, we propose a novel loss function for learning image features which are both visually representative and geometrically relatable. This is achieved by guiding the learning process such that the feature and geometric distances between images are directly proportional. In our experiments, we show that our features not only offer significantly better localization accuracy, but also allow the trajectory of a query sequence to be estimated in the absence of the reference images.

Janine Thoma

Sliced Wasserstein Generative Models

Wasserstein Divergence for GANs

Mapping, Localization and Path Planning for Image-Based Navigation Using Visual Features and Map

[POSTER] Augmented Reality for User-Friendly Intra-Oral Scanning

Geometrically Mappable Image Features