Figure 1: Speech-to-gesture translation example. In this paper, we study the connection between conversational gesture and speech. Here, we show the result of our model that predicts gesture from audio. From the bottom upward: the input audio, arm and hand pose predicted by our model, and video frames synthesized from pose predictions using [10].

Abstract. Human speech is often accompanied by hand and arm gestures. Given audio speech input, we generate plausible gestures to go along with the sound. Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion. We train on unlabeled videos for which we only have noisy pseudo ground truth from an automatic pose detection system. Our proposed model significantly outperforms baseline methods in a quantitative comparison. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures.
The rise of cheaper and more accurate genotyping techniques has led to significant advances in understanding the genotype-phenotype map. However, this progress is currently bottlenecked by labor-intensive or slow phenotype data collection. We propose an algorithm to automatically estimate the canopy height of a row of plants in field conditions in a single pass of a moving robot. A downward-pointing stereo sensor collects a series of stereo image pairs. The resulting depth images are converted to height-above-ground images, from which height contours are extracted. The separate height contours corresponding to each frame are then concatenated to construct a single height contour representing one row of plants in the plot. Since the process is automated, data can be collected throughout the growing season with very little manual labor, complementing the already abundant genotypic data. Using experimental data from seven plots, we show that our proposed approach achieves a height estimation error of approximately 3.3%.
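The pipeline described above (depth image, then height-above-ground image, then per-frame contour, then concatenated row contour) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a camera pointing straight down at a known, fixed height over flat ground, non-overlapping frames, and image columns aligned with the direction of travel. The function names, the `camera_height` parameter, and the use of a per-column percentile to summarize canopy height are all hypothetical choices made for illustration.

```python
import numpy as np


def depth_to_height(depth, camera_height):
    """Convert a downward-looking depth image (metres) to height above ground.

    Assumption: flat ground and a camera pointing straight down, so
    height = camera_height - depth. Negative values are clipped to zero.
    """
    return np.clip(camera_height - depth, 0.0, None)


def height_contour(height_img, percentile=95.0):
    """Collapse each image column to one height value.

    Assumption: a high percentile is used as a robust stand-in for the
    canopy top; the abstract does not say how contours are extracted.
    """
    return np.nanpercentile(height_img, percentile, axis=0)


def row_contour(depth_frames, camera_height):
    """Concatenate per-frame contours into one contour for the whole row.

    Assumption: consecutive frames do not overlap along the row.
    """
    contours = [
        height_contour(depth_to_height(frame, camera_height))
        for frame in depth_frames
    ]
    return np.concatenate(contours)
```

A call such as `row_contour(frames, camera_height=2.0)` on a list of depth arrays returns a one-dimensional array with one height value per image column across all frames. A real system would also need to handle frame overlap, camera tilt, and uneven ground, none of which this sketch addresses.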
We introduce HAWCgen, a set of deep generative neural network models designed to supplement, or in some cases replace, parts of the simulation pipeline for the High Altitude Water Cherenkov (HAWC) observatory. We show that simple deep generative models replicate sampling of the reconstruction at a near-arbitrary speedup compared to the current simulation. Furthermore, we show that generative models can offer a replacement for the detector simulation at a rate and quality comparable to current methods. This work was done as part of an undergraduate summer intern project at NVIDIA during June 2018.