Entrainment of visual steady-state responses is modulated by global spatial statistics

Figure 1: Speech-to-gesture translation example. In this paper, we study the connection between conversational gesture and speech. Here, we show the result of our model that predicts gesture from audio. From the bottom upward: the input audio, arm and hand pose predicted by our model, and video frames synthesized from pose predictions using [10].

Abstract. Human speech is often accompanied by hand and arm gestures. Given audio speech input, we generate plausible gestures to go along with the sound. Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion. We train on unlabeled videos for which we only have noisy pseudo ground truth from an automatic pose detection system. Our proposed model significantly outperforms baseline methods in a quantitative comparison. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures.

Everybody Dance Now

Chan, Ginosar, Zhou, et al. 2018

Preprint

Improving accessibility of the web with a computer game

et al. 2006

Images on the Web present a major accessibility issue for the visually impaired, mainly because the majority of them do not have proper captions. This paper addresses the problem of attaching proper explanatory text descriptions to arbitrary images on the Web. To this end, we introduce Phetch, an enjoyable computer game that collects explanatory descriptions of images. People play the game because it is fun, and as a side effect of game play we collect valuable information. Given any image from the World Wide Web, Phetch can output a correct annotation for it. The collected data can be applied towards significantly improving Web accessibility. In addition to improving accessibility, Phetch is an example of a new class of games that provide entertainment in exchange for human processing power. In essence, we solve a typical computer vision problem with HCI tools alone.

A Century of Portraits: A Visual Historical Record of American High School Yearbooks

Ginosar, Rakelly, Sachs, et al. 2017

IEEE Trans. Comput. Imaging

Many details about our world are not captured in written records because they are too mundane or too abstract to describe in words. Fortunately, since the invention of the camera, an ever-increasing number of photographs capture much of this otherwise lost information. This plethora of artifacts documenting our "visual culture" is a treasure trove of knowledge as yet untapped by historians. We present a dataset of 37,921 frontal-facing American high school yearbook photos that allow us to use computation to glimpse into the historical visual record too voluminous to be evaluated manually. The collected portraits provide a constant visual frame of reference with varying content. We can therefore use them to consider issues such as a decade's defining style elements, or trends in fashion and social norms over time. We demonstrate that our historical image dataset may be used together with weakly-supervised data-driven techniques to perform scalable historical analysis of large image corpora with minimal human effort, much in the same way that large text corpora together with natural language processing revolutionized historians' workflow. Furthermore, we demonstrate the use of our dataset in dating grayscale portraits using deep learning methods.

Detecting People in Cubist Art

Ginosar, Haas, Brown, et al. 2015

Abstract. Although the human visual system is surprisingly robust to extreme distortion when recognizing objects, most evaluations of computer object detection methods focus only on robustness to natural form deformations such as people's pose changes. To determine whether algorithms truly mirror the flexibility of human vision, they must be compared against human vision at its limits. For example, in Cubist abstract art, painted objects are distorted by object fragmentation and part-reorganization, to the point that human vision often fails to recognize them. In this paper, we evaluate existing object detection methods on these abstract renditions of objects, comparing human annotators to four state-of-the-art object detectors on a corpus of Picasso paintings. Our results demonstrate that while human perception significantly outperforms current methods, human perception and part-based models exhibit a similarly graceful degradation in object detection performance as the objects become increasingly abstract and fragmented, corroborating the theory of part-based object representation in the brain.

Improving Image Search with PHETCH

Ahn, Ginosar, Kedia, et al. 2007

Online image search engines are hindered by the lack of proper labels for images in their indices. In many cases the labels do not agree with the contents of the image itself, since images are generally indexed by their filename and the surrounding text in a webpage. To overcome this problem we present Phetch, a system for attaching accurate explanatory text captions to arbitrary images on the Web. Phetch is an engaging multiplayer game that entices people to write accurate captions. People play the game because it is fun, and as a side effect we collect valuable information that can be applied towards improving image search engines. In addition, the game can also be used to enhance Web accessibility and to provide other novel applications.

Learning to Factorize and Relight a City

Liu, Ginosar, Zhou, et al. 2020

Shiry Ginosar

Learning Individual Styles of Conversational Gesture
