Minsu Jang scite author profile

Co-speech gestures enhance interaction experiences between humans as well as between humans and robots. Existing robots use rule-based speech-gesture association, which requires human labor and expert prior knowledge to implement. We present a learning-based co-speech gesture generation model trained on 52 h of TED talks. The proposed end-to-end neural network model consists of an encoder for speech text understanding and a decoder that generates a sequence of gestures. The model successfully produces various gestures including iconic, metaphoric, deictic, and beat gestures. In a subjective evaluation, participants reported that the gestures were human-like and matched the speech content. We also demonstrate co-speech gesture generation running in real time on a NAO robot.

Speech gesture generation from the trimodal context of text, audio, and speaker identity

Yoon et al. 2020

For human-like agents, including virtual avatars and social robots, making proper gestures while speaking is crucial in human-agent interaction. Co-speech gestures enhance interaction experiences and make the agents look alive. However, it is difficult to generate human-like gestures due to the lack of understanding of how people gesture. Data-driven approaches attempt to learn gesticulation skills from human demonstrations, but the ambiguous and individual nature of gestures hinders learning. In this paper, we present an automatic gesture generation model that uses the multimodal context of speech text, audio, and speaker identity to reliably generate gestures. By incorporating a multimodal context and an adversarial training scheme, the proposed model outputs gestures that are human-like and that match speech content and rhythm. We also introduce a new quantitative evaluation metric for gesture generation models. Experiments with the introduced metric and subjective human evaluation showed that the proposed gesture generation model performs better than existing end-to-end generation models. We further confirm that our model is able to work with synthesized audio in a scenario where contexts are constrained, and show that different gesture styles can be generated for the same speech by specifying different speaker identities in the style embedding space that is learned from videos of various speakers. All the code and data are available at https://github.com/ai4r/Gesture-Generation-from-Trimodal-Context.

Blue Electroluminescent Polymers: Control of Conjugation Length by Kink Linkages and Substituents in the Poly(p-phenylenevinylene)-Related Copolymers

Ahn, Jang, Shim et al. 1999

Macromolecules

127

Poly[o(m,p)-phenylenevinylene-alt-2-methoxy-5-(2-ethylhexyloxy)-p-phenylenevinylene], o(m,p)-PMEH-PPV, and poly[o(m,p)-phenylenevinylene-alt-2,5-bis(trimethylsilyl)-p-phenylenevinylene], o(m,p)-PBTMS-PPV, of varying effective conjugation lengths were synthesized by the well-known Wittig condensation polymerization between the appropriate diphosphonium salts and the dialdehyde monomers such as terephthaldicarboxaldehyde, isophthalaldehyde, and phthalicdicarboxaldehyde. The conjugation lengths of the polymers were controlled by substituents and kink (ortho and meta) linkages. The resulting polymers were highly soluble in common organic solvents. The synthesized polymers showed UV−visible absorbance and photoluminescence (PL) in the ranges of 330−430 nm and 440−550 nm, respectively. The maximum emission peak of p-PMEH-PPV was blue-shifted about 30 nm compared to that of MEH-PPV, which is due to an unsubstituted phenylene unit. In addition, o-PMEH-PPV and m-PMEH-PPV showed PL emission maxima at 500 and 490 nm, respectively, because the ortho and meta linkages of o(m)-PMEH-PPV reduced the π-conjugation lengths. The trimethylsilyl substituent has no electron-donating effect; therefore, the PL maximum of p-PBTMS-PPV was far more blue-shifted (to 485 nm). Consequently, maximum PL wavelengths for o-PBTMS-PPV and m-PBTMS-PPV appeared around 470 and 440 nm, respectively. A single-layer light-emitting diode device with a simple ITO (indium−tin oxide)/polymer/Al configuration was fabricated. The threshold bias of trimethylsilyl-substituted o(m,p)-PBTMS-PPV was in the range of 8−9 V. As in the photoluminescence spectra, the dramatic change of emission color was also shown in the electroluminescence spectra of p-PMEH-PPV, p-PBTMS-PPV, and o-PBTMS-PPV when the operating voltage was about 8−9 V.

ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly

Jang, Kim, Park et al. 2020

Highly Efficient Light-Emitting Polymers Composed of Both Hole and Electron Affinity Units in the Conjugated Main Chain

Song, Jang, Shim et al. 1999

Macromolecules

Two new fully conjugated alternating copolymers containing both carbazole and oxadiazole units were prepared through the Wittig condensation polymerization (carbazole units were linked to oxadiazole units through meta and para linkages). The polymers with the para linkage (PPOX−CAR) and the meta linkage (PMOX−CAR) in the main chain were soluble in common organic solvents and thermally stable on heating (the weight loss was less than 5% on heating to about 400 °C under nitrogen atmosphere). The maximum photoluminescence and electroluminescence wavelengths of PPOX−CAR and PMOX−CAR varied from 495 nm in the greenish-blue emission region to 450 nm in the blue emission region depending on the kink structure. The turn-on voltages of PPOX−CAR and PMOX−CAR were 7.5 and 10.5 V, respectively, when the single-layer light-emitting diodes of Al/PPOX−CAR or PMOX−CAR/ITO glass were fabricated. The maximum brightness of the Al/PPOX−CAR/ITO single-layer device was 500 cd/m² at 20 V.

Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots