Jie Wu scite author profile

Emotional voice conversion aims to convert speech from one emotional state to another. This paper proposes modeling timbre and prosody features with a deep bidirectional long short-term memory (DBLSTM) network for emotional voice conversion. Continuous wavelet transform (CWT) representations of the fundamental frequency (F0) and the energy contour are used for prosody modeling. Specifically, we use CWT to decompose F0 into a five-scale representation and the energy contour into a ten-scale representation, where each feature scale corresponds to a temporal scale. Both spectrum and prosody (F0 and energy contour) features are converted simultaneously by a sequence-to-sequence conversion method with a DBLSTM model, which captures both frame-wise and long-range relationships between the source and target voices. The converted speech signals are evaluated both objectively and subjectively, and the results confirm the effectiveness of the proposed method.
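As an illustrative sketch of the five-scale F0 decomposition described in the abstract above, the following pure-Python code applies a continuous wavelet transform to a toy contour. The Mexican hat mother wavelet, the dyadic scale spacing, the toy input, and the helper names `mexican_hat` and `cwt_decompose` are assumptions made for illustration; the abstract does not state which wavelet or scales the authors use.

```python
import math


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet with unit-energy normalisation."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_decompose(contour, scales):
    """Return one wavelet-coefficient track per scale.

    contour: list of floats (e.g. a normalised log-F0 curve, one value per frame)
    scales:  list of scales, measured in frames (assumed units)
    """
    n = len(contour)
    tracks = []
    for s in scales:
        track = []
        for t in range(n):
            # Discrete approximation of s^(-1/2) * sum_x f(x) * psi((x - t) / s)
            acc = 0.0
            for x in range(n):
                acc += contour[x] * mexican_hat((x - t) / s)
            track.append(acc / math.sqrt(s))
        tracks.append(track)
    return tracks


# Toy contour: a slow phrase-level movement plus a faster syllable-level ripple.
contour = [
    math.sin(2 * math.pi * i / 50) + 0.3 * math.sin(2 * math.pi * i / 8)
    for i in range(100)
]

# Five dyadic scales (2, 4, 8, 16, 32 frames); the spacing is an assumption.
scales = [2.0 ** i for i in range(1, 6)]
tracks = cwt_decompose(contour, scales)
```

Each entry of `tracks` has the same length as the input contour, so small scales follow fast local variation and large scales follow slow phrase-level movement. A ten-scale energy decomposition would only change the `scales` list.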

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

Lu, Wu, Luan et al. 2020

This paper presents XiaoiceSing, a high-quality singing voice synthesis system that employs an integrated network for spectrum, F0, and duration modeling. We follow the main architecture of FastSpeech while proposing several singing-specific designs: 1) Besides phoneme ID and position encoding, features from the musical score (e.g., note pitch and length) are also added. 2) To attenuate off-key issues, we add a residual connection in F0 prediction. 3) In addition to the duration loss of each phoneme, the durations of all the phonemes in a musical note are accumulated to calculate the syllable duration loss for rhythm enhancement. Experimental results show that XiaoiceSing outperforms the baseline system of convolutional neural networks by 1.44 MOS on sound quality, 1.18 on pronunciation accuracy, and 1.38 on naturalness. In two A/B tests, the proposed F0 and duration modeling methods achieve 97.3% and 84.3% preference rates over the baseline, respectively, which demonstrates the overwhelming advantages of XiaoiceSing.
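The syllable duration loss in point 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of mean squared error, the `syllable_weight` parameter, and the function names are all assumptions, since the abstract only says that phoneme durations within a note are accumulated.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def accumulate_by_note(durations, note_ids):
    """Sum the durations of all phonemes that belong to the same note."""
    totals = {}
    for d, n in zip(durations, note_ids):
        totals[n] = totals.get(n, 0.0) + d
    return [totals[n] for n in sorted(totals)]


def duration_loss(pred, target, note_ids, syllable_weight=1.0):
    """Phoneme-level loss plus a note-level (syllable) loss.

    syllable_weight is a hypothetical balancing factor, not from the paper.
    """
    phoneme_loss = mse(pred, target)
    syllable_loss = mse(
        accumulate_by_note(pred, note_ids),
        accumulate_by_note(target, note_ids),
    )
    return phoneme_loss + syllable_weight * syllable_loss


# Durations in frames; phonemes 0-1 belong to note 0, phonemes 2-3 to note 1.
target = [12, 20, 8, 30]
note_ids = [0, 0, 1, 1]

# Errors cancel inside note 0, so only the phoneme term is non-zero.
loss_balanced = duration_loss([10, 22, 8, 30], target, note_ids)

# Note 0 is two frames too long overall, so the syllable term also contributes.
loss_drifting = duration_loss([14, 20, 8, 30], target, note_ids)
```

In this toy case the second prediction has the smaller phoneme-level error but the larger total loss, because its note length drifts from the score. That is the rhythm-oriented behaviour the abstract attributes to the syllable term.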

A Review of Augmented Reality in Robotic-Assisted Surgery

Qian, DiMaio et al. 2020

IEEE Trans. Med. Robot. Bionics

Fracture Detection in Traumatic Pelvic CT Images

Davuluri, Ward et al. 2012

International Journal of Biomedical Imaging

Fracture detection in pelvic bones is vital for patient diagnostic decisions and treatment planning in traumatic pelvic injuries. Manual detection of bone fracture from computed tomography (CT) images is very challenging due to low resolution of the images and the complex pelvic structures. Automated fracture detection from segmented bones can significantly help physicians analyze pelvic CT images and detect the severity of injuries in a very short period. This paper presents an automated hierarchical algorithm for bone fracture detection in pelvic CT scans using adaptive windowing, boundary tracing, and wavelet transform while incorporating anatomical information. Fracture detection is performed on the basis of the results of prior pelvic bone segmentation via our registered active shape model (RASM). The results are promising and show that the method is capable of detecting fractures accurately.

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

Lu, Wu, Luan et al. 2020

Preprint

SAR Target Configuration Recognition via Two-Stage Sparse Structure Representation

Liu, Chen et al. 2018

IEEE Trans. Geosci. Remote Sensing

Adversarially Trained Multi-Singer Sequence-to-Sequence Singing Synthesizer

Wu, Luan 2020

This paper presents a high-quality singing synthesizer that can model a voice from limited available recordings. Based on the sequence-to-sequence singing model, we design a multi-singer framework to leverage all the existing singing data of different singers. To attenuate the issue of musical score imbalance among singers, we incorporate an adversarial task of singer classification to make the encoder output less singer-dependent. Furthermore, we apply multiple random window discriminators (MRWDs) to the generated acoustic features so that the network is trained as a GAN. Both objective and subjective evaluations indicate that the proposed synthesizer generates a higher-quality singing voice than the baseline (4.12 vs. 3.53 in MOS). In particular, the articulation of high-pitched vowels is significantly enhanced.

Jie Wu

Assessment of a green credit policy aimed at energy-intensive industries in China based on a financial CGE model

Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion
