PTEN augments SPARC suppression of proliferation and inhibits SPARC-induced migration by suppressing SHC-RAF-ERK and AKT signaling

We present a lightweight, adaptable neural TTS system with high quality output. The system is composed of three separate neural network blocks: prosody prediction, acoustic feature prediction, and Linear Prediction Coding Net as a neural vocoder. The system can synthesize speech with close to natural quality while running 3 times faster than real-time on a standard CPU. The modular setup of the system allows for simple adaptation to new voices with a small amount of data. We first demonstrate the ability of the system to produce high quality speech when trained on large, high quality datasets. Following that, we demonstrate its adaptability by mimicking unseen voices using datasets 5 to 20 minutes long with lower recording quality. Large scale Mean Opinion Score quality and similarity tests are presented, showing that the system can adapt to unseen voices with a quality gap of 0.12 and a similarity gap of 3% compared to natural speech for male voices, and a quality gap of 0.35 and a similarity gap of 9% for female voices.

Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities

Shechtman, Sorin (2019)

Modern sequence to sequence neural TTS systems provide close to natural speech quality. Such systems usually comprise a network converting a linguistic/phonetic feature sequence to an acoustic feature sequence, cascaded with a neural vocoder. The generated speech prosody (i.e. phoneme durations, pitch and loudness) is implicitly present in the acoustic features, being mixed with spectral information. Although the speech sounds natural, its prosody realization is randomly chosen and cannot be easily altered. Prosody control becomes an even more difficult task if no prosodic labeling is present in the training data. Recently, much progress has been achieved in unsupervised speaking style learning and generation; however, human inspection is still required after training to discover and interpret the speaking styles learned by the system. In this work we introduce a fully automatic method that makes the system aware of the prosody and enables sentence-wise control of speaking pace and expressiveness on a continuous scale. While useful by itself in many applications, the proposed prosody control can also improve the overall quality and expressiveness of the synthesized speech, as demonstrated by subjective listening evaluations. We also propose a novel augmented attention mechanism that facilitates better pace control sensitivity and faster attention convergence. Index Terms: controllable speech synthesis, expressive text to speech, neural TTS, speech prosody, seq2seq models with attention

High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification

Chazan, Hoory, Sagi, et al.

Neural TTS Voice Conversion

Kons, Shechtman, Sorin, et al. (2018)

Slava Shechtman

An autonomous debating system

High Quality, Lightweight and Adaptable TTS Using LPCNet

Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities

High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification

Neural TTS Voice Conversion
