DIA-TTS: Deep-Inherited Attention Based Text-to-Speech Synthesizer

Yu, Junxiao; Xu, Zhengyuan; He, Xu; Wang, Jian; Liu, Bin; Feng, Rui; Song-sheng, Zhu; Wang, Wei; Li, Jianqing

doi:10.2139/ssrn.4257520

SSRN Journal

2022

Cited by 1 publication

References 0 publications

Research on Speech Synthesis Based on Mixture Alignment Mechanism

Deng, Wu, Qiu, et al. Sensors, 2023.

In recent years, deep learning-based speech synthesis has attracted considerable attention from the machine learning and speech communities. In this paper, we propose Mixture-TTS, a non-autoregressive speech synthesis model based on a mixture alignment mechanism. Mixture-TTS aims to improve the alignment between text sequences and mel-spectrograms. It uses a linguistic encoder that combines soft phoneme-level alignment with hard word-level alignment to explicitly extract word-level semantic information, and it introduces pitch and energy predictors to better predict the rhythmic information of the audio. In addition, Mixture-TTS introduces a post-net based on a five-layer 1D convolutional network to improve the reconstruction of the mel-spectrogram, and the output of the decoder is connected to the post-net through a residual connection. The mel-spectrogram is converted into the final audio by the HiFi-GAN vocoder. We evaluate Mixture-TTS on the AISHELL3 and LJSpeech datasets. Experimental results show that Mixture-TTS achieves somewhat better alignment between text sequences and mel-spectrograms and produces high-quality audio. Ablation studies confirm the effectiveness of the Mixture-TTS architecture.
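The abstract specifies only two things about the post-net: it has five 1D convolution layers, and its output is combined with the decoder output through a residual connection. The pure-Python sketch below illustrates that idea under loudly stated assumptions that are not taken from the Mixture-TTS paper. It uses kernel size 5 and tanh between layers (borrowed from the Tacotron 2-style post-net convention), omits batch normalization, and uses random weights. The helpers `conv1d_same`, `make_postnet`, `postnet_forward` and `refine_mel` are hypothetical names.

```python
import math
import random
from typing import List, Tuple

Signal = List[List[float]]            # indexed [channel][time]
Kernel = List[List[List[float]]]      # indexed [out_ch][in_ch][tap]
Layer = Tuple[Kernel, List[float]]    # (weights, bias)


def conv1d_same(x: Signal, weights: Kernel, bias: List[float]) -> Signal:
    """1D convolution (cross-correlation) over time with zero 'same' padding.

    Assumes an odd kernel length so the output keeps the input's time length.
    """
    in_ch, T = len(x), len(x[0])
    pad = len(weights[0][0]) // 2
    out: Signal = []
    for w_o, b_o in zip(weights, bias):
        row = []
        for t in range(T):
            acc = b_o
            for c in range(in_ch):
                for j, w in enumerate(w_o[c]):
                    src = t + j - pad
                    if 0 <= src < T:  # zero padding outside the sequence
                        acc += w * x[c][src]
            row.append(acc)
        out.append(row)
    return out


def make_postnet(n_mels: int, hidden: int, kernel: int = 5,
                 n_layers: int = 5, seed: int = 0) -> List[Layer]:
    """Randomly initialised post-net: n_mels -> hidden -> ... -> n_mels (assumed shapes)."""
    rng = random.Random(seed)
    dims = [n_mels] + [hidden] * (n_layers - 1) + [n_mels]
    layers: List[Layer] = []
    for c_in, c_out in zip(dims[:-1], dims[1:]):
        scale = 1.0 / math.sqrt(c_in * kernel)
        w = [[[rng.uniform(-scale, scale) for _ in range(kernel)]
              for _ in range(c_in)] for _ in range(c_out)]
        layers.append((w, [0.0] * c_out))
    return layers


def postnet_forward(mel: Signal, layers: List[Layer]) -> Signal:
    """Stack of convolutions; tanh after every layer except the last (assumption)."""
    h = mel
    for i, (w, b) in enumerate(layers):
        h = conv1d_same(h, w, b)
        if i < len(layers) - 1:
            h = [[math.tanh(v) for v in row] for row in h]
    return h


def refine_mel(decoder_mel: Signal, layers: List[Layer]) -> Signal:
    """Residual connection: refined = decoder output + post-net(decoder output)."""
    residual = postnet_forward(decoder_mel, layers)
    return [[d + r for d, r in zip(d_row, r_row)]
            for d_row, r_row in zip(decoder_mel, residual)]


if __name__ == "__main__":
    # Toy decoder output: 4 mel channels x 6 frames.
    decoder_mel = [[0.1 * (c + t) for t in range(6)] for c in range(4)]
    refined = refine_mel(decoder_mel, make_postnet(n_mels=4, hidden=8))
    print(len(refined), len(refined[0]))  # 4 6
```

Because the post-net output is added to the decoder output, the post-net only has to learn a correction. If all of its weights were zero, the refined mel-spectrogram would equal the decoder output unchanged.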
