Mutian He scite author profile

Mutian He

5 Publications

140 Citation Statements Received

41 Citation Statements Given

Affiliations

Hong Kong University of Science and Technology, Nanjing Tech University, Beihang University

Publications

Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

He, Deng, 2019

Neural TTS has demonstrated strong capabilities to generate human-like speech with high quality and naturalness, while its generalization to out-of-domain texts is still a challenging task, with regard to the design of attention-based sequence-to-sequence acoustic modeling. Various errors occur in those inputs with unseen context, including attention collapse, skipping, repeating, etc., which limits the broader applications. In this paper, we propose a novel stepwise monotonic attention method in sequence-to-sequence acoustic modeling to improve the robustness on out-of-domain inputs. The method utilizes the strict monotonic property in TTS with constraints on monotonic hard attention that the alignments between inputs and outputs sequence must be not only monotonic but allowing no skipping on inputs. Soft attention could be used to evade mismatch between training and inference. The experimental results show that the proposed method could achieve significant improvements in robustness on out-of-domain scenarios for phoneme-based models, without any regression on the in-domain naturalness test.
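
The constraint described in this abstract (alignments that are monotonic and never skip an input) can be illustrated with a small sketch. This is not the paper's implementation. It assumes one plausible reading in which, at each decoder step, the attention either stays on the current input position or advances by exactly one. The function name `stepwise_monotonic_step`, the fixed `stay` probabilities, and the choice to hold all mass on the final position are hypothetical and chosen for illustration only; in a real model the stay probabilities would be predicted at every decoder step.

```python
def stepwise_monotonic_step(prev_alpha, stay_prob):
    """Advance an expected alignment by one decoder step.

    prev_alpha: alignment weights over input positions at the previous step.
    stay_prob:  hypothetical probability of staying on each input position.
    Mass at position j either stays at j or moves to j + 1, never further.
    """
    n = len(prev_alpha)
    new_alpha = [0.0] * n
    for j in range(n):
        # Assumption: the last position always keeps its mass,
        # so the weights continue to sum to one.
        p_stay = 1.0 if j == n - 1 else stay_prob[j]
        new_alpha[j] += prev_alpha[j] * p_stay
        if j + 1 < n:
            new_alpha[j + 1] += prev_alpha[j] * (1.0 - p_stay)
    return new_alpha


# Start with all attention on the first input position.
alpha = [1.0, 0.0, 0.0, 0.0]
stay = [0.3, 0.5, 0.8, 0.1]  # illustrative values, not learned

history = [alpha]
for _ in range(3):
    alpha = stepwise_monotonic_step(alpha, stay)
    history.append(alpha)

for step, weights in enumerate(history):
    print(step, [round(w, 3) for w in weights])
```

Because mass can move forward by at most one position per step, position k receives no weight before decoder step k, which is the "no skipping" property in this simplified setting.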

Neural Subgraph Isomorphism Counting

Liu, Pan, et al., 2020

Time-evolving Text Classification with Deep Neural Networks

Song et al., 2018

Traditional text classification algorithms are based on the assumption that data are independent and identically distributed. However, in most non-stationary scenarios, data may change smoothly due to long-term evolution and short-term fluctuation, which raises new challenges to traditional methods. In this paper, we present the first attempt to explore evolutionary neural network models for time-evolving text classification. We first introduce a simple way to extend arbitrary neural networks to evolutionary learning by using a temporal smoothness framework, and then propose a diachronic propagation framework to incorporate the historical impact into currently learned features through diachronic connections. Experiments on real-world news data demonstrate that our approaches greatly and consistently outperform traditional neural network models in both accuracy and stability.

Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS

He, Deng, 2019 (Preprint)

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

He, Yang, Li, et al., 2021 (Preprint)

End-to-end TTS suffers from high data requirements as it is difficult for both costly speech corpora to cover all necessary knowledge and neural models to learn the knowledge, hence additional knowledge needs to be injected manually. For example, to capture pronunciation knowledge on languages without regular orthography, a complicated grapheme-to-phoneme pipeline needs to be built based on a structured, large pronunciation lexicon, leading to extra, sometimes high, costs to extend neural TTS to such languages. In this paper, we propose a framework to learn to extract knowledge from unstructured external resources using Token2Knowledge attention modules. The framework is applied to build a novel end-to-end TTS model named Neural Lexicon Reader that extracts pronunciations from raw lexicon texts. Experiments support the potential of our framework that the model significantly reduces pronunciation errors in low-resource, end-to-end Chinese TTS, and the lexicon-reading capability can be transferred to other languages with a smaller amount of data.
