Kei Akuzawa scite author profile

Recent advances in neural autoregressive models have improved the performance of speech synthesis (SS). However, because they lack the ability to model global characteristics of speech (such as speaker individualities or speaking styles), particularly when these characteristics have not been labeled, making neural autoregressive SS systems more expressive is still an open issue. In this paper, we propose to combine VoiceLoop, an autoregressive SS model, with a Variational Autoencoder (VAE). This approach, unlike traditional autoregressive SS systems, uses the VAE to model the global characteristics explicitly, enabling the expressiveness of the synthesized speech to be controlled in an unsupervised manner. Experiments using the VCTK and Blizzard2012 datasets show that the VAE helps VoiceLoop generate higher-quality speech and control the expressions in its synthesized speech by incorporating global characteristics into the speech generation process.
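As a rough illustration of the VAE component mentioned in the abstract, the sketch below shows the two standard ingredients of a Gaussian VAE: a reparameterized sample of a latent vector and the KL term that regularizes it toward a standard normal prior. It is a generic sketch, not the authors' implementation; the function names, the two-dimensional latent, and the example encoder outputs are all hypothetical, and how the latent conditions VoiceLoop is not shown.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

rng = random.Random(0)
# Hypothetical encoder outputs for one utterance (assumed values, 2-D latent).
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = sample_latent(mu, log_var, rng)      # global latent that would condition the synthesizer
kl = kl_to_standard_normal(mu, log_var)  # regularization term added to the reconstruction loss
```

In a setup like this, varying `z` at synthesis time is what would allow unlabeled global characteristics such as speaking style to be controlled.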

Adversarial Invariant Feature Learning with Accuracy Constraint for Domain Generalization

Akuzawa

Iwasawa

Matsuo

2020

Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder

Akuzawa

Iwasawa

Matsuo

2018

Preprint

Adversarial Invariant Feature Learning with Accuracy Constraint for Domain Generalization

Akuzawa

Iwasawa

Matsuo

2019

Preprint

Learning domain-invariant representations is a dominant approach for domain generalization (DG), where we need to build a classifier that is robust to domain shifts. However, previous domain-invariance-based methods overlooked the underlying dependency of classes on domains, which is responsible for the trade-off between classification accuracy and domain invariance. Because the primary purpose of DG is to classify unseen domains rather than the invariance itself, the improvement of the invariance can negatively affect DG performance under this trade-off. To overcome the problem, this study first expands the analysis of the trade-off by Xie et al. [33] and provides the notion of accuracy-constrained domain invariance, which means the maximum domain invariance within a range that does not interfere with accuracy. We then propose a novel method, adversarial feature learning with accuracy constraint (AFLAC), which explicitly leads to that invariance in adversarial training. Empirical validations show that AFLAC outperforms domain-invariance-based methods on both synthetic and three real-world datasets, supporting the importance of considering the dependency and the efficacy of the proposed method.
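The trade-off described in the abstract can be illustrated with a small calculation. When the class label Y depends on the domain D, any representation from which Y can be predicted perfectly must carry at least I(D; Y) of information about the domain (by the data processing inequality), so full invariance and full accuracy cannot hold together. The sketch below computes I(D; Y) from a joint probability table; the function name and the two example tables are hypothetical and are not taken from the paper.

```python
import math

def mutual_information(joint):
    """I(D; Y) in nats from a joint probability table joint[d][y]."""
    p_d = [sum(row) for row in joint]        # marginal over domains
    p_y = [sum(col) for col in zip(*joint)]  # marginal over classes
    mi = 0.0
    for d, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log(p / (p_d[d] * p_y[y]))
    return mi

# Hypothetical joint distributions over two domains and two classes.
independent = [[0.25, 0.25], [0.25, 0.25]]  # classes do not depend on domains
dependent = [[0.40, 0.10], [0.10, 0.40]]    # each domain favors one class

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

In the independent table the mutual information is zero, so invariance costs nothing; in the dependent table it is strictly positive, which is the situation where enforcing perfect invariance would have to discard label-relevant information.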

Contact motion in unknown environment

Morisawa

Tsuji

Nishioka

et al.
