Julian Zaïdi scite author profile

Julian Zaïdi

5 Publications

39 Citation Statements Received

79 Citation Statements Given


Affiliations

Ubisoft (Canada)

Publications


Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis

Zaïdi, Seuté, Niekerk et al. 2021

Preprint


This paper presents Daft-Exprt, a multi-speaker acoustic model advancing the state-of-the-art on inter-speaker and inter-text prosody transfer. This improvement is achieved using FiLM conditioning layers, alongside adversarial training that encourages disentanglement between prosodic information and speaker identity. The acoustic model inherits attractive qualities from FastSpeech 2, such as fast inference and local prosody attributes prediction for finer-grained control over generation. Experimental results show that Daft-Exprt significantly outperforms strong baselines on prosody transfer tasks, while yielding naturalness comparable to state-of-the-art expressive models. Moreover, results indicate that adversarial training effectively discards speaker identity information from the prosody representation, which ensures Daft-Exprt will consistently generate speech with the desired voice. We publicly release our code and provide speech samples from our experiments.

Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis

Zaïdi, Seuté, Niekerk et al. 2022

Measuring Disentanglement: A Review of Metrics

Zaïdi, Boilard, Gagnon et al. 2024

IEEE Trans. Neural Netw. Learning Syst.


A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

Niekerk, Carbonneau, Zaïdi et al. 2022

Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic, an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments. Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody.

Measuring Disentanglement: A Review of Metrics

Carbonneau, Zaïdi, Boilard et al. 2020

Preprint


Learning to disentangle and represent factors of variation in data is an important problem in AI. While many advances are made to learn these representations, it is still unclear how to quantify disentanglement. Several metrics exist, however little is known on their implicit assumptions, what they truly measure and their limits. As a result, it is difficult to interpret results when comparing different representations. In this work, we survey supervised disentanglement metrics and thoroughly analyze them. We propose a new taxonomy in which all metrics fall into one of three families: intervention-based, predictor-based and information-based. We conduct extensive experiments, where we isolate representation properties to compare all metrics on many aspects. From experiment results and analysis, we provide insights on relations between disentangled representation properties. Finally, we provide guidelines on how to measure disentanglement and report the results.
