Speech Prosody 2020
DOI: 10.21437/speechprosody.2020-191
Introducing Prosodic Speaker Identity for a Better Expressive Speech Synthesis Control

Abstract: To gain more control over Text-to-Speech (TTS) synthesis and to improve expressivity, it is necessary to disentangle the prosodic information carried by the speaker's voice identity from that belonging to linguistic properties. In this paper, we propose to analyze how information related to speaker voice identity affects a Deep Neural Network (DNN) based multi-speaker speech synthesis model. To do so, we feed the network with a vector encoding speaker information in addition to a set of basic linguistic feature…
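The abstract describes conditioning a multi-speaker DNN on a speaker-identity vector concatenated with linguistic features. A minimal sketch of that conditioning pattern is below; the dimensions, the embedding table, and the toy one-hidden-layer model are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 8 speakers,
# a 16-dim speaker embedding, 40 linguistic features per frame.
n_speakers, emb_dim, ling_dim, hidden_dim, out_dim = 8, 16, 40, 64, 80

# Speaker table: one vector encoding each speaker's voice identity.
speaker_table = rng.normal(size=(n_speakers, emb_dim))

# Toy one-hidden-layer network standing in for the acoustic DNN.
W1 = rng.normal(size=(emb_dim + ling_dim, hidden_dim)) * 0.01
W2 = rng.normal(size=(hidden_dim, out_dim)) * 0.01

def synthesize_frame(speaker_id, linguistic_features):
    """Concatenate the speaker vector with the linguistic features,
    then run one forward pass of the toy acoustic model."""
    x = np.concatenate([speaker_table[speaker_id], linguistic_features])
    h = np.tanh(x @ W1)
    return h @ W2  # e.g. acoustic/prosodic parameters for one frame

frame = synthesize_frame(3, rng.normal(size=ling_dim))
print(frame.shape)  # (80,)
```

Swapping the `speaker_id` while holding the linguistic input fixed is what lets such a model expose speaker-dependent prosody as a separate control axis.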

Cited by 5 publications (3 citation statements)
References 11 publications (14 reference statements)
“…To the data generated from the various models listed, representing the fake audio datasets, we added audio snippets from various recordings and datasets such as SynPaFlex, a corpus of French audiobooks comprising 87 hours of good-quality speech [14], together with other recorded audio messages, mainly in French, representing the authentic audio snippets dataset. Then, from this dataset, we segmented the audio into 10-second and 2-second snippets.…”
Section: A. Data
confidence: 99%
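The excerpt above describes cutting audio into fixed-length 10-second and 2-second snippets. One plausible reading of that step is non-overlapping chunking of the waveform, sketched below; the sample rate and the drop-last-partial-chunk policy are assumptions, since the citing paper does not specify them here.

```python
import numpy as np

def segment(waveform, sr, snippet_sec):
    """Split a mono waveform into non-overlapping fixed-length snippets,
    dropping the final partial chunk (an assumed policy)."""
    n = sr * snippet_sec
    n_snippets = len(waveform) // n
    return waveform[: n_snippets * n].reshape(n_snippets, n)

sr = 16_000                         # assumed sample rate
audio = np.zeros(sr * 25)           # 25 s of silence as a stand-in
print(segment(audio, sr, 10).shape)  # (2, 160000)
print(segment(audio, sr, 2).shape)   # (12, 32000)
```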
“…NEB has read numerous books whose recordings are available on LibriVox. In the SynPaFlex project, more than 87 hours of this voice were extracted and annotated according to various expressive aspects in order to build a corpus dedicated to French expressive TTS [9]. Indeed, the speaker is able to change her prosody and modify her voice in order to personify some characters with a style distinct from the indirect speech [10].…”
Section: Data
confidence: 99%
“…LibriSpeech has also been used in TTS-related tasks to control the emotion of generated speech [212,305]. LibriVox is a collection of public audiobooks that can be used in controllable deep audio synthesis [213,306]. The Emotional Speech Database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers and covers 5 emotion categories (neutral, happy, angry, sad and surprise) [307].…”
Section: Audio
confidence: 99%