Intelligibility Improvement of Esophageal Speech Using Sequence-to-Sequence Voice Conversion with Auditory Attention

Ezzine, Kadria; Martino, Joseph Di; Frikha, Mounir

doi:10.3390/app12147062

Cited by 4 publications (2 citation statements)

References 33 publications (39 reference statements)

“…Two papers address esophageal speech, both using speech conversion techniques to improve its quality and intelligibility. Ezzine et al [5] used a novel sequence-to-sequence model with an auditory attention mechanism, while Raman et al [6] used synthetic speech as the target of a voice conversion system to improve the quality and intelligibility of the original voice.…”

Section: Recent Advances In Application Of Speech and Language Techno... (mentioning)

confidence: 99%

Special Issue on Applications of Speech and Language Technologies in Healthcare

2023

“…Variational autoencoder (VAE) [12]-[15] is one such implementation, which learns a latent space for speaker-independent representation. Similarly, generative adversarial networks (GAN) [16] disentangle the speech attributes with an extra adversarial loss to guarantee a distribution match between the generated and true data [17], [18].…”

Section: A Cross-lingual Voice Conversion (mentioning)

confidence: 99%

Cross-Lingual Propaganda Detection

Zhang

2022

2022 IEEE International Conference on Big Data (Big Data)

This paper proposes RefXVC, a method for crosslingual voice conversion (XVC) that leverages reference information to improve conversion performance. Previous XVC works generally take an average speaker embedding to condition the speaker identity, which does not account for the changing timbre of speech that occurs with different pronunciations. To address this, our method uses both global and local speaker embeddings to capture the timbre changes during speech conversion. Additionally, we observed a connection between timbre and pronunciation in different languages and utilized this by incorporating a timbre encoder and a pronunciation matching network into our model. Furthermore, we found that the variation in tones is not adequately reflected in a sentence, and therefore, we used multiple references to better capture the range of a speaker's voice. The proposed method outperformed existing systems in terms of both speech quality and speaker similarity, highlighting the effectiveness of leveraging reference information in crosslingual voice conversion. The converted speech samples can be found on the website: http://refxvc.dn3point.com

Analysis of Phonetic Segments of Oesophageal Speech in People Following Total Laryngectomy

2023

This paper presents an approach to extraction techniques for speaker recognition following total laryngectomy surgery. The aim of the research was to develop a pattern of physical features describing the oesophageal speech in people after experiencing laryngeal cancer. Research results may support the speech rehabilitation of laryngectomised patients by improving the quality of oesophageal speech. The main goal of the research was to isolate the physical features of oesophageal speech and to compare their values with the descriptors of physiological speech. Words (in Polish) used during speech rehabilitation were analyzed. Each of these words was divided into phonetic segments from which the physical features of speech were extracted. The values of the acquired speech descriptors were then used to create a vector of the physical features of oesophageal speech. A set of these features will determine a model that should allow us to recognize whether the speech-rehabilitation process is proceeding correctly and also provide a selection of bespoke procedures that we could introduce to each patient. This research is a continuation of the analysis of oesophageal speech published previously. This time, the effectiveness of parameterization was tested using methodologies for analyzing the phonetic segments of each word.
