2022
DOI: 10.3390/app12147062

Intelligibility Improvement of Esophageal Speech Using Sequence-to-Sequence Voice Conversion with Auditory Attention

Abstract: Laryngectomees are individuals whose larynx has been surgically removed, usually due to laryngeal cancer. The immediate consequence of this operation is that these individuals are unable to speak. Esophageal speech (ES) remains the preferred alternative speaking method for laryngectomees. However, compared to the laryngeal voice, ES is characterized by low intelligibility and poor quality due to a chaotic fundamental frequency (F0), specific noises, and low intensity. Our proposal to solve these p…

Citations: cited by 4 publications (2 citation statements)
References: 33 publications (39 reference statements)
“…Two papers address esophageal speech, both using speech conversion techniques to improve its quality and intelligibility. Ezzine et al [5] used a novel sequence-to-sequence model with an auditory attention mechanism, while Raman et al [6] used synthetic speech as the target of a voice conversion system to improve the quality and intelligibility of the original voice.…”
Section: Recent Advances In Application Of Speech and Language Techno... (citation type: mentioning)
confidence: 99%
“…Variational autoencoder (VAE) [12]-[15] is one such implementation, which learns a latent space for speaker-independent representation. Similarly, generative adversarial networks (GAN) [16] disentangle the speech attributes with an extra adversarial loss to guarantee a distribution match between the generated and true data [17], [18].…”
Section: A Cross-lingual Voice Conversion (citation type: mentioning)
confidence: 99%