2021 IEEE International Symposium on Circuits and Systems (ISCAS)
DOI: 10.1109/iscas51556.2021.9401485

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

Abstract: Speech synthesized from articulatory movements can have real-world applications for patients with vocal cord disorders, in situations requiring silent speech, and in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements into speech signals. We use a neural-network-based vocoder combined with multimodal joint training that incorporates spectrogram, mel-spectrogram, and deep features. The experimental results confirm that the m…
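The abstract names three training targets (spectrogram, mel-spectrogram, and deep features) but does not give the loss form. The following is a minimal NumPy sketch of what a multimodal joint-training objective over those three targets could look like, assuming per-target L1 losses, equal default weights, a 512-point Hann STFT, and 40 mel bands. The names `make_targets` and `joint_loss`, all analysis settings, and the random-projection `feature_fn` standing in for a learned deep-feature extractor are hypothetical and are not taken from EMA2S; the network that maps articulatory (EMA) input to these representations and the vocoder itself are not shown.

```python
import numpy as np

N_FFT, HOP, N_MELS = 512, 128, 40  # assumed analysis settings, not from the paper


def stft_magnitude(wav: np.ndarray) -> np.ndarray:
    """Linear-magnitude spectrogram from a Hann-windowed STFT, shape (frames, N_FFT//2 + 1)."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(wav) - N_FFT) // HOP
    frames = np.stack([wav[i * HOP:i * HOP + N_FFT] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def mel_filterbank(sr: int) -> np.ndarray:
    """Triangular mel filters (HTK-style mel scale), shape (N_MELS, N_FFT//2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), N_MELS + 2))
    bins = np.linspace(0.0, sr / 2, N_FFT // 2 + 1)
    fb = np.zeros((N_MELS, bins.size))
    for m in range(N_MELS):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def make_targets(wav, sr, feature_fn):
    """Hypothetical helper: compute the three training targets from a reference waveform."""
    spec = stft_magnitude(wav)
    log_mel = np.log(spec @ mel_filterbank(sr).T + 1e-5)
    return {"spec": spec, "mel": log_mel, "deep": feature_fn(spec)}


def joint_loss(preds, targets, weights=None):
    """Hypothetical objective: weighted sum of per-target L1 losses (weights default to 1.0)."""
    weights = weights or {k: 1.0 for k in targets}
    return sum(weights[k] * np.mean(np.abs(preds[k] - targets[k])) for k in targets)


# Usage: a fixed random projection stands in for a learned deep-feature extractor (assumption).
rng = np.random.default_rng(0)
W = rng.standard_normal((N_FFT // 2 + 1, 64)) / 16.0
feature_fn = lambda s: np.tanh(np.log1p(s) @ W)

sr = 16000
t = np.arange(sr) / sr
ref = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic reference tone
targets = make_targets(ref, sr, feature_fn)
noisy = {k: v + 0.01 * rng.standard_normal(v.shape) for k, v in targets.items()}
print(joint_loss(targets, targets))  # 0.0
print(joint_loss(noisy, targets) > 0)  # True
```

Supervising the model on several representations at once is one way to read "multimodal joint training": the linear spectrogram preserves fine spectral detail, the log-mel target matches what many neural vocoders consume, and the feature term compares outputs in a learned embedding space. The relative weights here are placeholders that would need tuning.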

Cited by 4 publications (3 citation statements)
References 29 publications (39 reference statements)
“…The most commonly used signal is lip reading [17], [18]. Other speech-related biosignals, such as sEMG [3], [6], EMA [4], PMA [5], [8], and ultrasound images [9], [19], have also been reported. In contrast, speech enhancement is designed to improve speech quality and intelligibility in noisy environments, thereby improving the robustness of the system to environmental noise.…”
Section: A. Speech Generation and Speech Enhancement; citation type: mentioning
“…Based on recent advances in machine learning-based technologies, the conversion of biosignals to speech signals has been reported in several studies [3], [4], [5]. Various signals have been considered for speech generation and enhancement, including surface electromyography (sEMG) [3], [6], electromagnetic articulography (EMA) [4], [7], permanent magnetic articulography (PMA) [5], [8], ultrasound tongue imaging [9], [10], Doppler signals [11], [12], visual cues [13], [14], and bone-conducted microphone signals [15].…”
Citation type: mentioning