2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru51503.2021.9687908
Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling

Cited by 6 publications (3 citation statements)
References 23 publications
“…To address the challenges of EL speech enhancement, both frame-to-frame and seq-to-seq [48] mapping paradigms can be applied. Seq-to-seq VC models, utilizing an attention-based encoder-decoder architecture [33], can perform representation learning and alignment simultaneously [30], capturing long-term dependencies such as prosody and speaker identity [32]. Some research has demonstrated the potential of using TTS pretraining in conjunction with seq-to-seq modeling for EL speech enhancement [30], [31].…”
Section: B. VC-based Statistical F0 Prediction and Voicing State Control
confidence: 99%
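The attention-based alignment the excerpt describes can be illustrated with a minimal sketch: soft dot-product attention that aligns one decoder step against all encoder frames, producing a context vector as a weighted summary. The function name and dot-product scoring here are illustrative assumptions, not the exact architecture of the cited models.

```python
import math

def attention_context(encoder_states, decoder_query):
    # encoder_states: list of T feature vectors; decoder_query: one decoder-state vector.
    # Score each encoder frame by its dot product with the query (illustrative choice).
    scores = [sum(q * e for q, e in zip(decoder_query, frame))
              for frame in encoder_states]
    # Softmax over encoder frames gives a soft alignment distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of encoder frames.
    dim = len(decoder_query)
    context = [sum(w * frame[i] for w, frame in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example: three 2-dim encoder frames, one query.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attention_context(enc, query)
```

Because the alignment is learned jointly with the representation, such models can stretch or compress the EL input in time, which frame-to-frame mapping cannot.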
“…Seq-to-seq VC models, utilizing an attention-based encoder-decoder architecture [33], can perform representation learning and alignment simultaneously [30], capturing long-term dependencies such as prosody and speaker identity [32]. Some research has demonstrated the potential of using TTS pretraining in conjunction with seq-to-seq modeling for EL speech enhancement [30], [31]. However, most seq-to-seq models require a substantial amount of high-quality parallel training data.…”
Section: B. VC-based Statistical F0 Prediction and Voicing State Control
confidence: 99%