Interspeech 2020
DOI: 10.21437/interspeech.2020-36

Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning

Abstract: This paper presents an adversarial learning method for recognition-synthesis based non-parallel voice conversion. A recognizer is used to transform acoustic features into linguistic representations while a synthesizer recovers output features from the recognizer outputs together with the speaker identity. By separating the speaker characteristics from the linguistic representations, voice conversion can be achieved by replacing the speaker identity with the target one. In our proposed method, a speaker adversa…
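The recognition-synthesis flow described in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's model: the names `SPEAKERS`, `recognize`, `synthesize`, and `convert` are hypothetical, each speaker is reduced to a single scalar offset, and mean removal stands in for a learned recognizer. The adversarial training mentioned in the abstract is not shown. The sketch only demonstrates the idea that conversion amounts to re-synthesizing the recognizer output with a different speaker identity.

```python
from typing import Dict, List

# Hypothetical toy speaker table: each speaker identity is a single scalar offset.
# A real system would use learned speaker embeddings instead.
SPEAKERS: Dict[str, float] = {"source": 2.0, "target": -1.0}


def recognize(acoustic: List[float]) -> List[float]:
    """Toy recognizer: remove the utterance mean as a stand-in for
    stripping speaker characteristics from the acoustic features."""
    mean = sum(acoustic) / len(acoustic)
    return [x - mean for x in acoustic]


def synthesize(linguistic: List[float], speaker: str) -> List[float]:
    """Toy synthesizer: recover output features from the linguistic
    representation together with the speaker identity."""
    offset = SPEAKERS[speaker]
    return [x + offset for x in linguistic]


def convert(acoustic: List[float], target_speaker: str) -> List[float]:
    """Voice conversion: recognize, then synthesize with the target identity."""
    return synthesize(recognize(acoustic), target_speaker)


# Zero-mean "linguistic content" (an assumption that makes the toy recognizer exact).
content = [-1.0, 0.0, 1.0]
source_utterance = synthesize(content, "source")
converted = convert(source_utterance, "target")
```

In this toy setting, `converted` carries the same content as `source_utterance` but with the target speaker's offset, which mirrors the "replace the speaker identity with the target one" step in the abstract.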

Cited by 3 publications (2 citation statements)
References 27 publications

“…Emovox has a recognition-synthesis structure similar to that of [56], [119]. The Seq2Seq recognition encoder consists of an encoder which is a 2-layer 256-cell BLSTM, and a decoder which is a 1-layer 512-cell LSTM with an attention layer followed by an FC layer with an output channel of 512.…”
Section: Recognition-synthesis Structure (mentioning)
confidence: 99%
“…Our proposed framework can be regarded as a sequence-level recognition-synthesis structure similar to that of [102], [111]. Both the linguistic encoder and the decoder have a sequence-to-sequence encoder-decoder structure.…”
Section: Network Configuration (mentioning)
confidence: 99%