2020
DOI: 10.48550/arxiv.2008.03029
Preprint

Peking Opera Synthesis via Duration Informed Attention Network

Abstract: Peking Opera has been the most dominant form of Chinese performing art for around 200 years. A Peking Opera singer usually exhibits a very strong personal style by introducing improvisation and expressiveness on stage, which causes the actual rhythm and pitch contour to deviate significantly from the original music score. This inconsistency poses a great challenge for Peking Opera singing voice synthesis from a music score. In this work, we propose to deal with this issue and synthesize expressive Peking O…

Cited by 4 publications (3 citation statements)
References 24 publications
“…The model can generate high-fidelity and diverse songs with a coherence of several minutes. Yusong et al. [233] even explored the composition of Peking Opera singing voice based on the Duration Informed Attention Network (DurIAN) framework.…”
Section: Singing Voice Synthesis (SVS)
confidence: 99%
“…As sequence-to-sequence (Seq2Seq) models have become the predominant architectures in neural-based TTS, state-of-the-art SVS systems have also adopted the encoder-decoder methods and showed improved performance over simple network structures (e.g., DNN, CNN, RNN) [17][18][19][20][21][22][23]. In these methods, the encoders and decoders vary from bi-directional Long Short-Term Memory units (LSTM) to multi-head self-attention (MHSA) based blocks.…”
Section: Introduction
confidence: 99%
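The multi-head self-attention blocks mentioned in the statement above can be sketched in a few lines. This is a minimal illustrative sketch, not the cited systems' implementation: the learned query/key/value projection matrices are replaced by identity mappings for brevity, and all shapes and names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads):
    """Toy MHSA over a sequence x of shape (seq_len, d_model).

    Real blocks apply learned W_q, W_k, W_v, W_o projections;
    here q = k = v = the head's slice of x, purely for illustration.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    out = np.empty_like(x)
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        q = k = v = x[:, sl]
        # Scaled dot-product attention within this head.
        scores = q @ k.T / np.sqrt(d_head)
        out[:, sl] = softmax(scores) @ v
    return out
```

Each head attends over the full sequence independently on its own channel slice, which is what lets such blocks model long-range dependencies that simple DNN/CNN/RNN structures capture less directly.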
“…The parameters include equalisation, compression, spatialization, accuracy, and precision. The existing techniques include Convolutional Neural Network based Long Short-Term Memory (CNN-LSTM) [18], Bidirectional Long Short-Term Memory (Bi-LSTM) [19], and Optical Music Recognition (OMR) [20].…”
confidence: 99%