2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP) 2021
DOI: 10.1109/mlsp52302.2021.9596184

MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

Cited by 8 publications (3 citation statements)
References 8 publications
“…Multi-layer perceptron (MLP) is a neural network with forward structure. It has a simple structure and strong adaptive ability, and it is widely used in the fields of natural language processing [68] and computer vision [69]. Although CNN-based and transformer-based networks are the mainstream choices in the field of computer vision, researchers still try to build the network architecture completely using MLP to explore more possibilities for visual network architecture.…”
Section: MLP-based Architectures
Citation type: mentioning
confidence: 99%
“…To further distinguish vowels and consonants, a duration predictor is built to produce fine-grained phoneme-level duration, which is trained based on supervision calculated by force-alignment [6][7][8][9][10][11], heuristics [12][13][14][15] etc. The advantage of this type of feature processing strategy is that the input phoneme and pitch sequence are strictly aligned at the note level based on the music score.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…While the convolutional layers can process variable length sequences and capture short-term correlations in speech, long-term contextual information may not easily be handled by convolutional layers compared with MLPs. In [16] and [17], MLP-based models were applied to speech or audio signals of fixed maximum length. A keyword spotting method based on a structure similar to the MLP-mixer employing the dynamic convolution [18] and the squeeze-and-excitation network (SENet) [19] is proposed in [20].…”
Section: Introduction
Citation type: mentioning
confidence: 99%