Interspeech 2020
DOI: 10.21437/interspeech.2020-1094

g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset

Abstract: Conversion of Chinese graphemes to phonemes (G2P) is an essential component in Mandarin Chinese Text-To-Speech (TTS) systems. One of the biggest challenges in Chinese G2P conversion is how to disambiguate the pronunciation of polyphones, i.e., characters having multiple pronunciations. Although many academic efforts have been made to address it, there has been no open dataset that can serve as a standard benchmark for fair comparison to date. In addition, most of the reported systems are hard to employ for researcher…

Citations: cited by 19 publications (20 citation statements)
References: 5 publications
“…Text Normalization: Rule-based [311], Neural-based [310,223,406,430], Hybrid [432]; Word Segmentation: [394,444,261]; POS Tagging: [292,323,221,444,135]; Prosody Prediction: [50,405,312,186,137,322,277,62,440,210,212,3]; Grapheme to Phoneme: N-gram [41,24], Neural-based [403,283,33,320], Polyphone Disambiguation [441,392,224,295,321,29,257]. … and then neural networks are leveraged to model text normalization as a sequence to sequence task where the source and target sequences are non-standard words and spoken-form words respectively [310,223,430]. Recently, some works [432] propose to combine the advantages of both rule-based and neural-based models to further improve the performance of text normalization.…”
Section: Task Research Work (citation type: mentioning)
confidence: 99%
“…For languages like Chinese, although the lexicon can cover nearly all the characters, there are a lot of polyphones that can be only decided according to the context of a character. Thus, G2P conversion in this kind of languages is mainly responsible for polyphone disambiguation, which decides the appropriate pronunciation based on the current word context [441,392,224,295,321,29,257].…”
Section: Task Research Work (citation type: mentioning)
confidence: 99%
“…Some studies have explored solving PD by regarding pronunciation estimation (including non-polyphonic words) as a sequence-to-sequence problem, and by applying machine translation approaches [13,14]. For Mandarin, on the other hand, some studies adopt a classification approach that estimates the correct pinyin of the polyphonic character [15,16,17]. Because polyphonic words appear only in certain parts of the sentence, we regard PD as a classification problem, similar to the approach for Mandarin.…”
Section: Polyphone Disambiguation (PD) (citation type: mentioning)
confidence: 99%
“…To accurately predict the pronunciation of polyphonic characters, Cai et al. [21], Shan et al. [185] and Park and Lee [157] proposed to use Bi-LSTM network for G2P. On the basis of Pan et al. [154], Yang et al. [236] proposed to preprocess the original text by replacing the Word2Vec model with the encoder of Transformer-based NLP model and BERT pre-training model, and then carry out G2P and PSP in the Mandarin text front-end.…”
Section: Text Front-end (citation type: mentioning)
confidence: 99%