“…Text Normalization: Rule-based [311], Neural-based [310,223,406,430], Hybrid [432]
Word Segmentation: [394,444,261]
POS Tagging: [292,323,221,444,135]
Prosody Prediction: [50,405,312,186,137,322,277,62,440,210,212,3]
Grapheme to Phoneme: N-gram [41,24], Neural-based [403,283,33,320]
Polyphone Disambiguation: [441,392,224,295,321,29,257]

… and then neural networks are leveraged to model text normalization as a sequence-to-sequence task, where the source and target sequences are non-standard words and spoken-form words, respectively [310,223,430]. Recently, some works [432] have proposed combining the advantages of rule-based and neural-based models to further improve the performance of text normalization.…”
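
To make the idea of mapping non-standard words to spoken-form words concrete, the following is a minimal sketch of a hybrid normalizer in Python. It is an illustration under stated assumptions, not the method of [432] or of any cited work: the rules-first, model-fallback design, the tiny abbreviation table, and the names `rule_normalize`, `hybrid_normalize`, and `neural_fallback` are all hypothetical. The neural component is represented only by a placeholder callable, where a trained sequence-to-sequence model would be plugged in.

```python
import re
from typing import Callable, Optional

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]

# Hypothetical, deliberately tiny abbreviation table (an assumption).
_ABBREVIATIONS = {"Dr.": "doctor", "St.": "street"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else " " + _ONES[ones])
    hundreds, rest = divmod(n, 100)
    head = _ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + number_to_words(rest)


def rule_normalize(token: str) -> Optional[str]:
    """Return the spoken form if a rule covers the token, else None."""
    if token in _ABBREVIATIONS:
        return _ABBREVIATIONS[token]
    if re.fullmatch(r"\d{1,3}", token):
        return number_to_words(int(token))
    return None


def hybrid_normalize(
    text: str,
    neural_fallback: Callable[[str], str] = lambda token: token,
) -> str:
    """Apply rules first; defer uncovered tokens to the fallback.

    `neural_fallback` stands in for a trained seq2seq model (assumption);
    the default simply returns the token unchanged.
    """
    spoken = []
    for token in text.split():
        result = rule_normalize(token)
        spoken.append(result if result is not None else neural_fallback(token))
    return " ".join(spoken)


print(hybrid_normalize("Dr. Smith has 42 cats"))
```

In this sketch the rules handle the cases they cover deterministically, and everything else passes through the fallback. Other ways of combining the two components are equally plausible, for example letting a model propose a spoken form and using rules to constrain or verify it.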