“…We explore using a transformer model (Vaswani et al., 2017) for this problem, since it has shown great promise in several areas of natural language processing (NLP), outperforming the previous state of the art on a wide variety of tasks, including machine translation (Vaswani et al., 2017), summarization (Raffel et al., 2019), question answering (Raffel et al., 2019), and sentiment analysis (Munikar et al., 2019). While previous work has used transformers for G2P, experiments have only been performed on English, specifically on the CMUDict (Weide, 2005) and NetTalk datasets (Yolchuyeva et al., 2020; Sun et al., 2019). Our approach builds upon the standard architecture by adding two straightforward modifications: multi-task training (Caruana, 1997) and ensembling.…”
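The two modifications mentioned above can be sketched schematically. This is a minimal illustration, not the paper's exact formulation: the task weights, the number of tasks, and averaging logits across ensemble members are all illustrative assumptions.

```python
# Hypothetical sketch of the two modifications: a weighted multi-task
# loss and logit averaging across ensemble members. All names and
# weights here are illustrative, not taken from the paper.

def multitask_loss(task_losses, weights=None):
    """Weighted sum of per-task losses (multi-task training)."""
    if weights is None:
        weights = [1.0] * len(task_losses)
    return sum(w * l for w, l in zip(weights, task_losses))

def ensemble_logits(member_logits):
    """Average per-class logits across ensemble members."""
    n = len(member_logits)
    return [sum(col) / n for col in zip(*member_logits)]

# Example: combine losses from a main task and an auxiliary task,
# then average the logits of three ensemble members.
loss = multitask_loss([0.8, 0.2], weights=[1.0, 0.5])          # 0.8 + 0.1 = 0.9
logits = ensemble_logits([[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]])  # [2.0, 2.0]
```

In practice the per-task losses would come from separate output heads sharing one transformer encoder, and the ensemble members would be independently trained models whose averaged predictions are decoded jointly.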