“…We explore using a transformer model (Vaswani et al., 2017) for this problem, since it has shown great promise in several areas of natural language processing (NLP), outperforming the previous state of the art on a wide variety of tasks, including machine translation (Vaswani et al., 2017), summarization (Raffel et al., 2019), question answering (Raffel et al., 2019), and sentiment analysis (Munikar et al., 2019). While previous work has used transformers for G2P, experiments have only been performed on English, specifically on the CMUDict (Weide, 2005) and NetTalk datasets (Yolchuyeva et al., 2020; Sun et al., 2019). Our approach builds upon the standard architecture by adding two straightforward modifications: multi-task training (Caruana, 1997) and ensembling.…”
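The two modifications mentioned above can be sketched schematically. This is a minimal illustration, not the paper's exact formulation: the task weights, the number of tasks, and averaging logits across ensemble members are all illustrative assumptions.

```python
# Hypothetical sketch of the two modifications: a weighted multi-task
# loss and logit averaging across ensemble members. All names and
# weights here are illustrative, not taken from the paper.

def multitask_loss(task_losses, weights=None):
    """Weighted sum of per-task losses (multi-task training)."""
    if weights is None:
        weights = [1.0] * len(task_losses)
    return sum(w * l for w, l in zip(weights, task_losses))

def ensemble_logits(member_logits):
    """Average per-class logits across ensemble members."""
    n = len(member_logits)
    return [sum(col) / n for col in zip(*member_logits)]

# Example: combine losses from a main task and an auxiliary task,
# then average the logits of three ensemble members.
loss = multitask_loss([0.8, 0.2], weights=[1.0, 0.5])          # 0.8 + 0.1 = 0.9
logits = ensemble_logits([[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]])  # [2.0, 2.0]
```

In practice the per-task losses would come from separate output heads sharing one transformer encoder, and the ensemble members would be independently trained models whose averaged predictions are decoded jointly.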