“…Self-Supervised Training: [51,368,433,78,140,346,197,352,71]
Cross-Lingual Transfer: LRSpeech [390], [42,12,60,271,105]
Cross-Speaker Transfer: [216,125,59,39]
Speech Chain/Back Transformation: SpeechChain [344,345], LRSpeech [390,285]
Dataset Mining in the Wild: [58,119,57]
Robust
Enhancing Attention: Tacotron 2 [376], DCTTS [326], SMA [104], MultiSpeech [38], [309,297,431,326,264,262]
Replacing Attention with Duration…”