“…Some multilingual datasets for question answering (TyDiQA; Clark et al., 2020), common sense reasoning (XCOPA; Ponti et al., 2020), abstractive summarization (Hasan et al., 2021), passage ranking (mMARCO; Bonifacio et al., 2021), cross-lingual visual question answering (xGQA; Pfeiffer et al., 2021), language and vision reasoning (MaRVL; Liu et al., 2021), paraphrasing (ParaCotta), dialogue systems (XPersona & BiToD; Lin et al., 2021a,b), lexical normalization (MultiLexNorm; van der Goot et al., 2021), and machine translation (FLORES-101; Guzmán et al., 2019) include Indonesian, but most others do not, and very few include Indonesian local languages. An exception is the weakly supervised named entity recognition dataset WikiAnn (Pan et al., 2017), which covers several Indonesian local languages, namely Acehnese, Javanese, Minangkabau, and Sundanese.…”