Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2023
DOI: 10.18653/v1/2023.acl-long.235

MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages

Cited by 10 publications (5 citation statements)
References 0 publications
“…Crowdsourcing has been used to gather utterances and dialogues within particular contexts of use [15,36,38], translate and localize existing data sets [39,40], and elicit reactions to voice stimuli, especially social and paralinguistic characteristics [41,42]. A notable example is the translation and localization of the Amazon MASSIVE data set into 51 languages with Amazon Mechanical Turk [27]. This signals a shift in how co-design is being approached: from the classical model of small-scale focus groups and jams to a larger-scale, online, crowd-driven model that has global reach.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…We analyzed patterns related to people, notably gender, as well as machines, notably scorn and abusive conduct [49]. However, we avoided removing "inappropriate" material, such as swear words [27,36]. These forms of exchanges need to be trained into VAs so that VAs can recognize and respond appropriately [37].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…NusaX (Winata et al, 2023) is a multilingual sentiment analysis dataset comprising 12 languages, including 10 Indonesian regional languages. MASSIVE (FitzGerald et al, 2023) is a multilingual natural language understanding dataset with 51 languages for which we use the intent detection data.…”
Section: Datasets
Citation type: mentioning
confidence: 99%