2022
DOI: 10.48550/arxiv.2204.08582
Preprint

MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages

citations
Cited by 11 publications
(23 citation statements)
references
References 0 publications
“…Therefore, we can gauge the method's efficacy independent of any potential biases towards our own specific data. We also perform the same experiments on the Farsi-translated section of the Massive [15] corpus to gain a better understanding of model's performance on Persian. So, Firstly, we fine-tune conditional BERT with our selected set which includes 79 slot types from ATIS dataset over 10 epochs with batch-size of 8.…”
Section: Assessment Of Full-automatic Augmentation Methods (mentioning)
confidence: 99%
“…Consequently, the final corpus consisted of 3,000 automated dialogues and 600 semi-automated dialogues, resulting in 117 intents and 262 slots. MASSIVE dataset [15] as a part of Multilingual Amazon Slu resource package (SLURP) which was developed for Slot-filling and Intent classification, can be regarded as another source in Persian. It contains 1 million realistic, parallel, labeled virtual assistant utterances including 51 languages, 18 domains, 60 intents, and 55 slots.…”
Section: Related Work (mentioning)
confidence: 99%
“…We use a mixture of accents originating from non-native English speakers to resemble a real-world scenario. Voice assistants do not support the majority of the world's languages [4]. Therefore, many users have to voice their questions in a language different from their native one.…”
Section: Natural ASR Noise (mentioning)
confidence: 99%
“…Such voice assistants do not only increase the convenience with which users can query them but can support users with visual and motor impairments for which the use of conventional text entry mechanisms (keyboard) is not applicable [11]. Despite the popularity of voice assistants among users globally and the advancements in spoken-language understanding [2,4], there are surprisingly limited efforts in studying spoken QA and its limitations.…”
Section: Introduction (mentioning)
confidence: 99%
“…In recognition of this, new efforts have started to be undertaken to ensure that a diversity of languages and cultural contexts are represented. For example, the Amazon Alexa team have been spearheading a "massive" crowdsourced translation and localization initiative of their MASSIVE data set into 51 languages [19], including a global competition 1 . Indeed, most of these initiatives rely on crowdsourcing activities.…”
Section: Introduction (mentioning)
confidence: 99%