MAVEN: A Massive General Domain Event Detection Dataset

Wang, Xiaozhi; Wang, Ziqi; Han, Xu; Jiang, Wangyi; Rong, Han; Liu, Zhiyuan; Li, Juanzi; Li, Peng; Lin, Yankai; Zhou, Jie

doi:10.18653/v1/2020.emnlp-main.129

Cited by 85 publications (71 citation statements)

References 49 publications

“…zero-shot learning (Huang et al, 2018), few-shot learning (Lai et al, 2020a,b), or new domains (Naik and Rosé, 2020). The closest works to ours involve recent efforts to create new datasets for EE (Satyapanich et al, 2020; Ebner et al, 2020; Wang et al, 2020; Trong et al, 2020; Le and Nguyen, 2021). However, these works do not consider historical texts as we do.…”

Section: Methods (mentioning)

confidence: 93%

“…OneIE (Lin et al, 2020): This model first identifies spans of entity mentions and event triggers. The detected spans are then paired to jointly predict entity types, event types, relations, and argument roles for IE.…”

Section: Methods (mentioning)

confidence: 99%

“…This working methodology requires an emphasis on the quality of the data over the quantity of the data. Recent advances of natural language processing (NLP) aim to bridge the gap between qualitative and quantitative analyses by identifying, extracting, and counting contextual data (Won et al, 2018; Wadden et al, 2019; Lin et al, 2020). This new approach provides contextual information about real-life entities (e.g., individuals, locations, times, documents) which can be later integrated into knowledge bases (Won et al, 2018) to aid historical research and discourse analysis.…”

Section: Introduction (mentioning)

confidence: 99%

Event Extraction from Historical Texts: A New Dataset for Black Rebellions

Lai, Nguyen, Kaufman et al. 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Understanding historical events is necessary for the study of contemporary society, culture, and politics. In this work, we focus on the event extraction task (EE) to detect event trigger words and their arguments in a novel domain of historical texts. In particular, we introduce a new EE dataset for a corpus of nineteenth-century African American newspapers. Our goal is to study the discourse of slave and non-slave African diaspora rebellions published in the periodical press in this period. Our dataset features 5 entity types, 12 event types, and 6 argument roles that concern slavery and black movements between the eighteenth and nineteenth centuries. Historical newspapers present many challenges for existing EE systems, including the evolution of meanings of words and the extensive use of religious discourse in newspapers from this era. Our experiments with current state-of-the-art EE systems and BERT models demonstrate their poor performance over historical texts and call for more robust research efforts in this area.

“…(1) Overall Evaluation, (2) Few-shot Evaluation, and (3) Zero-shot Evaluation. OntoEvent is established based on two newly proposed datasets for ED: MAVEN (Wang et al, 2020b) and FewEvent (Deng et al, 2020). They are constructed from Wikipedia documents or based on existing event datasets, such as ACE-2005 and TAC-KBP-2017.…”

Section: Methods (mentioning)

confidence: 99%

“…As a non-trivial task, ED suffers from the low-resource issues. On the one hand, the maldistribution of samples is quite serious in ED benchmark datasets, e.g., FewEvent (Deng et al, 2020) and MAVEN (Wang et al, 2020b), where a large portion of event types contain relatively few training instances. As shown in Figure 1, the sample size of two event types Attack and Riot differs greatly (4816 & 30).…”

Section: Introduction (mentioning)

confidence: 99%

OntoED: Low-resource Event Detection with Ontology Embedding

Deng, Zhang, Li et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

Event Detection (ED) aims to identify event trigger words from a given text and classify it into an event type. Most of current methods to ED rely heavily on training instances, and almost ignore the correlation of event types. Hence, they tend to suffer from data scarcity and fail to handle new unseen event types. To address these problems, we formulate ED as a process of event ontology population: linking event instances to pre-defined event types in event ontology, and propose a novel ED framework entitled OntoED with ontology embedding. We enrich event ontology with linkages among event types, and further induce more event-event correlations. Based on the event ontology, OntoED can leverage and propagate correlation knowledge, particularly from data-rich to data-poor event types. Furthermore, OntoED can be applied to new unseen event types, by establishing linkages to existing ones. Experiments indicate that OntoED is more predominant and robust than previous approaches to ED, especially in data-scarce scenarios.
