Zero-shot Chinese Discourse Dependency Parsing via Cross-lingual Mapping

Yi, Chunhui; Li, Sujian

doi:10.18653/v1/w19-8104

Cited by 3 publications

(2 citation statements)

References 12 publications

“…We have 6 corpora for English (Prasad et al, 2019; Zeldes, 2017; Carlson et al, 2001; Asher et al, 2016; Yang and Li, 2018; Nishida and Matsumoto, 2022), 4 for Chinese (Zhou et al, 2014; Cao et al, 2018; Cheng and Li, 2019; Yi et al, 2021), 2 for Spanish (da Cunha et al, 2011; Cao et al, 2018), 2 for Portuguese (Cardoso et al, 2011; Mendes and Lejeune, 2022), 1 for German (Stede and Neumann, 2014), 1 for Basque (Iruskieta et al, 2013), 1 for Farsi (Shahmohammadi et al, 2021), 1 for French, 1 for Dutch (Redeker et al, 2012), 1 for Russian (Toldova et al, 2017), 1 for Turkish (Zeyrek and Webber, 2008; Zeyrek and Kurfalı, 2017), 1 for Italian (Tonelli et al, 2010), and 1 for Thai. In addition, OOD datasets come from the multilingual TED Discourse Bank with data for English, Portuguese and Turkish (Zeyrek et al, 2018, 2020).…”

Section: Data (mentioning)

confidence: 99%

DisCut and DiscReT: MELODI at DISRPT 2023

Metheniti, Braud, Muller et al. 2023

Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023)

This paper presents the results obtained by the MELODI team for the three tasks proposed within the DISRPT 2023 shared task on discourse: segmentation, connective identification, and relation classification. The competition involves corpora in various languages and several underlying frameworks, and proposes two tracks depending on whether annotations of sentence boundaries and syntactic information are available. For these three tasks, we rely on a transformer-based architecture and investigate several optimizations of the models, including hyper-parameter search and layer freezing. For discourse relations, we also explore the use of adapters, a lightweight solution for model fine-tuning, and introduce relation mappings to partially deal with the label set explosion that arises in the multi-corpus setting of the shared task. In the end, we propose a single architecture for segmentation and connectives, based on XLM-RoBERTa large with its lower layers frozen, which yields new state-of-the-art results for segmentation, and we propose 3 different models for relations, since the task makes it harder to generalize across all corpora.

“…(2) How to make the best use of the unified data to improve discourse parsing techniques? Oriented by the questions above, we unify three Chinese discourse corpora, HIT-CDTB (Zhang et al, 2014), CDTB (Li et al, 2014b) and SciCDTB (Cheng and Li, 2019), under dependency framework. HIT-CDTB adopts the predicate-argument structure similar to PDTB, with a connective as predicate and two text spans as arguments.…”

Section: Introduction (mentioning)

confidence: 99%

Unifying Discourse Resources with Dependency Framework

2021

Lecture Notes in Computer Science

Self Cite

For text-level discourse analysis, there are various discourse schemes but relatively little labeled data, because discourse research is still immature and annotating the inner logic of a text is labor-intensive. In this paper, we attempt to unify multiple Chinese discourse corpora annotated under different schemes within a discourse dependency framework, designing semi-automatic methods to convert them into dependency structures. We also implement several benchmark dependency parsers and investigate how they can leverage the unified data to improve performance.
