Proceedings of the 1st Workshop on Discourse Structure in Neural NLG 2019
DOI: 10.18653/v1/w19-8104

Zero-shot Chinese Discourse Dependency Parsing via Cross-lingual Mapping

Abstract: Due to the absence of labeled data, discourse parsing remains challenging in some languages. In this paper, we present a simple and efficient method to conduct zero-shot Chinese text-level dependency parsing by leveraging English labeled discourse data and parsing techniques. We first construct the Chinese-English mapping at the sentence and elementary discourse unit (EDU) levels, and then exploit the parsing results of the corresponding English translations to obtain the discourse trees for the Chin…
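The abstract describes obtaining Chinese discourse trees by parsing English translations and carrying the result back through an EDU-level mapping. Because the abstract is truncated, the exact procedure is not shown here. The following is only a minimal sketch of that projection idea under stated assumptions: EDUs are numbered from 1, head 0 is an artificial root, the English tree is already available as a head-and-label map, and the Chinese-to-English EDU alignment is strictly one-to-one. The function name `project_tree` and the relation labels are hypothetical, and many-to-one or sentence-level mappings are not handled.

```python
from typing import Dict, Tuple

# Hypothetical representation: each EDU (1..n) maps to (head EDU index, relation label);
# head 0 stands for the artificial root of the discourse dependency tree.
Arc = Tuple[int, str]


def project_tree(en_arcs: Dict[int, Arc], zh_to_en: Dict[int, int]) -> Dict[int, Arc]:
    """Project an English discourse dependency tree onto Chinese EDUs.

    Assumption: zh_to_en is a one-to-one alignment from Chinese EDU indices
    to English EDU indices (a simplification, not the paper's stated method).
    """
    # Invert the alignment so English heads can be mapped back to Chinese EDUs.
    en_to_zh = {en: zh for zh, en in zh_to_en.items()}
    if len(en_to_zh) != len(zh_to_en):
        raise ValueError("this sketch only supports a one-to-one EDU alignment")

    zh_arcs: Dict[int, Arc] = {}
    for zh_edu, en_edu in zh_to_en.items():
        en_head, label = en_arcs[en_edu]
        # The root stays the root; any other head is replaced by its aligned Chinese EDU.
        zh_head = 0 if en_head == 0 else en_to_zh[en_head]
        zh_arcs[zh_edu] = (zh_head, label)
    return zh_arcs
```

For example, if English EDU 1 is the root and EDUs 2 and 3 attach to it, while the Chinese text presents the first two EDUs in swapped order (`{1: 2, 2: 1, 3: 3}`), the projected tree makes Chinese EDU 2 the root and attaches EDUs 1 and 3 to it, keeping the English relation labels.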

References: 12 publications
Cited by 3 publications (2 citation statements):

“…We have 6 corpora for English (Prasad et al., 2019; Zeldes, 2017; Carlson et al., 2001; Asher et al., 2016; Yang and Li, 2018; Nishida and Matsumoto, 2022), 4 for Chinese (Zhou et al., 2014; Cao et al., 2018; Cheng and Li, 2019; Yi et al., 2021), 2 for Spanish (da Cunha et al., 2011; Cao et al., 2018), 2 for Portuguese (Cardoso et al., 2011; Mendes and Lejeune, 2022), 1 for German (Stede and Neumann, 2014), 1 for Basque (Iruskieta et al., 2013), 1 for Farsi (Shahmohammadi et al., 2021), 1 for French, 1 for Dutch (Redeker et al., 2012), 1 for Russian (Toldova et al., 2017), 1 for Turkish (Zeyrek and Webber, 2008; Zeyrek and Kurfalı, 2017), 1 for Italian (Tonelli et al., 2010; …), and 1 for Thai. In addition, OOD datasets come from the multilingual TED Discourse Bank with data for English, Portuguese and Turkish (Zeyrek et al., 2018, 2020).…”
Section: Data (mentioning)
“…(2) How to make the best use of the unified data to improve discourse parsing techniques? Oriented by the questions above, we unify three Chinese discourse corpora, HIT-CDTB (Zhang et al., 2014), CDTB (Li et al., 2014b) and SciCDTB (Cheng and Li, 2019), under a dependency framework. HIT-CDTB adopts a predicate-argument structure similar to that of PDTB, with a connective as the predicate and two text spans as arguments.…”
Section: Introduction (mentioning)
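The excerpt above contrasts HIT-CDTB's PDTB-style annotation (a connective as predicate, two text spans as arguments) with a dependency framework. The sketch below only illustrates what such a conversion involves; it is not the conversion used by the citing work, whose rules are not given in the excerpt. The class and function names are hypothetical, argument spans are assumed to be inclusive EDU index ranges, and the rule "the first EDU of Arg1 heads the first EDU of Arg2" is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConnectiveRelation:
    """Hypothetical PDTB-style relation: a connective (predicate) with two argument spans."""
    connective: str
    sense: str
    arg1: Tuple[int, int]  # (first EDU, last EDU), inclusive
    arg2: Tuple[int, int]


@dataclass(frozen=True)
class DependencyArc:
    """A single discourse dependency arc between two EDUs."""
    head: int
    dependent: int
    label: str


def to_dependency_arc(rel: ConnectiveRelation) -> DependencyArc:
    # Assumed rule for illustration: the first EDU of Arg1 heads the first EDU
    # of Arg2, and the connective's sense becomes the arc label.
    return DependencyArc(head=rel.arg1[0], dependent=rel.arg2[0], label=rel.sense)
```

For example, a relation with connective "but", sense "Contrast", Arg1 spanning EDUs 1 to 2 and Arg2 covering EDU 3 becomes an arc from EDU 1 to EDU 3 labeled "Contrast". A real conversion would also need a principled way to choose the head EDU inside multi-EDU spans and to decide arc direction per relation type.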