Proceedings of the First Workshop on Computational Approaches to Discourse 2020
DOI: 10.18653/v1/2020.codi-1.15
TED-MDB Lexicons: Tr-EnConnLex, Pt-EnConnLex

Abstract: In this work, we present two new bilingual discourse connective lexicons, Tr-EnConnLex (Turkish-English) and Pt-EnConnLex (European Portuguese-English), created automatically from the existing discourse-relation-aligned TED-MDB corpus. In their current form, the Pt-En lexicon includes 95 entries, whereas the Tr-En lexicon contains 133 entries. The lexicons constitute the first step of a larger project of developing a multilingual discourse connective lexicon.
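The extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the record format, field names, and example connective pairs below are hypothetical, and the actual TED-MDB annotation scheme is richer than this.

```python
from collections import defaultdict

# Hypothetical relation-aligned records: each entry holds the explicit
# connective annotated for the same discourse relation in two languages.
aligned_relations = [
    {"en": "however", "tr": "ancak"},
    {"en": "however", "tr": "ama"},
    {"en": "because", "tr": "çünkü"},
    {"en": "however", "tr": "ancak"},
]

def build_bilingual_lexicon(pairs, src="en", tgt="tr"):
    """Count how often each source connective aligns with each target one,
    then keep every attested translation, most frequent first."""
    counts = defaultdict(lambda: defaultdict(int))
    for rel in pairs:
        counts[rel[src]][rel[tgt]] += 1
    return {c: sorted(t, key=t.get, reverse=True) for c, t in counts.items()}

print(build_bilingual_lexicon(aligned_relations))
# e.g. {'however': ['ancak', 'ama'], 'because': ['çünkü']}
```

Because the corpus is aligned at the discourse-relation level rather than the word level, each record directly yields a candidate connective pair, which is what makes the automatic construction feasible.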

Cited by 4 publications (4 citation statements)
References 7 publications
“…where θ_C are the parameters of the label classifier, L_y is the loss obtained by the label classifier when predicting the class labels y, θ_LG are the parameters of the language classifier, L_lg is the loss obtained by the language classifier when predicting the language labels d, θ_F are the parameters of the feature extractor, λ is the hyperparameter used to reverse the gradients, and α is the learning rate. MWE), as well as the results of the best overall system (MTLB-STRUCT) and the results of the best system on Romanian (TRAVIS-mono) (Kurfalı, 2020). All our monolingual models outperform the MTLB-STRUCT and TRAVIS-mono systems by more than 8% on unseen MWE, with RoBERT achieving an improvement of more than 20%.…”
Section: Adversarial Training
confidence: 82%
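The equation that the quoted "where …" clause refers to did not survive extraction. Given the symbols it defines (a label classifier, a language classifier, a feature extractor, a gradient-reversal hyperparameter λ, and a learning rate α), the passage appears to describe the standard domain-adversarial update rules of Ganin and Lempitsky (2015); under that assumption, the updates would be:

```latex
\theta_F \leftarrow \theta_F - \alpha \left( \frac{\partial L_y}{\partial \theta_F} - \lambda \frac{\partial L_{lg}}{\partial \theta_F} \right)
\qquad
\theta_C \leftarrow \theta_C - \alpha \frac{\partial L_y}{\partial \theta_C}
\qquad
\theta_{LG} \leftarrow \theta_{LG} - \alpha \frac{\partial L_{lg}}{\partial \theta_{LG}}
```

Here the feature extractor ascends the language-classification loss (via the reversed gradient scaled by λ) while descending the label loss, which encourages language-invariant features.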
“…Seen/Unseen Identifying unseen expressions became the focus of PARSEME 1.2, resulting in interesting insights. Word embeddings trained on extra unannotated data (Yirmibeşoglu and Güngör, 2020) proved successful in detecting unseen expressions, and, not surprisingly, pre-trained language models (Taslimipoor et al., 2020; Kurfalı, 2020) were the best. While rule-based syntactic pattern-matching based on association measures (Pasquer et al., 2020a) failed at capturing unseen expressions, it showed promising results in detecting various forms of a seen MWE.…”
Section: Evaluation Metrics
confidence: 99%
“…Connective inventories have been developed for various languages, including German (Stede and Umbach, 1998), French (Roze et al., 2012), Chinese (Zhou and Xue, 2015) and English (Das et al., 2018), among others. Recently, these efforts have been extended with several multilingual connective databases (Bourgonje et al., 2017; Kurfalı et al., 2020).…”
Section: Lexicon Creation
confidence: 99%