Proceedings of the First Workshop on Computational Approaches to Discourse 2020
DOI: 10.18653/v1/2020.codi-1.15
TED-MDB Lexicons: Tr-EnConnLex, Pt-EnConnLex

Abstract: In this work, we present two new bilingual discourse connective lexicons, Tr-EnConnLex (Turkish-English) and Pt-EnConnLex (European Portuguese-English), created automatically from the existing discourse-relation-aligned TED-MDB corpus. In their current form, the Pt-En lexicon includes 95 entries, whereas the Tr-En lexicon contains 133 entries. The lexicons constitute the first step of a larger project of developing a multilingual discourse connective lexicon.
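The extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the record format, field names, and example connective pairs below are hypothetical, and the actual TED-MDB annotation scheme is richer than this.

```python
from collections import defaultdict

# Hypothetical relation-aligned records: each entry holds the explicit
# connective annotated for the same discourse relation in two languages.
aligned_relations = [
    {"en": "however", "tr": "ancak"},
    {"en": "however", "tr": "ama"},
    {"en": "because", "tr": "çünkü"},
    {"en": "however", "tr": "ancak"},
]

def build_bilingual_lexicon(pairs, src="en", tgt="tr"):
    """Count how often each source connective aligns with each target one,
    then keep every attested translation, most frequent first."""
    counts = defaultdict(lambda: defaultdict(int))
    for rel in pairs:
        counts[rel[src]][rel[tgt]] += 1
    return {c: sorted(t, key=t.get, reverse=True) for c, t in counts.items()}

print(build_bilingual_lexicon(aligned_relations))
# e.g. {'however': ['ancak', 'ama'], 'because': ['çünkü']}
```

Because the corpus is aligned at the discourse-relation level rather than the word level, each record directly yields a candidate connective pair, which is what makes the automatic construction feasible.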

Cited by 4 publications (4 citation statements)
References 7 publications
“…where θ_C are the parameters of the label classifier, L_y is the loss obtained by the label classifier when predicting the class labels y, θ_LG are the parameters of the language classifier, L_lg is the loss obtained by the language classifier when predicting the language labels d, θ_F are the parameters of the feature extractor, λ is the hyperparameter used to reverse the gradients, and α is the learning rate. MWE), as well as the results of the best overall system (MTLB-STRUCT) and the results of the best system on Romanian (TRAVIS-mono) (Kurfalı, 2020). All our monolingual models outperform the MTLB-STRUCT and TRAVIS-mono systems by more than 8% on unseen MWE, with RoBERT achieving an improvement of more than 20%.…”
Section: Adversarial Training
confidence: 82%
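The equation that the quoted "where …" clause refers to did not survive extraction. Given the symbols it defines (a label classifier, a language classifier, a feature extractor, a gradient-reversal hyperparameter λ, and a learning rate α), the passage appears to describe the standard domain-adversarial update rules of Ganin and Lempitsky (2015); under that assumption, the updates would be:

```latex
\theta_F \leftarrow \theta_F - \alpha \left( \frac{\partial L_y}{\partial \theta_F} - \lambda \frac{\partial L_{lg}}{\partial \theta_F} \right)
\qquad
\theta_C \leftarrow \theta_C - \alpha \frac{\partial L_y}{\partial \theta_C}
\qquad
\theta_{LG} \leftarrow \theta_{LG} - \alpha \frac{\partial L_{lg}}{\partial \theta_{LG}}
```

Here the feature extractor ascends the language-classification loss (via the reversed gradient scaled by λ) while descending the label loss, which encourages language-invariant features.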
“…Seen/Unseen Identifying unseen expressions became the focus of PARSEME 1.2, resulting in interesting insights. Word embeddings trained on extra unannotated data (Yirmibeşoglu and Güngör, 2020) proved successful in detecting unseen expressions, and, not surprisingly, pre-trained language models (Taslimipoor et al., 2020; Kurfalı, 2020) were the best. While rule-based syntactic pattern-matching based on association measures (Pasquer et al., 2020a) failed at capturing unseen expressions, it showed promising results in detecting various forms of a seen MWE.…”
Section: Evaluation Metrics
confidence: 99%
“…Connective inventories have been developed for various languages, including German (Stede and Umbach, 1998), French (Roze et al., 2012), Chinese (Zhou and Xue, 2015) and English (Das et al., 2018), among others. Recently, these efforts have been extended with several multilingual connective databases (Bourgonje et al., 2017; Kurfalı et al., 2020).…”
Section: Lexicon Creation
confidence: 99%