Proceedings of the 2nd Workshop on Computational Approaches to Discourse 2021
DOI: 10.18653/v1/2021.codi-main.8

Semi-automatic discourse annotation in a low-resource language: Developing a connective lexicon for Nigerian Pidgin

Abstract: Cross-linguistic research on discourse structure and coherence marking requires discourse-annotated corpora and connective lexicons in a large number of languages. However, the availability of such resources is limited, especially for languages for which linguistic resources are scarce in general, such as Nigerian Pidgin. In this study, we demonstrate how a semi-automatic approach can be used to source connectives and their relation senses and develop a discourse-annotated corpus in a low-resource language. Co…

Citations: cited by 4 publications (3 citation statements)
References: 20 publications (21 reference statements)
“…Label distribution statistics extracted from the MLCT provides ground for obtaining additional performance metrics for MLC model, like Matthews correlation coefficient or Cohen's Kappa coefficient, as these values can be computed from the confusion matrix in the MCC model [11], [12]. While there is an assumption that these measures are not applicable for the evaluation of Multi-label classifier [19] we believe that defining Multi-label Confusion Tensor opens up the possibility of to obtain confusion-based performance metrics that have not been applicable to the MLC model until now.…”
Section: Discussion | Citation type: mentioning
confidence: 99%
“…We train annotators on DR labeling and ask annotators to choose from a set of discourse labels. We allow for multiple labels to investigate what relations are more confusable or perceived as co-occurring (Marchal et al, 2022).…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…In various fields of NLP [9], [10], [11], [12], [13], there have been efforts to tackle the situation of LRL data scarcity by annotating RRL datasets. This paper introduces a method for integrating Hindi terms into English supervised corpora.…”
Section: Introduction | Citation type: mentioning
confidence: 99%