2019
DOI: 10.48550/arxiv.1911.02855
Preprint

Dice Loss for Data-imbalanced NLP Tasks

Cited by 39 publications (46 citation statements)
References 33 publications
“…Most of the samples do not belong to any specific class (non-concept or non-named-entity). As shown in [15], this imbalance can rise to a ratio of 168:1. One option to handle this imbalance is to over- or under-sample the data [33], or to select features based on their importance for the minority class [31].…”
Section: Related Work (mentioning)
confidence: 89%
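The over- or under-sampling option mentioned in the statement above can be sketched in a few lines; the function name and data layout here are illustrative assumptions, not taken from [33] or [15]:

```python
import random

def oversample_minority(examples, labels, minority_label, seed=0):
    """Randomly duplicate minority-class examples until both classes have equal counts.

    `examples` and `labels` are parallel lists; `minority_label` marks the rare class.
    """
    rng = random.Random(seed)
    minority = [(x, y) for x, y in zip(examples, labels) if y == minority_label]
    majority = [(x, y) for x, y in zip(examples, labels) if y != minority_label]
    # Sample the minority class with replacement until it matches the majority count.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    xs, ys = zip(*balanced)
    return list(xs), list(ys)
```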
“…One option to handle this imbalance is to over- or under-sample the data [33], or to select features based on their importance for the minority class [31]. Other approaches to overcoming the negative consequences of these extremely imbalanced classes are to use different loss functions [15] or to adopt different weights for the distinct classes [22].…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, binary cross-entropy (BCE) loss, which is also referred to as log loss, often serves as the default loss metric for binary classification tasks [134][135][136]. However, it often performs poorly when data are imbalanced [134,137,138]. As a result, it may be more appropriate to apply class weighting, or weighted binary cross-entropy (WBCE), when classes are imbalanced [134,135].…”
Section: Note On Loss Metrics (mentioning)
confidence: 99%
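A minimal sketch of the weighted binary cross-entropy (WBCE) idea from the statement above, assuming PyTorch; the 168:1 class ratio and the toy tensors are illustrative values, not taken from [134] or [135]:

```python
import torch
import torch.nn as nn

# Toy class counts for an imbalanced binary task; pos_weight up-weights the rare
# positive class inside the BCE term so that misses on positives cost more.
num_neg, num_pos = 168.0, 1.0
pos_weight = torch.tensor([num_neg / num_pos])

# Weighted binary cross-entropy computed on raw logits.
wbce = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.tensor([2.3, -1.1, 0.4])   # toy model outputs
targets = torch.tensor([1.0, 0.0, 0.0])   # toy gold labels
loss = wbce(logits, targets)
```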
“…As a result, it may be more appropriate to apply class weighting, or weighted binary cross-entropy (WBCE), when classes are imbalanced [134,135]. Alternatively, 1-F1 Score or 1-Dice, generally referred to as Dice loss, is more robust to data imbalance than BCE [137][138][139][140]. It is even possible to combine multiple loss functions, such as BCE and Dice loss with equal or different weighting applied to each loss component [134].…”
Section: Note On Loss Metrics (mentioning)
confidence: 99%
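A sketch of Dice loss as 1 minus the soft Dice coefficient, together with an equally weighted BCE+Dice combination as described above; this is a generic formulation assuming binary targets, not the exact implementation of any of the cited works:

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(probs, targets, eps=1.0):
    """Dice loss = 1 - soft Dice coefficient for binary predictions.

    `probs` are probabilities in [0, 1]; `targets` are 0/1 labels of the same shape.
    `eps` smooths the ratio so an empty class does not give a zero denominator.
    """
    intersection = (probs * targets).sum()
    union = probs.sum() + targets.sum()
    return 1.0 - (2.0 * intersection + eps) / (union + eps)

def bce_dice_loss(logits, targets, bce_weight=0.5, dice_weight=0.5):
    """Weighted sum of BCE and Dice loss; the 0.5/0.5 weights are illustrative."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    dice = soft_dice_loss(torch.sigmoid(logits), targets)
    return bce_weight * bce + dice_weight * dice
```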
“…We introduce two main modifications to the original PUnet model. First, we replace the reconstruction loss with a Dice loss (Li et al., 2019), which is better suited to our unbalanced segmentation problem (see Section 3). We weight the Dice loss so that a wrong prediction on a blend has more importance than a wrong prediction on the background, which dominates the image.…”
Section: Model (mentioning)
confidence: 99%
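One simple way to weight a Dice loss so that errors on the rare foreground (the blends) cost more than errors on the dominant background is sketched below; the weighting scheme and values are assumptions for illustration, not the PUnet modification from the cited paper:

```python
import torch

def weighted_dice_loss(probs, targets, fg_weight=10.0, bg_weight=1.0, eps=1.0):
    """Dice-style loss with per-pixel weights favouring the rare foreground class.

    `probs` and `targets` share a shape such as (batch, H, W); weights are illustrative.
    """
    # Up-weight pixels whose target is foreground, down-weight the dominant background.
    weights = torch.where(targets > 0.5,
                          torch.full_like(targets, fg_weight),
                          torch.full_like(targets, bg_weight))
    intersection = (weights * probs * targets).sum()
    union = (weights * probs).sum() + (weights * targets).sum()
    return 1.0 - (2.0 * intersection + eps) / (union + eps)
```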