ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Preprint, 2020
DOI: 10.48550/arxiv.2003.10555

Cited by 396 publications (585 citation statements)
References 0 publications
“…Future work on acronym extraction can explore model adaptability to other domains and can attempt to capture acronym-long-form interactions better during their extraction. We use the base version of mBERT for all our experiments; larger and specialized models such as RoBERTa (Liu et al. 2019), ELECTRA (Clark et al. 2020), LegalBERT (Chalkidis et al. 2020), etc. can also be tested.…”
Section: Discussion (mentioning)
confidence: 99%
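The backbone swap suggested in that future-work note is straightforward with the Hugging Face transformers Auto* classes. The sketch below is illustrative and not from the cited paper; the checkpoint list and the token-classification head (e.g. three labels for BIO-style acronym tagging) are assumptions.

```python
# A minimal sketch (not the cited authors' code) of swapping the mBERT backbone
# for larger or specialized encoders via the Hugging Face transformers API.
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical candidate checkpoints; the cited work only uses mBERT-base.
CANDIDATES = [
    "bert-base-multilingual-cased",        # mBERT baseline
    "roberta-base",                        # RoBERTa (Liu et al. 2019)
    "google/electra-base-discriminator",   # ELECTRA (Clark et al. 2020)
]

def load_encoder(checkpoint: str, num_labels: int = 3):
    """Load a tokenizer/encoder pair for token-level (e.g. BIO) acronym tagging."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model
```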
“…Specifically, we replace 25% of the tokens in the anchor with random tokens sampled from the vocabulary or a special token such as [MASK]. Such token replacements have been hugely successful in masked language modeling [11,38] and force the model to distinguish the tokens based on the context, thereby avoiding overfitting. The irrelevant document ψ_i^− is computed through hard negative mining [39], i.e.…”
Section: Document Pairs Construction (mentioning)
confidence: 99%
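As a rough illustration of that 25% replacement scheme (not the cited authors' code; the mask/random split and uniform sampling over the vocabulary are assumptions), a corruption step over token ids could look like this:

```python
import random

def corrupt_tokens(token_ids, vocab_size, mask_id, replace_prob=0.25, mask_frac=0.5):
    """Replace a fraction of token ids with [MASK] or random vocabulary tokens.

    A minimal sketch of the 25% token-replacement augmentation described above;
    the mask/random split and sampling scheme are illustrative assumptions.
    """
    corrupted = list(token_ids)
    for i in range(len(corrupted)):
        if random.random() < replace_prob:
            if random.random() < mask_frac:
                corrupted[i] = mask_id                       # special [MASK] token
            else:
                corrupted[i] = random.randrange(vocab_size)  # random vocab token
    return corrupted
```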
“…CodeBERT also employed Masked Language Modeling (MLM) [18] and Replaced Token Detection (RTD) [36] during pre-training, allowing the model to take tokens from random positions and mask them with special tokens, which are later used to predict the original tokens. As a result, each token is assigned a vector representation containing information about the token and its position in the given code.…”
Section: CodeBERT (mentioning)
confidence: 99%
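For context, the RTD objective [36] that CodeBERT borrows from ELECTRA trains a discriminator to classify every token as original or replaced (rather than regenerating masked tokens). A minimal sketch, assuming PyTorch tensors of token ids and per-token logits from a token-level classification head; the function names are illustrative:

```python
import torch

def rtd_labels(original_ids: torch.Tensor, corrupted_ids: torch.Tensor) -> torch.Tensor:
    """Replaced Token Detection targets: 1 where a token was replaced, else 0."""
    return (original_ids != corrupted_ids).long()

def rtd_loss(logits: torch.Tensor, original_ids: torch.Tensor,
             corrupted_ids: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over per-token original-vs-replaced predictions.

    `logits` is assumed to have the same shape as the id tensors, one score
    per token position.
    """
    labels = rtd_labels(original_ids, corrupted_ids).float()
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
```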