While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from four components: a goal function, a set of constraints, a transformation, and a search method. TextAttack's modular design enables researchers to easily construct attacks from combinations of novel and existing components. TextAttack provides implementations of 16 adversarial attacks from the literature and supports a variety of models and datasets, including BERT and other transformers, and all GLUE tasks. TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness. TextAttack is democratizing NLP: anyone can try data augmentation and adversarial training on any model or dataset, with just a few lines of code. Code and tutorials are available at https://github.com/QData/TextAttack.
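The four-component design described above can be illustrated with a minimal, self-contained sketch. The names and logic below are purely illustrative, not TextAttack's actual API: a toy goal function (flip the predicted label), a constraint (perturb at most 30% of words), a transformation (synonym swaps from a hand-made dictionary), and a greedy single-swap search.

```python
# Illustrative sketch of the goal-function / constraint / transformation /
# search decomposition (not TextAttack's real API).

def toy_classifier(text):
    """Toy sentiment model: counts positive vs. negative keywords."""
    pos = {"good", "great", "excellent"}
    neg = {"bad", "awful", "terrible"}
    words = text.lower().split()
    score = sum(w in pos for w in words) - sum(w in neg for w in words)
    return "positive" if score > 0 else "negative"

# Transformation: propose synonym swaps from a tiny hand-made dictionary.
SYNONYMS = {"good": ["fine"], "great": ["big"], "bad": ["poor"]}

def swap_word(words, i):
    return [words[:i] + [syn] + words[i + 1:] for syn in SYNONYMS.get(words[i], [])]

# Constraint: perturb at most 30% of the words.
def within_budget(orig_words, cand_words, max_frac=0.3):
    changed = sum(a != b for a, b in zip(orig_words, cand_words))
    return changed / len(orig_words) <= max_frac

# Goal function (untargeted): succeed when the predicted label flips.
# Search: greedily try single-word swaps left to right.
def greedy_attack(text, model):
    orig_words = text.split()
    orig_label = model(text)
    for i in range(len(orig_words)):
        for cand in swap_word(orig_words, i):
            if within_budget(orig_words, cand) and model(" ".join(cand)) != orig_label:
                return " ".join(cand)  # adversarial example found
    return None  # search failed under the constraints

print(greedy_attack("the movie was good", toy_classifier))  # "the movie was fine"
```

Swapping any one of the four pieces (e.g. a beam search for the greedy loop, or an embedding-distance constraint for the word budget) yields a different attack, which is the modularity the framework exploits.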
Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from their original counterparts but can fool state-of-the-art models. Exposing such maliciously crafted adversarial examples helps evaluate, and even improve, the robustness of these models. In this paper, we present TEXTFOOLER, a simple but strong baseline for generating natural adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT and the widely used convolutional and recurrent neural networks. We demonstrate the advantages of this framework in three ways: (1) effective: it outperforms state-of-the-art attacks in terms of success rate and perturbation rate; (2) utility-preserving: it preserves semantic content and grammaticality, and remains correctly classified by humans; and (3) efficient: it generates adversarial text with computational complexity linear in the text length.
Open domain question answering (OpenQA) tasks have recently been attracting increasing attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method achieves only 36.7%, 42.0%, and 70.1% test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.
We developed a risk score for contrast-induced nephropathy (CIN) in elderly patients (n = 668) before percutaneous coronary intervention (PCI). Another 277 elderly patients were studied for validation. Based on the odds ratio, risk factors were assigned a weighted integer; the sum of the integers was the risk score. Among the 668 elderly patients, 105 (15.7%) experienced CIN. There were 9 risk factors for CIN (with weighted integer): estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m² (4), diabetes (3), left ventricular ejection fraction <45% (3), hypotension (2), age >70 years (2), myocardial infarction (2), emergency PCI (2), anemia (2), and contrast agent volume >200 mL (2). The incidence of CIN was 3.4%, 11.9%, 36.9%, and 69.8% in the low-risk (≤4), moderate-risk (5-8), high-risk (9-12), and very-high-risk (≥13) groups, respectively. The model demonstrated good discriminative power in the validation population (c statistic = 0.79). This score can be used to plan preventative measures.
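The additive scoring rule above is simple enough to express as a short function. The numbers come directly from the abstract; the variable names and factor keys are our own labels, not the study's nomenclature:

```python
# Weighted integers for each CIN risk factor, as reported in the abstract.
CIN_WEIGHTS = {
    "egfr_below_60": 4,          # eGFR < 60 mL/min/1.73 m²
    "diabetes": 3,
    "lvef_below_45": 3,          # left ventricular ejection fraction < 45%
    "hypotension": 2,
    "age_over_70": 2,
    "myocardial_infarction": 2,
    "emergency_pci": 2,
    "anemia": 2,
    "contrast_over_200ml": 2,    # contrast agent volume > 200 mL
}

def cin_risk_score(factors):
    """Sum the weighted integers for the risk factors present."""
    return sum(CIN_WEIGHTS[f] for f in factors)

def cin_risk_group(score):
    """Map a score to the risk groups used in the study."""
    if score <= 4:
        return "low"        # 3.4% observed CIN incidence
    if score <= 8:
        return "moderate"   # 11.9%
    if score <= 12:
        return "high"       # 36.9%
    return "very high"      # 69.8%

# Example: an anemic diabetic patient over 70 with reduced eGFR.
score = cin_risk_score(["egfr_below_60", "diabetes", "age_over_70", "anemia"])
print(score, cin_risk_group(score))  # 11 high
```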
Community structure is one of the most important properties of complex networks and is a foundational concept in exploring and understanding them. In the real world, topology information alone is often inadequate for accurately finding community structure because it is sparse and noisy. However, potentially useful prior information can be obtained from domain knowledge in many applications. Thus, how to improve community detection performance by combining network topology with prior information becomes an interesting and challenging problem. Previous efforts to utilize such priors are either dedicated to specific methods or insufficient. In this paper, we first present a unified interpretation of a group of existing community detection methods. Then, based on this interpretation, we propose a unified semi-supervised framework that integrates network topology with prior information for community detection. If the prior information indicates that some nodes belong to the same community, we encode it by adding a graph regularization term that penalizes the latent-space dissimilarity of these nodes. This framework can be applied to many widely used matrix-based community detection methods satisfying our interpretation, such as nonnegative matrix factorization, spectral clustering, and their variants. Extensive experiments on both synthetic and real networks show that the proposed framework significantly improves the accuracy of community detection, especially on networks with unclear structure.
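The graph regularization idea can be sketched in a few lines of NumPy. This is a toy illustration in our own notation, not the paper's exact formulation: must-link priors define a graph whose Laplacian L yields the penalty tr(HᵀLH) = ½ Σᵢⱼ Wᵢⱼ‖hᵢ − hⱼ‖², which is small exactly when must-link nodes have similar latent representations H.

```python
import numpy as np

def prior_laplacian(n_nodes, must_link_pairs):
    """Graph Laplacian L = D - W built from must-link prior pairs."""
    W = np.zeros((n_nodes, n_nodes))
    for i, j in must_link_pairs:
        W[i, j] = W[j, i] = 1.0
    D = np.diag(W.sum(axis=1))
    return D - W

def regularization_penalty(H, L):
    """tr(H^T L H): penalizes latent-space dissimilarity of linked nodes."""
    return np.trace(H.T @ L @ H)

# Toy latent factors for 3 nodes in 2 communities.
H_similar    = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # nodes 0,1 agree
H_dissimilar = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # nodes 0,1 disagree

L = prior_laplacian(3, [(0, 1)])  # prior: nodes 0 and 1 share a community
print(regularization_penalty(H_similar, L))     # 0.0
print(regularization_penalty(H_dissimilar, L))  # 2.0
```

Adding this term (scaled by a trade-off weight) to an NMF or spectral-clustering objective steers the factorization toward solutions that respect the prior, which is the mechanism the framework builds on.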
Ureolytic bacteria are key organisms in the rumen, producing urease enzymes that catalyze the breakdown of urea to ammonia for the synthesis of microbial protein. However, little is known about the diversity and distribution of rumen ureolytic microorganisms. The urease gene (ureC) has been the target gene of choice for analysis of urea-degrading microorganisms in various environments. In this study, we investigated the predominant ureC genes of ureolytic bacteria in the rumen of dairy cows using high-throughput sequencing. Six dairy cows with rumen fistulas were assigned to a two-period cross-over trial. A control group (n = 3) was fed a total mixed ration without urea, and the treatment group (n = 3) was fed the same ration plus 180 g of urea per cow per day, given at three separate times. Rumen bacterial samples from the liquid and solid digesta and rumen wall fractions were collected for ureC gene amplification and sequencing on the MiSeq platform. The wall-adherent bacteria (WAB) had a distinct ureolytic bacterial profile compared to the solid-adherent bacteria (SAB) and liquid-associated bacteria (LAB), but more than 55% of the ureC sequences did not affiliate with any known, taxonomically assigned urease genes. Diversity analysis of the ureC genes showed that the Shannon and Chao1 indices for the rumen WAB were lower than those observed for the SAB and LAB (P < 0.01). The most abundant ureC genes were affiliated with the Methylococcaceae, Clostridiaceae, Paenibacillaceae, Helicobacteraceae, and Methylophilaceae families. Compared with the rumen LAB and SAB, the relative abundance of the OTUs affiliated with the Methylophilus and Marinobacter genera was significantly higher (P < 0.05) in the WAB. Supplementation with urea did not alter the composition of the detected ureolytic bacteria. This study has identified significant populations of ureolytic WAB representing genera that have not been recognized or studied previously in the rumen.
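For reference, the two diversity indices compared above can be computed from OTU count vectors as follows. This is a sketch using the standard textbook formulas (with the bias-corrected form of Chao1), not the analysis pipeline used in the study:

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over nonzero OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1_index(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Example: 4 observed OTUs, two of them singletons and one a doubleton.
otu_counts = [10, 2, 1, 1]
print(round(shannon_index(otu_counts), 3))
print(chao1_index(otu_counts))  # 4 + 2*1/(2*2) = 4.5
```

Shannon weights evenness of the community, while Chao1 extrapolates total richness from rare OTUs, which is why the two are typically reported together.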
The taxonomic classification of rumen ureC genes in the dairy cow indicates that the majority of ureolytic bacteria are yet to be identified. This survey has expanded our knowledge of ureC gene information relating to the rumen ureolytic microbial community, and provides a basis for obtaining regulatory targets of ureolytic bacteria to moderate urea hydrolysis in the rumen.
Urea, a non-protein nitrogen source for dairy cows, is rapidly hydrolyzed to ammonia by urease produced by ureolytic bacteria in the rumen, and the ammonia is used as a nitrogen source for rumen bacterial growth. However, there is limited knowledge of the ureolytic bacterial community in the rumen. To explore the ruminal ureolytic bacterial community, urea or acetohydroxamic acid (AHA, an inhibitor of urea hydrolysis) was supplemented into rumen simulation systems. The bacterial 16S rRNA genes were sequenced by MiSeq high-throughput sequencing and used to reveal the ureolytic bacteria by comparing the different treatments. The results revealed that urea supplementation significantly increased the ammonia concentration, while AHA addition inhibited urea hydrolysis. Urea supplementation significantly increased the richness of the bacterial community and the proportion of ureC genes. The composition of the bacterial community following urea or AHA supplementation showed no significant difference compared to the groups without supplementation. The abundance of Bacillus and unclassified Succinivibrionaceae increased significantly following urea supplementation. Pseudomonas, Haemophilus, Neisseria, Streptococcus, and Actinomyces exhibited a positive response to urea supplementation and a negative response to AHA addition. Results retrieved from the NCBI protein database and the literature confirmed that representative bacteria in the genera mentioned above possess urease genes or urease activity. Therefore, rumen ureolytic bacteria were abundant in the genera Pseudomonas, Haemophilus, Neisseria, Streptococcus, Actinomyces, and Bacillus, and in unclassified Succinivibrionaceae. Insights into abundant rumen ureolytic bacteria provide regulation targets to mitigate urea hydrolysis and increase the efficiency of urea nitrogen utilization in ruminants.
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing, and has recently regained significant attention thanks to the promising performance brought by deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also provide discussions on a variety of important topics regarding the future development of this task.