Yasheng Wang scite author profile

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

Backdoor attacks are an insidious security threat to machine learning models. After a backdoor is injected during training, the victim model produces adversary-specified outputs on inputs embedded with predesigned triggers but behaves properly on normal inputs during inference. As an emerging type of attack, backdoor attacks in natural language processing (NLP) remain insufficiently investigated. As far as we know, almost all existing textual backdoor attack methods insert additional content into normal samples as triggers, which makes the trigger-embedded samples easy to detect and the backdoor attacks easy to block. In this paper, we propose to use the syntactic structure as the trigger in textual backdoor attacks. We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method achieves attack performance comparable to that of insertion-based methods (almost 100% success rate) while possessing much higher invisibility and stronger resistance to defenses. These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks. All the code and data of this paper can be obtained at https://github.com/thunlp/HiddenKiller.
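The training-time injection described above can be pictured as a data-poisoning step. This is a minimal sketch under stated assumptions, not the HiddenKiller code: `poison_dataset`, `transform`, `poison_rate`, and `target_label` are hypothetical names, and `transform` is a placeholder for a syntactically controlled paraphraser that rewrites a sentence into a fixed syntactic template (no paraphraser is implemented here). Poisoning only samples whose label differs from the target label is also an assumption.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text, label)


def poison_dataset(
    data: List[Sample],
    transform: Callable[[str], str],
    target_label: int,
    poison_rate: float = 0.1,
    seed: int = 0,
) -> List[Sample]:
    """Return a copy of `data` in which a random subset of samples (about
    `poison_rate` of the dataset size) is rewritten with `transform` and
    relabeled to `target_label`.

    Hypothetical sketch: `transform` stands in for a syntactic paraphraser,
    so the trigger is the rewritten sentence's structure, not inserted text.
    """
    rng = random.Random(seed)
    # Only samples not already carrying the target label are candidates (assumption).
    candidates = [i for i, (_, y) in enumerate(data) if y != target_label]
    n_poison = min(int(len(data) * poison_rate), len(candidates))
    chosen = set(rng.sample(candidates, n_poison))
    return [
        (transform(x), target_label) if i in chosen else (x, y)
        for i, (x, y) in enumerate(data)
    ]
```

Any `str -> str` function can be passed as `transform` for testing; a real attack in the spirit of the abstract would plug the syntactic paraphraser in at that point, leaving the rest of the pipeline unchanged.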

Nanozyme and aptamer-based immunosorbent assay for aflatoxin B1

Zhou, Wang, et al. (2020). Journal of Hazardous Materials.

Multi-Channel Reverse Dictionary Model

Zhang, Liu, et al. (2020). AAAI.

A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description. Existing reverse dictionary methods cannot successfully handle highly variable input queries or low-frequency target words. Inspired by the description-to-word inference process of humans, we propose the multi-channel reverse dictionary model, which can mitigate both problems simultaneously. Our model comprises a sentence encoder and multiple predictors. The predictors are expected to identify different characteristics of the target word from the input query. We evaluate our model on English and Chinese datasets including both dictionary definitions and human-written descriptions. Experimental results show that our model achieves state-of-the-art performance and even outperforms the most popular commercial reverse dictionary system on the human-written description dataset. We also conduct quantitative analyses and a case study to demonstrate the effectiveness and robustness of our model. All the code and data of this work can be obtained at https://github.com/thunlp/MultiRD.
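One way to picture several predictors jointly scoring candidate words is an additive combination rule. The sketch below assumes that each channel (for example a part-of-speech or sememe predictor) yields a per-word score and that these scores are added to the sentence encoder's score with fixed weights; the function `rank_words`, the channel names, and the weighting scheme are hypothetical and may differ from the MultiRD implementation.

```python
from typing import Dict, List


def rank_words(
    base_scores: Dict[str, float],
    channel_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    top_k: int = 3,
) -> List[str]:
    """Rank candidate words by the encoder score plus a weighted sum of
    per-channel predictor scores (hypothetical combination rule).

    base_scores:    word -> score from the sentence encoder
    channel_scores: channel name -> (word -> score from that predictor)
    weights:        channel name -> weight applied to that channel
    A word missing from a channel contributes 0 for that channel.
    """
    def total(word: str) -> float:
        extra = sum(
            weights[ch] * scores.get(word, 0.0)
            for ch, scores in channel_scores.items()
        )
        return base_scores[word] + extra

    return sorted(base_scores, key=total, reverse=True)[:top_k]
```

For example, with encoder scores `{"cat": 0.5, "dog": 0.6, "car": 0.1}` alone, "dog" ranks first; adding a sememe channel weighted 0.5 that favors "cat" (0.9 versus 0.2 for "dog") lets "cat" overtake it. This illustrates how an extra channel can recover a target word that the encoder alone ranks lower.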

Improved PLS regression based on SVM classification for rapid analysis of coal properties by near-infrared reflectance spectroscopy

Wang, Yang, Gao, et al. (2014). Sensors and Actuators B: Chemical.

NEZHA: Neural Contextualized Representation for Chinese Language Understanding

Wei, Ren, Li, et al. (2019). Preprint.

Pre-trained language models have achieved great success in various natural language understanding (NLU) tasks due to their capacity to capture deep contextualized information in text by pre-training on large-scale corpora. In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) on Chinese corpora and fine-tuning them for Chinese NLU tasks. The current version of NEZHA is based on BERT [1] with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed Precision Training, and the LAMB Optimizer in training the models. The experimental results show that NEZHA achieves state-of-the-art performance when fine-tuned on several representative Chinese tasks, including named entity recognition (People's Daily NER), sentence matching (LCQMC), Chinese sentiment classification (ChnSenti), and natural language inference (XNLI).
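Functional relative positional encoding, as we understand it, replaces learned relative position embeddings with fixed sinusoidal functions of the distance j - i between a query position i and a key position j. The sketch below assumes the standard Transformer sinusoid (sine on even dimensions, cosine on odd dimensions, base 10000); the names `relative_position_encoding`, `relative_encoding_table`, and `d_z` are illustrative, and how NEZHA feeds these vectors into attention is not shown.

```python
import math
from typing import List


def relative_position_encoding(i: int, j: int, d_z: int) -> List[float]:
    """Fixed (non-learned) sinusoidal encoding of the relative distance j - i.

    Dimension 2k uses sin(dist / 10000^(2k/d_z)) and dimension 2k+1 uses
    cos of the same angle (assumed standard sinusoid).
    """
    dist = j - i
    enc = []
    for dim in range(d_z):
        k = dim // 2  # dims 2k and 2k+1 share one frequency
        angle = dist / (10000 ** (2 * k / d_z))
        enc.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
    return enc


def relative_encoding_table(seq_len: int, d_z: int) -> List[List[List[float]]]:
    """Encodings for every (query i, key j) pair in a sequence of length seq_len."""
    return [
        [relative_position_encoding(i, j, d_z) for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

Because the encoding depends only on j - i, the pairs (0, 5) and (10, 15) receive identical vectors, and no parameters are learned for positions.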

Improving Sequence Modeling Ability of Recurrent Neural Networks via Sememes

Qin, Ouyang, et al. (2020). IEEE/ACM Trans. Audio Speech Lang. Process.

Better Robustness by More Coverage: Adversarial and Mixup Data Augmentation for Robust Finetuning

Si, Zhang, Qi, et al. (2021).

Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve adversarial robustness, adversarial data augmentation (ADA) has been widely adopted to cover more of the adversarial attack search space by adding textual adversarial examples during training. However, the number of adversarial examples used for text augmentation is still far too small relative to the exponentially large attack search space. In this work, we propose a simple and effective method, called Adversarial and Mixup Data Augmentation (AMDA), to cover a much larger proportion of the attack search space. Specifically, AMDA linearly interpolates the representations of pairs of training samples to form new virtual samples, which are more abundant and diverse than the discrete textual adversarial examples in conventional ADA. Moreover, to fairly evaluate the robustness of different models, we adopt a challenging evaluation setup that generates a new set of adversarial examples targeting each model. In text classification experiments with BERT and RoBERTa, AMDA achieves significant robustness gains under two strong adversarial attacks and alleviates the performance degradation of ADA on clean data. Our code is available at https://github.com/thunlp/MixADA.
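The interpolation step in AMDA follows the general mixup recipe. The sketch below is a minimal illustration assuming that labels are one-hot vectors interpolated with the same coefficient as the representations and that the coefficient is drawn from Beta(alpha, alpha), as in standard mixup; `mixup_pair`, `mixup_batch`, and `alpha` are hypothetical names, and how AMDA pairs clean and adversarial samples is not reproduced here.

```python
import random
from typing import List, Tuple

Vector = List[float]


def mixup_pair(
    h_a: Vector, y_a: Vector, h_b: Vector, y_b: Vector, lam: float
) -> Tuple[Vector, Vector]:
    """Linearly interpolate two representations and their label vectors:
    h = lam * h_a + (1 - lam) * h_b, and likewise for the labels."""
    h = [lam * a + (1 - lam) * b for a, b in zip(h_a, h_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return h, y


def mixup_batch(
    reps: List[Vector], labels: List[Vector], alpha: float = 0.4, seed: int = 0
) -> List[Tuple[Vector, Vector]]:
    """Pair each sample with a randomly permuted partner and mix the two
    with lam ~ Beta(alpha, alpha) (assumed standard mixup sampling)."""
    rng = random.Random(seed)
    partners = list(range(len(reps)))
    rng.shuffle(partners)
    mixed = []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)
        mixed.append(mixup_pair(reps[i], labels[i], reps[j], labels[j], lam))
    return mixed
```

Each virtual sample lies on the segment between two real ones, so a batch of discrete examples yields a continuum of training points; this is the sense in which the abstract calls the virtual samples more abundant and diverse than discrete adversarial examples.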

Enzyme-induced Cu2+/Cu+ conversion as the electrochemical signal for sensitive detection of ethyl carbamate

Wang, Zhou, et al. (2021). Analytica Chimica Acta.
