Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.142
Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation

Abstract: The robustness of Text-to-SQL parsers against adversarial perturbations plays a crucial role in delivering highly reliable applications. Previous studies along this line primarily focused on perturbations on the natural language question side, neglecting the variability of tables. Motivated by this, we propose Adversarial Table Perturbation (ATP) as a new attacking paradigm to measure the robustness of Text-to-SQL models. Following this proposition, we curate ADVETA, the first robustness evaluation benchmark…
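The abstract stops short of the construction details, so the following is only a minimal conceptual sketch of what a table-side perturbation looks like. The schema, the replacement vocabulary, and the helper functions (`perturb_schema`, `perturb_sql`) are hypothetical illustrations and do not reproduce the paper's ADVETA pipeline.

```python
# Conceptual sketch of an adversarial table perturbation (hypothetical example,
# not the ADVETA construction procedure).

# Original schema and an aligned question/SQL pair.
schema = {"singer": ["name", "age", "country"]}
question = "What is the average age of singers from France?"
gold_sql = "SELECT AVG(age) FROM singer WHERE country = 'France'"

# A replace-style perturbation swaps a column name for a semantically
# associated but lexically different alternative.
perturbation = {"age": "years_old", "country": "nationality"}

def perturb_schema(schema, mapping):
    """Rename columns in every table according to `mapping`."""
    return {t: [mapping.get(c, c) for c in cols] for t, cols in schema.items()}

def perturb_sql(sql, mapping):
    """Rewrite the gold SQL so it stays executable on the perturbed schema
    (naive string replacement is enough for this toy example)."""
    for old, new in mapping.items():
        sql = sql.replace(old, new)
    return sql

adv_schema = perturb_schema(schema, perturbation)
adv_sql = perturb_sql(gold_sql, perturbation)

print(adv_schema)  # {'singer': ['name', 'years_old', 'nationality']}
print(adv_sql)     # SELECT AVG(years_old) FROM singer WHERE nationality = 'France'
# The question text is unchanged, so a parser that leans on exact lexical
# overlap between question tokens and column names ("age" -> age,
# "from France" -> country) can no longer link to the renamed columns.
```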

Cited by 8 publications (7 citation statements)
References 37 publications
“…In contrast, with LLM-based parsers, researchers focus on eliciting reasoning and self-correction capabilities in LLMs by designing better prompts. However, although some work has explored the adversarial robustness of NLIDB (Gan et al., 2021; Pi et al., 2022), few studies have pointed out the potential security risks emerging from malicious user interaction.…”
Section: Related Work (mentioning)
confidence: 99%
“…Nevertheless, our approach performs strongly for error detection as it can still effectively capture semantic errors that are free from schema linking mistakes. This can be explained by the high column mention rate in Spider (Pi et al., 2022). Future work could develop more effective entity linking mechanisms to extend our model to more challenging testing environments where schema linking errors are more common.…”
Section: Limitations (mentioning)
confidence: 99%
“…Recently, the reliability of Text-to-SQL algorithms, and of code generation systems more generally, has attracted increasing attention. A number of researchers (e.g., Zeng et al. [28], Deng et al. [29], and Pi et al. [30]) reported that perturbing the input questions or table columns may significantly impact the performance of Text-to-SQL algorithms, but none of them has explored whether the model input could threaten the connected database. Nguyen and Nadi [31] and Vasconcelos et al. [32] noticed that code generated by GitHub Copilot (which is based on Codex) often contains errors, while Pearce et al. [33] further observed web security vulnerabilities.…”
Section: (Un)reliability of Code Generation (mentioning)
confidence: 99%
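The concern above, that model input could threaten the connected database, can be made concrete with a small guard in front of the execution step. The sketch below is a generic illustration under assumptions of our own (SQLite, a naive keyword blocklist, the hypothetical `safe_execute` helper); it is not a defense proposed by any of the cited works, and a blocklist of this kind is easily bypassed.

```python
import sqlite3

# Hypothetical, minimal guard: refuse to run model-generated SQL that contains
# obviously destructive statements. This only illustrates the risk surface;
# a keyword blocklist is NOT an adequate defense.
DESTRUCTIVE_KEYWORDS = {"drop", "delete", "update", "insert", "alter", "truncate", "pragma"}

def is_read_only(sql: str) -> bool:
    """Very rough check that the statement only reads data."""
    tokens = {tok.strip("();,").lower() for tok in sql.split()}
    return tokens.isdisjoint(DESTRUCTIVE_KEYWORDS)

def safe_execute(conn: sqlite3.Connection, generated_sql: str):
    """Execute generated SQL only if it passes the read-only check."""
    if not is_read_only(generated_sql):
        raise ValueError(f"refusing potentially destructive SQL: {generated_sql!r}")
    return conn.execute(generated_sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Edith', 47)")

print(safe_execute(conn, "SELECT AVG(age) FROM singer"))  # [(47.0,)]

try:
    safe_execute(conn, "DROP TABLE singer")  # blocked by the guard
except ValueError as err:
    print(err)
```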