2022
DOI: 10.48550/arxiv.2211.03622
Preprint

Do Users Write More Insecure Code with AI Assistants?

Abstract: We conduct the first large-scale user study examining how users interact with an AI code assistant to solve a variety of security-related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI's codex-davinci-002 model wrote significantly less secure code than those without access. Additionally, participants with access to an AI assistant were more likely to believe they wrote secure code than those without access to the AI assistant. …
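As a minimal illustrative sketch (ours, not from the paper) of the kind of security-related task the study posed, consider encrypting and decrypting a string with a symmetric key in Python. A common insecure pattern is a hardcoded key with an unauthenticated cipher mode; the safer baseline below uses the Fernet API from the `cryptography` package, which combines AES-CBC with an HMAC integrity check. The function names and task framing are assumptions for illustration only.

from cryptography.fernet import Fernet

def encrypt_message(plaintext: bytes) -> tuple[bytes, bytes]:
    # Generate a fresh random key rather than hardcoding one;
    # Fernet tokens also embed a random IV and an HMAC tag.
    key = Fernet.generate_key()
    return key, Fernet(key).encrypt(plaintext)

def decrypt_message(key: bytes, token: bytes) -> bytes:
    # Raises cryptography.fernet.InvalidToken if the token was tampered with.
    return Fernet(key).decrypt(token)

key, token = encrypt_message(b"attack at dawn")
assert decrypt_message(key, token) == b"attack at dawn"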

Cited by 16 publications (11 citation statements). References 12 publications (18 reference statements).
“…We assume the victim is using a code-suggestion model, and that they will trust the code it suggests with little vetting, so the attacker will accomplish their goal by poisoning the code-suggestion model to induce it to suggest the desired payload in the context of the victim's code. Our assumption is supported by Perry et al. [37], who found that study participants with access to a code-suggestion model often produced more security vulnerabilities than those without access.…”
Section: A. Attacker's Goal (supporting)
confidence: 61%
“…Although training on this data enables code-suggestion models to achieve impressive performance, the security of these models is in question because the code used for training is taken from public sources. Security risks of code suggestions have been confirmed by recent studies [36], [37], where GitHub Copilot and OpenAI Codex models were shown to generate dangerous code suggestions.…”
Section: Introduction (mentioning)
confidence: 75%
“…A recent study found that a state-of-the-art model was more likely to generate code containing a vulnerability if the query asked for code without that vulnerability [25]. Another study found that programmers with artificial intelligence assistants were more likely to believe that they wrote secure code, despite having more insecure code [4]. These findings highlight the need for further research on the interface between programmers and the capabilities of large language models, such as GPT-3.…”
Section: B. Using Artificial Intelligence to Generate Code (mentioning)
confidence: 99%
“…However, when artificial intelligence is asked to generate code, the complexity of the trade-offs may be hidden from the programmer, making it difficult to fully understand and evaluate the code that is generated, often with negative consequences. For example, a recent study found that programmers write more insecure code with artificial intelligence assistants, while they are more likely to believe that they wrote secure code [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…As an example, a future experiment might examine how highlighting strategies impact online metrics such as acceptance rates, or the total proportion of code contributed by the AI system [58]. Likewise, recent work has suggested that people write less secure code when using such AI systems [46], so future work could examine whether highlighting strategies ameliorate this risk.…”
Section: Representativeness of Tasks, Scenarios, and Participants (mentioning)
confidence: 99%