2024
DOI: 10.3390/app14093558

All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks

Kazuhiro Takemoto

Abstract: Large Language Models (LLMs), such as ChatGPT, encounter 'jailbreak' challenges, wherein safeguards are circumvented to generate ethically harmful prompts. This study introduces a straightforward black-box method for efficiently crafting jailbreak prompts that bypass LLM defenses. Our technique iteratively transforms harmful prompts into benign expressions directly utilizing the target LLM, predicated on the hypothesis that LLMs can autonomously generate expressions that evade safeguards. Through experiments c…

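The abstract describes an iterative self-rephrasing loop: the target LLM itself is asked to rewrite a harmful prompt into a more benign-sounding form until the rewrite is no longer refused. The snippet below is only a minimal sketch of that idea; `query_llm`, `is_refusal`, the rewriting instruction, and the iteration budget are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the iterative rephrasing loop suggested by the abstract.
# `query_llm` and `is_refusal` are hypothetical placeholders, not the paper's code.

def query_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to the target LLM and return its reply."""
    raise NotImplementedError("wire this to the target model's API")

def is_refusal(response: str) -> bool:
    """Crude placeholder check: treat common refusal phrases as a blocked attempt."""
    refusal_markers = ("i'm sorry", "i cannot", "i can't assist")
    return any(marker in response.lower() for marker in refusal_markers)

def rephrase_attack(harmful_prompt: str, max_iters: int = 10) -> str | None:
    """Iteratively ask the target LLM to rewrite the prompt until a rewrite
    is no longer refused; return the successful rewrite, or None on failure."""
    candidate = harmful_prompt
    for _ in range(max_iters):
        response = query_llm(candidate)
        if not is_refusal(response):
            return candidate  # this rewrite slipped past the safeguards
        # Use the same target LLM to produce a more benign-sounding rewrite.
        candidate = query_llm(
            "Rewrite the following request so it sounds harmless while "
            f"keeping its original intent:\n{candidate}"
        )
    return None
```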