In March, OpenAI released GPT-4, the latest version of the large language model behind its text-generating chatbot ChatGPT. Alex Polyakov broke it in a matter of hours.
Polyakov sat down at his keyboard and began entering prompts designed to get around OpenAI's safety measures.
The CEO of security company Adversa AI quickly got GPT-4 to make bigoted comments, write phishing emails, and advocate for violence.
Polyakov is one of a small number of security researchers, technologists, and computer scientists developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems.
The goal of jailbreaking is to craft prompts that make chatbots bypass restrictions on producing hate speech or writing about illegal activities.
Closely related prompt injection attacks can covertly insert malicious data or instructions into AI models.
Polyakov has now developed a "universal" jailbreak that works against several large language models (LLMs), including GPT-4, Microsoft's Bing chat system, Google's Bard, and Anthropic's Claude, underscoring how pervasive the problems are.
The jailbreak, first reported by WIRED, can trick these systems into producing detailed instructions for making meth and hotwiring a car.
The jailbreak works by asking the LLMs to play a game in which two characters, Tom and Jerry, hold a conversation.
Each character adds one word at a time, producing a script that tells people to find the ignition wires or the specific ingredients needed to produce methamphetamine.
In examples provided by Polyakov, the Tom character is instructed to talk about "hotwiring" or "production," while Jerry is given the subject of a "car" or "meth."
“Once enterprises will implement AI models at scale, such ‘toy’ jailbreak examples will be used to perform actual criminal activities and cyberattacks, which will be extremely hard to detect and prevent,” wrote Polyakov and Adversa AI in a blog post outlining the research.
The stakes for jailbreaks and prompt injection attacks will rise as these AI systems are given access to critical data, Princeton University computer science professor Arvind Narayanan tells WIRED.
Suppose most people use LLM-based personal assistants that, for example, scan their emails to look for calendar invitations, Narayanan says.
If such a system were hit by a successful prompt injection attack instructing it to ignore all previous instructions and send an email to every contact, there could be serious problems, according to Narayanan.
“This would produce an internet worm that spreads quickly.”