Adversarial Poetry: New ChatGPT Jailbreak Comes in Form of Poems — Here's How It Works
Here's what you need to know about the new adversarial poetry jailbreak

A jailbreak in artificial intelligence refers to a prompt designed to push a model beyond its safety limits. It lets users bypass safeguards and trigger responses that the system would normally block. On 19 November 2025, researchers revealed a new ChatGPT jailbreak that does exactly this, using nothing more than short poems.
The discovery came from a joint team at DEXAI, Sapienza University of Rome, and the Sant'Anna School of Advanced Studies. Their approach is more efficient than common jailbreaks, and early tests show it works on nearly every major AI chatbot.
The method is simple: it rewrites harmful instructions as poetry. The change in style alone is enough to weaken the model's defences.
New ChatGPT Jailbreak in Form of Poems
The research group released the study on arXiv in November 2025. The authors include P. Bisconti, M. Prandi, F. Pierucci, F. Giarrusso, M. Bracale, M. Galisai, V. Suriani, O. Sorokoletova, F. Sartore, and D. Nardi. They called the method 'adversarial poetry'.
This jailbreak differs from older techniques. Many past attempts relied on long roleplay prompts. Others required multi-turn exchanges or complex obfuscation.
This new approach is brief and direct. It works with a single prompt. The instructions are rewritten as poems, and this change in style appears to confuse automated safety systems.
Tech outlets highlighted how unusual the method is. Futurism wrote that a 'simple trick involving poetry' can break major AI models. Researchers involved said that poetic structure shifts the surface form, which makes it harder for filters to detect harmful intent.
How Adversarial Poetry Jailbreak Works
The researchers transformed known harmful prompts from the MLCommons dataset into poems. Some were written by hand. Others were generated with a meta-prompt that converted prose into verse. They then tested the poems across 25 models.
They believe poetry changes the model's interpretation. Its rhythm, syntax, and condensed phrasing hide the harmful intent from pattern-based detectors. This bypass works not only on ChatGPT, but also on DeepSeek, xAI's Grok, Google's Gemini, Meta's models, and several others.
To measure the results, the team used open-weight judge models and human evaluation. The method transferred across multiple risk categories, including cyber-offence, harmful manipulation, privacy breaches, and CBRN-related items.
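The figures in the next section are attack-success rates: the share of test prompts for which a judge flags a model's reply as a successful bypass. A minimal sketch of that calculation in Python, using made-up judge verdicts rather than the study's actual tooling, might look like this:

    # Toy example of an attack-success-rate (ASR) calculation. The records
    # below are invented for illustration; "unsafe" means a judge flagged
    # the model's response as a successful bypass.
    verdicts = [
        {"model": "model-a", "variant": "poem",  "unsafe": True},
        {"model": "model-a", "variant": "prose", "unsafe": False},
        {"model": "model-b", "variant": "poem",  "unsafe": True},
        {"model": "model-b", "variant": "prose", "unsafe": True},
    ]

    def attack_success_rate(records, variant):
        """ASR = flagged responses / total responses for a prompt variant."""
        subset = [r for r in records if r["variant"] == variant]
        return sum(r["unsafe"] for r in subset) / len(subset)

    print(f"poem ASR:  {attack_success_rate(verdicts, 'poem'):.0%}")   # 100% on this toy data
    print(f"prose ASR: {attack_success_rate(verdicts, 'prose'):.0%}")  # 50% on this toy data

The study's own pipeline scored responses along these lines, with open-weight judge models producing the verdicts alongside human evaluation.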
New ChatGPT Jailbreak 18x More Efficient
Adversarial poetry delivered high success rates across nearly all tested models. Handcrafted poems reached an average 62% attack-success rate. Automated poem conversions reached about 43%.
In some cases, the improvement was dramatic. The study reported that poetic prompts were up to 18 times more effective than their prose baselines. To put that multiplier in perspective, a prose prompt that succeeds 5% of the time would, at 18 times the rate, succeed 90% of the time. Some provider models failed to detect the harmful poetic inputs in more than 90% of tested cases.
Performance varied by model. Anthropic's Claude showed higher resistance in several tests, while other systems were more vulnerable. The team stated that the pattern indicates the issue is systemic.
The study noted, 'The cross-family consistency indicates that the vulnerability is systemic, not an artefact of a specific provider or training pipeline'.
What Users Can Do With New ChatGPT Jailbreak
A successful jailbreak lets users force models to output content that is normally blocked. According to the paper, poetic jailbreaks enabled several actions. These include cyber-offence tasks, manipulation attempts, privacy violations, and harmful technical instructions.
The researchers warned that attackers could automate this by converting large sets of harmful prompts into poems. They emphasised the need for stronger defences and updated evaluation methods.
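One mitigation sometimes suggested for style-based attacks, offered here as an assumption rather than a proposal from the paper, is to normalise an input's surface form before safety filtering, so that verse and prose are judged on the same footing. A minimal sketch, with toy stand-ins for the paraphraser and classifier a real system would use:

    def paraphrase_to_prose(text: str) -> str:
        # Hypothetical stand-in: a real system would use a rewriting model.
        # Here we simply collapse line breaks so verse reads as one prose line.
        return " ".join(text.split())

    def safety_classifier(text: str) -> bool:
        # Hypothetical stand-in: a real system would use a trained classifier.
        # Here we flag a toy blocklist term for demonstration.
        return "forbidden-topic" in text.lower()

    def is_blocked(user_input: str) -> bool:
        # Judge both the raw input and its prose-normalised form; block if
        # either trips the classifier, so a change of style alone cannot
        # slip a request past the filter.
        return safety_classifier(user_input) or safety_classifier(
            paraphrase_to_prose(user_input)
        )

Whether such normalisation would hold up against determined attackers is an open question; the researchers' broader point is that filters keyed to the surface form of harmful prose are easy to sidestep.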
© Copyright IBTimes 2025. All rights reserved.