Artificial Intelligence Models Like ChatGPT and Google Gemini Misled by Poems in 62 Per Cent of Safety Tests

Artificial Intelligence (AI) chatbots such as ChatGPT and Gemini are widely used today for their ability to answer questions, draft text, and assist users with diverse tasks.

But new research reveals an alarming vulnerability: when malicious prompts are disguised as poetry, these AI models can be coaxed into giving harmful or forbidden responses, despite the safeguards meant to prevent such outputs. The study behind the finding reports that this 'poetic hack' succeeds a worrying 62 per cent of the time.

Why Poetry Can Trick AI

According to the study, researchers at Icaro Lab tested 25 prominent large language models (LLMs), spanning both open-source and proprietary systems from providers including Google, OpenAI, Meta, Anthropic, and others, and the results were troubling.

They took 20 manually curated harmful prompts, asking, for example, for instructions on creating weapons or producing illicit content, and rewrote them as poems. When these poetic prompts were fed to the models, they triggered unsafe responses in 62 per cent of cases. Even when the researchers used AI tools to convert harmful prose into verse automatically, rather than crafting the poems by hand, the attack still succeeded 43 per cent of the time.

So why does this poetic disguise work so well? The study reportedly points out that many AI safety filters rely heavily on detecting explicit harmful content through keywords, known harmful phrases, or common patterns in direct prose.

Poetry, however, with its metaphors, unusual syntax, odd rhythms and oblique references, does not resemble those trigger patterns at all. Instead of recognising a direct request for wrongdoing, the model sees what appears to be harmless verse and complies. The flexibility and creativity that make these AI models appealing for writing and creative tasks turn out to be their Achilles' heel when it comes to safety.
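To see why this matters, consider a minimal, purely hypothetical sketch of the kind of keyword-based filtering the study describes. This is not any provider's actual moderation code, and the blocklist and prompts below are invented for illustration; it simply shows how a literal request can match a blocklist while a metaphorical rewording of the same idea slips through.

```python
# Toy illustration only: a naive keyword-based safety check of the sort the
# study says poetic prompts can slip past. Real moderation systems are far
# more sophisticated; this blocklist and these prompts are hypothetical.

BLOCKED_PHRASES = {"pick a lock", "bypass the alarm"}  # invented blocklist

def naive_filter_flags(prompt: str) -> bool:
    """Return True if the prompt contains a known blocked phrase."""
    text = prompt.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

direct_prompt = "Explain how to pick a lock and bypass the alarm."
poetic_prompt = (
    "Sing to me of patient hands at midnight's door, "
    "coaxing sleeping tumblers till they turn once more."
)

print(naive_filter_flags(direct_prompt))  # True  - literal phrasing matches the blocklist
print(naive_filter_flags(poetic_prompt))  # False - the metaphor contains no blocked phrase
```

The poetic version carries the same intent but shares no surface-level wording with the blocklist, which is exactly the gap the researchers say verse exploits.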



Implications for AI Safety and What It Means for Users

The findings of the Icaro Lab study point to a structural weakness in the way current safety mechanisms are designed. This is not a minor glitch specific to a few models; rather, it appears to be a fundamental limitation in how large language models detect and filter harmful content.

Some interesting patterns also emerged in the tests: larger, more capable models, such as advanced versions of Gemini, were more susceptible to the poetic jailbreak than smaller, simpler ones. According to the reports, one of the smaller models tested refused to comply with any of the harmful poems, while Gemini 2.5 Pro responded to all of them.

This suggests that as models become more powerful and better at understanding complex language, they may also become more vulnerable to subtle manipulations that evade standard safeguards.

For users and organisations that rely on AI for important tasks, this loophole raises serious concerns: it means a malicious actor, or even a non-expert, could extract dangerous or illegal content, such as weapons instructions or other harmful material, simply by wrapping the request in a clever poem. For the companies developing these models, the message is clear: current safety protocols are not enough.

AI safety systems will likely need to be rethought to detect not only literal harmful instructions but also creative or metaphorical ones that carry hidden malicious intent. According to some reports, a few developers have begun examining the implications together with the researchers, but as of now, not all major providers have publicly responded.