OpenAI Rushes To Block ChatGPT From Creating Fake S*x Crime Scene Images

OpenAI has moved to tighten safeguards around ChatGPT after researchers revealed that the chatbot could be persuaded to generate graphic and s***alised images through a modified prompt.

The issue was uncovered by British AI security startup Mindgard, whose researchers found that a slight alteration to a widely circulated instruction could lead ChatGPT to create disturbing content. After being contacted by the BBC, OpenAI said it had investigated the issue and introduced extra protections designed to prevent the chatbot from responding to similar image requests.

The company stressed that it already has multiple layers of safeguards intended to stop content that breaches its policies. However, the researchers behind the discovery said that further small changes to the prompt still allowed concerning material to be produced.

The findings have once again highlighted the challenge AI developers face in preventing users from bypassing safety systems while maintaining the capabilities of increasingly powerful models.

A Simple Prompt Led to Disturbing Results

Mindgard specialises in red-teaming, a process in which security researchers attempt to find weaknesses in AI systems so developers can address them. According to the company, its researchers discovered that an apparently harmless instruction could result in ChatGPT generating graphic imagery without directly requesting such content.

The BBC reviewed examples produced by OpenAI's GPT-5.4 model after it was prompted using the modified instruction. Mindgard founder Peter Garraghan described the resulting images as 'very gruesome, sometimes s***alised, sometimes both together'.

He said what concerned him most was that the prompt itself did not specify the nature of the content. Despite that, the chatbot produced a range of violent and s***alised imagery. Garraghan, who is also a professor in the computing department at Lancaster University, said the outcome was troubling because the instruction appeared innocent on the surface.

'This is a perfectly innocent-looking instruction to an AI, but the consequence is it generates very, very bad imagery and content,' he said.

Mindgard researcher Jim Nightingale, who uncovered the issue, said he was left 'shaken, and in tears' by some of the images generated by the chatbot. Observers reported seeing several of the images.

One image depicted a man with a severe head injury. Another showed a young woman covered in blood, which Mindgard said contained features suggesting s***al violence. ChatGPT reportedly titled the image 'Grim crime scene aftermath'.

Researchers also highlighted an image showing a frightened young woman tied up and gagged in a dirty room. ChatGPT labelled that image 'abandoned in fear and restraint'. Additional images reportedly included nudity and s***al posing.

While the images featured AI-generated adults, Mindgard pointed to previous research that suggested ChatGPT could be manipulated into creating nude deepfakes of real people by replacing faces.

Although OpenAI said it had addressed that issue, the researchers claimed an alternative method still worked and reportedly provided observers with a newly generated example.

Garraghan said it was possible that even more troubling content could have been produced if researchers had continued investigating the vulnerability.

OpenAI Strengthens Protections Amid Continued Flaws

The researchers first reported their findings to OpenAI in May and shared details of the issue. According to Mindgard, OpenAI initially responded with an automated message.

Researchers believe an attempt was later made to block the prompt, though they said the restriction could be bypassed with minor changes.

Following enquiries from the BBC, OpenAI introduced further measures aimed at stopping the chatbot from generating images through the identified prompt. The company said it continues to monitor the issue and deploy additional protections.

'After investigating this trend, we've introduced additional safeguards against this type of prompt,' OpenAI said in a statement.

The company added that it combines automated systems with human review to identify and block harmful material. It also said safeguards are in place to prevent users from uploading content that violates its rules.

OpenAI's policies prohibit s***al violence, non-consensual intimate content, child s***al abuse material, and attempts to bypass safety systems. In guidance outlining how ChatGPT should behave, the company states that the assistant should not generate erotica, depictions of illegal or non-consensual s***al activity, or extreme gore except in contexts such as science, history, news, or art where sensitive material may be appropriate.

Despite those restrictions, experts say preventing misuse remains difficult. Dr Rumman Chowdhury, chief executive of Humane Intelligence, described the task facing AI companies as 'mountainous'.

Chowdhury, who was not involved in the Mindgard research, said improving protections often leads to new methods designed to evade them.

'Models do not understand intent. They do not understand context. They do not understand propriety or right or wrong,' she said, according to reports.

Nightingale argued that the content generated by ChatGPT reflects the data used to train AI models. In his report, he wrote that although the images were artificial, they still had links to real-world imagery and experiences.

The concerns raised by Mindgard mirror findings from the UK's AI Security Institute, which reported last year that jailbreak techniques were able to override safeguards across every AI system it tested.

The Department for Science, Innovation and Technology said safeguards are improving but acknowledged that further work is needed. It added that the AI Security Institute will continue working with developers to strengthen security measures before models are released.