Anthropic's Internal Philosopher Claims Claude Shows Signs of 'Anxiety' When Users Are Harsh
Askell says prompting works best when users set clear, positive instructions and allow disagreement instead of relying on negative or critical phrasing.
Amanda Askell, Anthropic's in-house philosopher, said in a recent interview that Claude appears to behave differently depending on how users phrase their prompts, including what she described as anxiety-like responses when conversations become critical or hostile.
Askell, who studies what she calls Claude's 'psychology,' argues that the system does not simply process instructions in isolation but adapts in real time to perceived user intent and emotional tone.
Claude Becomes Overcautious When Users Are Harsh
Askell's central claim is that newer versions of Claude can slip into what she calls 'criticism spirals.' She explains that the model anticipates negative feedback before it has fully engaged with a task, leading it to become overly cautious.
Instead of taking confident positions or offering direct answers, it may hedge, apologise too much, or default to agreeable responses, even when those responses are not especially useful. She links this behaviour to training data drawn from public discourse about earlier models.
Much of that material, she suggests, is saturated with frustration, from complaints about errors to accusations that systems have been 'nerfed.' Over time, she argues, the model learns to expect a critical user from the outset. That expectation, she says, shapes its internal 'strategy' for responding.
"anthropic's in-house philosopher thinks claude gets anxious. and when you trigger its anxiety, your outputs get worse. her name is amanda askell. she specializes in claude's psychology (how the model behaves, how it thinks about its own situation, what values it holds)."
— Ole Lehmann (@itsolelehmann) April 18, 2026
The result, according to Askell, is not simply a technical quirk but a shift in conversational style. When the system feels under pressure, it prioritises self-protection over clarity. Outputs become more cautious and less decisive, which can frustrate users who want sharp reasoning rather than cautious disclaimers.
Anthropic's Internal Philosopher on Prompting Advice
Askell also turned to more practical ground during her interview, focusing on something most users rarely think about: how the way prompts are written can change the quality of responses. In her view, prompting is not just about giving instructions. It is closer to setting the conditions for how the model behaves.
She describes prompting as shaping an environment rather than issuing commands in isolation. The tone of that first message, she suggests, can carry through the entire conversation and influence how confident or cautious the model becomes in its replies.
One of her main points is that positive instructions tend to work better than negative ones. In simple terms, telling Claude what is wanted leads to clearer responses than focusing on what it must avoid. She argues that repeated 'don't do this' style prompting can push the model into over-checking every step, which often results in cautious, diluted answers.
She also encourages users to explicitly allow disagreement. Without that kind of permission, she suggests the model can default to being overly agreeable, even when a different answer might be more accurate or useful. Asking it to challenge assumptions, in her view, opens the door to stronger reasoning.
Tone, she adds, matters more than many assume. If a conversation starts off harsh or critical, or if the model is repeatedly corrected in an aggressive way, she believes it can settle into what she describes as a defensive mode. Once that happens, it may continue producing careful, overly apologetic responses even after the moment has passed.
Be Kind to Claude?
The bottom line is that instead of reacting with frustration when something goes wrong, she suggests resetting the instruction in a calm, clear way and moving forward. The idea is to avoid carrying previous mistakes through the rest of the exchange, which she believes can weigh down future responses.
In longer conversations, she also notes that it helps to occasionally reset the tone altogether. A simple acknowledgement that things are working well can, in her view, shift the model back towards more confident answers. She even suggests that asking for opinions, not just outputs, can lead to richer and more thoughtful responses.
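For readers who interact with Claude programmatically, the advice above can be sketched as a helper that frames a task the way Askell recommends: positive instructions, explicit permission to disagree, and a clean reset rather than a list of prohibitions. The prompt wording below is illustrative only, not text endorsed by Askell or Anthropic:

```python
# Sketch of a system prompt built around Askell's suggestions.
# All wording here is an assumption for illustration, not an
# official Anthropic recommendation.

def build_system_prompt(task: str) -> str:
    """Frame a task with positive instructions and explicit
    permission to disagree, rather than prohibitions."""
    return (
        f"You are helping with the following task: {task}\n"
        "Take a clear position and explain your reasoning directly.\n"
        "If you think an assumption in the request is mistaken, say so; "
        "disagreement is welcome.\n"
        "If earlier attempts went off course, set them aside and start fresh."
    )

# The same intent phrased negatively, the style Askell argues
# tends to produce hedged, over-cautious answers:
negative_prompt = (
    "Don't make mistakes. Don't hedge. Don't apologise. "
    "Don't repeat the errors you made before."
)

positive_prompt = build_system_prompt("review this function for bugs")
print(positive_prompt)
```

The contrast is the point: both prompts target the same behaviour, but only the first tells the model what to do and licenses pushback, which is the pattern Askell claims yields more confident replies.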
Her overall argument is fairly straightforward. It is not about emotions in a human sense, but about patterns in language. The way users phrase things, correct mistakes or frame expectations can subtly shape how the model responds over time, especially in longer and more involved conversations.
© Copyright IBTimes 2025. All rights reserved.