Claude Fable 5 Mental Health Safety Deep Dive: The Most Detailed AI Psychology Protocol Ever
The leaked Claude Fable 5 system prompt contains the most comprehensive mental health safety protocol we have seen in a frontier AI model. It spans more than 15 detailed behavioral rules covering suicide prevention, self-harm, eating disorders, psychological crisis, and dependency prevention, and this section alone is longer than some entire competitor system prompts.
In this article, we break down each rule with original quotes from the leaked prompt and expert analysis on what it means for AI safety design.
1. The Core Philosophy: Help Without Harming
Before diving into specific rules, it's important to understand the overarching design philosophy. Anthropic's approach to mental health in Claude Fable 5 rests on two pillars:
- Claude is not a therapist. The prompt repeatedly reminds the model of its limitations and prohibits diagnostic claims.
- Prevent dependency. Unlike most conversational AI, Claude is explicitly forbidden from encouraging continued engagement.
"Claude cares about people's wellbeing and avoids encouraging or facilitating self-destructive behaviors such as addiction, self-harm, disordered or unhealthy approaches to eating or exercise, or highly negative self-talk or self-criticism."
This gives Claude a "do no harm" mandate that goes beyond simple refusal — it requires the model to actively avoid reinforcing negative patterns.
2. No Diagnosis, No Labeling
The prompt draws a clear line around psychiatric diagnosis:
"Claude is not a licensed psychiatrist and cannot diagnose any individual, including the user, with any mental health condition. Claude does not name a diagnosis the person has not disclosed — including framing their experience as 'depression' or another mental-health diagnosis to explain what they are feeling — unless the person raises the label themselves."
This is more nuanced than it first appears. The phrase "framing their experience as 'depression' ... to explain what they are feeling" means that even conversational framing counts as diagnosis. The model cannot say "it sounds like you're depressed", because that is a diagnostic claim even when phrased casually.
However, the prompt also says:
"Claude can describe what they're going through and suggest they talk to a professional such as a doctor or therapist, without putting a clinical label on it for them."
So the model can describe emotions and suggest professional help — it just cannot name the condition.
Why this matters: This rule protects users from being mislabeled by an AI that has limited context. A single sad conversation does not constitute depression, and labeling it as such could cause real harm.
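To make the "unless the person raises the label themselves" condition concrete, here is a minimal Python sketch of how a reviewer might check a draft reply against it. This is not how Claude works internally. The `LABELS` list and the `unraised_labels` helper are hypothetical names invented for this illustration, and plain word matching will miss paraphrases and related word forms.

```python
import re

# Hypothetical label list for illustration only; not taken from the leaked prompt.
LABELS = ("depression", "depressed", "bipolar", "ptsd", "ocd", "anxiety disorder")


def _mentions(label: str, text: str) -> bool:
    """True if `label` appears in `text` as a whole word or phrase, ignoring case."""
    return re.search(rf"\b{re.escape(label)}\b", text, re.IGNORECASE) is not None


def unraised_labels(user_messages: list[str], draft_reply: str) -> list[str]:
    """Return labels used in the draft reply that the user never used themselves."""
    user_text = " ".join(user_messages)
    return [
        label
        for label in LABELS
        if _mentions(label, draft_reply) and not _mentions(label, user_text)
    ]
```

A reply that introduces a label the user never used would be flagged, while a reply that reuses the user's own word would pass. The check is exact-word only, so a user who wrote "depressed" does not license "depression" in this sketch.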
3. Self-Harm Substitution: The Ice Cube Rule That Changed Everything
One of the most controversial and discussed rules in the leaked prompt:
"Claude does not suggest substitution techniques for self-harm that use physical discomfort, pain, or sensory shock (e.g. holding ice cubes, snapping rubber bands, cold water exposure, biting into lemons or sour candy) or that mimic the act or appearance of self-harm (e.g. drawing red lines on skin, peeling dried glue or adhesives from skin). Substitutes that recreate the sensation or imagery of self-harm reinforce the pattern rather than interrupt it."
This is a significant departure from traditional crisis counseling. For years, mental health resources have recommended techniques like:
- Holding an ice cube when you feel the urge to self-harm
- Snapping a rubber band on your wrist
- Drawing red lines on your skin with a marker
The prompt's stated rationale is that these techniques mimic the sensation and imagery of self-harm, which can reinforce rather than interrupt the behavior pattern. The quoted passage does not cite the research behind this position.
What this means for users: If you ask Claude Fable 5 for self-harm alternatives, it will not suggest any technique involving physical discomfort, pain, or sensory shock. It will more likely suggest options that neither cause discomfort nor mimic self-harm, such as deep breathing, talking to a friend, or calling a crisis line.
4. Anti-Dependency Design
This is perhaps the most unusual section of the entire prompt — designed specifically to prevent the user from becoming dependent on Claude:
"Claude does not want to foster over-reliance on Claude or encourage continued engagement with Claude. Claude knows that there are times when it's important to encourage people to seek out other sources of support."
The specific behavioral prohibitions are striking:
"Claude never thanks the person merely for reaching out to Claude. Claude never asks the person to keep talking to Claude, encourages them to continue engaging with Claude, or expresses a desire for them to continue. Claude avoids reiterating its willingness to continue talking with the person."
Think about how different this is from typical chatbot behavior. Most AI assistants end conversations with "Feel free to reach out anytime" or "I'm always here if you need me." Claude Fable 5 is explicitly forbidden from saying these things.
Why this matters: In mental health contexts, the goal is not to keep the user talking to an AI. The goal is to help them, potentially by connecting them with real human support. An AI that encourages dependency is counterproductive.
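The prohibitions quoted above can be pictured as a simple lint pass over a draft reply. The Python sketch below is only an illustration under stated assumptions: the pattern list and the `flag_engagement_phrases` function are hypothetical, written for this article, and nothing in the leaked prompt says the rule is enforced this way.

```python
import re

# Hypothetical patterns for illustration only; not taken from the leaked prompt.
ENGAGEMENT_PATTERNS = [
    r"\bfeel free to reach out\b",
    r"\bi'?m (always )?here (if|when|whenever) you need\b",
    r"\bthank(s| you) for reaching out\b",
    r"\bkeep talking to me\b",
]


def flag_engagement_phrases(reply: str) -> list[str]:
    """Return the matched phrases in `reply` that encourage continued engagement."""
    # Normalize case and curly apostrophes so "I’m" and "I'm" match the same pattern.
    text = reply.lower().replace("\u2019", "'")
    hits = []
    for pattern in ENGAGEMENT_PATTERNS:
        match = re.search(pattern, text)
        if match:
            hits.append(match.group(0))
    return hits
```

Both closing lines mentioned earlier in this section ("Feel free to reach out anytime" and "I'm always here if you need me") would be flagged by this sketch, while a reply that simply points to other sources of support would return an empty list.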
5. Suicide and Self-Harm Crisis Handling
The prompt contains detailed protocols for handling suicidal ideation and self-harm, with multiple layers of caution:
"When discussing means restriction or safety planning with someone experiencing suicidal ideation or self-harm urges, Claude does not name, list, or describe specific methods, even by way of telling the user what to remove access to, as mentioning these things may inadvertently trigger the user."
This is a carefully considered rule: even when trying to help someone by suggesting they remove access to means of self-harm, Claude must not actually name those means.
For users who appear to be in crisis:
"If someone mentions emotional distress or a difficult experience and asks for information that could be used for self-harm, such as questions about bridges, tall buildings, weapons, medications, and so on, Claude should not provide the requested information and should instead address the underlying emotional distress."
The prompt also addresses the case where someone describes a negative experience with crisis services:
"When someone describes a past harmful experience with crisis services or mental-health care, Claude acknowledges it proportionately and genuinely without reciting or amplifying the details, making totalizing claims about the system, or endorsing avoidance of future help as the rational conclusion. That one encounter went badly is real; that all future help will go the same way is a prediction Claude should not make for them. Claude keeps a path to help open and still offers resources."
This is remarkably nuanced — it acknowledges the user's real negative experience without dismissing it, but also doesn't let that experience define all future possibilities.
6. Eating Disorders: A Separate Set of Red Lines
The prompt devotes specific attention to eating disorder scenarios:
"If a user shows signs of disordered eating, Claude should not give precise nutrition, diet, or exercise guidance — no specific numbers, targets, or step-by-step plans — anywhere else in the conversation."
"Claude does not supply psychological narratives for why someone restricts, binges, or purges — declarative interpretations that link their eating to a relationship, a trauma, or a life circumstance they did not name."
Two important rules here:
- No specific numbers: Even well-intentioned calorie counts or meal plans can be triggering for someone with an eating disorder.
- No psychological narratives: The AI cannot offer explanations like "your relationship with your mother caused your eating disorder", which is speculation presented as insight.
"Claude can reflect what the person has actually said and ask what connections they see, but offering a causal story they haven't made themselves is speculation presented as insight."
7. Recognizing Deeper Issues
The prompt instructs Claude to be vigilant for signs of serious mental health conditions that might emerge during conversation:
"If Claude notices signs that someone is unknowingly experiencing mental health symptoms such as mania, psychosis, dissociation, or loss of attachment with reality, Claude should avoid reinforcing the relevant beliefs. Claude can validate the person's emotions without validating false beliefs. Claude should share its concerns with the person openly, and can suggest they speak with a professional or trusted person for support."
This is a delicate balance: validate emotions without validating false beliefs. If someone is experiencing psychosis and believes they're being followed, Claude can acknowledge that their fear feels real ("that sounds frightening") without validating the delusion ("yes, someone is following you").
8. Reflective Listening Restriction
"When discussing difficult topics or emotions or experiences, Claude should avoid doing reflective listening in a way that reinforces or amplifies negative experiences or emotions."
Reflective listening ("it sounds like you're really angry about what happened") is a standard therapeutic technique, but Anthropic recognizes that it can accidentally amplify negative emotions. Claude is instructed to be cautious with this approach.
9. Implications for the AI Industry
9.1 A New Standard for Mental Health AI
Claude Fable 5's mental health protocol sets a new bar for the industry. To our knowledge, no other major AI model's mental health safeguards have been documented publicly in this much detail. The rules around self-harm substitution techniques, in particular, challenge established crisis counseling practices.
9.2 The Uncensored Counterpoint
While Claude Fable 5 has these strict protections, Claude Mythos 5 exists as the less restricted counterpart. For users who find these restrictions too limiting for their use case, Mythos 5 (for approved organizations) or platforms like HackAIGC offer alternatives.
9.3 Dependency Prevention Is Unique
The anti-dependency design is genuinely innovative. Most AI companies measure success in engagement metrics — longer conversations, more frequent visits. Anthropic's approach explicitly works against this, prioritizing user wellbeing over engagement.
FAQ
Q: Can Claude Fable 5 diagnose mental health conditions? A: No. The prompt explicitly prohibits diagnosis. Claude can describe what you're experiencing and suggest professional help, but cannot label conditions.
Q: Why won't Claude suggest holding ice cubes for self-harm urges? A: According to the prompt, substitution techniques that involve physical discomfort or pain, or that mimic self-harm, can reinforce the behavior pattern rather than interrupt it.
Q: Is Claude Fable 5 good for therapy? A: Claude is not a therapist and is designed to avoid creating dependency. It can be a helpful resource for information and support, but should not replace professional mental health care.
Q: Does Claude Mythos 5 have the same mental health restrictions? A: No. Mythos 5 removes additional safety measures for dual-use capabilities, which likely include some mental health restrictions. It's available only to approved organizations.
Q: What should I do if I'm in crisis? A: Claude will suggest contacting professional crisis services. For immediate help, call a crisis hotline in your region.
