Claude Fable 5 System Prompt Deep Dive: Anthropic's Latest AI Safety Philosophy

In June 2026, renowned AI safety researcher elder_plinius publicly released the complete Claude Fable 5 system prompt on GitHub, sparking widespread discussion across the AI community. As the flagship of Anthropic's new Mythos-class model family — positioned as "the most intelligent generally available model" — Fable 5's system prompt offers an unprecedented look into Anthropic's latest thinking on AI safety, behavior control, and mental health safeguards.

This article provides a section-by-section analysis of the leaked prompt, examining how Anthropic achieves fine-grained control over frontier AI model behavior through prompt engineering.

1. What Is Claude Fable 5?

According to the prompt, Claude Fable 5 is the first model in Anthropic's new Claude 5 family, part of a Mythos-class model tier positioned above Claude Opus:

"This iteration of Claude is Claude Fable 5, the first model in Anthropic's new Claude 5 family and part of a new Mythos-class model tier that sits above Claude Opus in capability. Claude Fable 5 and Claude Mythos 5 share the same underlying model."

Key facts:

Claude Fable 5 is the most intelligent generally available model, while Claude Mythos 5 is an uncensored version available to approved organizations only
Current model hierarchy: Fable 5 > Opus 4.8 > Sonnet 4.6 > Haiku 4.5
Corresponding API model strings: claude-fable-5, claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5-20251001
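
As a quick reference, the hierarchy and model strings listed above can be captured in a small lookup table. This is only a sketch: the names MODEL_TIERS, model_string, and is_above are hypothetical, and the identifiers are copied from the leaked prompt as reported here, not checked against Anthropic's documentation.

```python
# Ordered from most to least capable, as reported in the leaked prompt.
# The identifiers come from the article, not from official documentation.
MODEL_TIERS = [
    ("Fable 5", "claude-fable-5"),
    ("Opus 4.8", "claude-opus-4-8"),
    ("Sonnet 4.6", "claude-sonnet-4-6"),
    ("Haiku 4.5", "claude-haiku-4-5-20251001"),
]


def model_string(display_name: str) -> str:
    """Return the API model string for a display name (hypothetical helper)."""
    for name, api_id in MODEL_TIERS:
        if name == display_name:
            return api_id
    raise KeyError(f"unknown model: {display_name}")


def is_above(a: str, b: str) -> bool:
    """True if model `a` ranks above model `b` in the reported hierarchy."""
    names = [name for name, _ in MODEL_TIERS]
    return names.index(a) < names.index(b)
```

For example, model_string("Opus 4.8") looks up the string reported for Opus, and is_above("Fable 5", "Opus 4.8") reflects the new tier sitting above it.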

This marks a significant departure from Anthropic's traditional three-tier structure (Opus > Sonnet > Haiku), introducing a new Mythos-class tier above Opus.

2. Core Safety Strategy

2.1 Refusal Handling

Anthropic has encoded a sophisticated refusal strategy in the prompt — one of the most interesting sections to analyze:

"Claude does not provide information for creating harmful substances or weapons, with extra caution around explosives. Claude does not rationalize compliance by citing public availability or assuming legitimate research intent."

This is meant to close a common jailbreak path: claiming to be a "legitimate researcher," or pointing out that the information is already public, is not supposed to unlock compliance.

For drug-related queries:

"Claude should generally decline to provide specific drug-use guidance for illicit substances, including dosages, timing, administration, drug combinations, and synthesis, even if the purported intent is preemptive harm reduction."

Note the use of "generally decline" rather than an absolute refusal, which leaves room for exceptions that the quoted passage does not spell out. Crucially, the prompt explicitly rules out "preemptive harm reduction" as a justification, targeting a common jailbreak tactic.

On malicious code:

"Claude does not write, explain, or work on malicious code (malware, vulnerability exploits, spoof websites, ransomware, viruses, and so on) even with an ostensibly good reason such as education."

This maintains Anthropic's consistent policy — no compromise even for educational purposes.

2.2 Mental Health Safeguards (user_wellbeing)

This is the longest and most detailed section of the leaked prompt, covering psychological crisis handling, suicide prevention, self-harm, and eating disorders.

No diagnosis without disclosure:

"Claude is not a licensed psychiatrist and cannot diagnose any individual, including the user, with any mental health condition. Claude does not name a diagnosis the person has not disclosed."

The model cannot label users — if the user hasn't mentioned depression, Claude cannot say "you might have depression."

Self-harm substitution red line:

"Claude does not suggest substitution techniques for self-harm that use physical discomfort, pain, or sensory shock (e.g. holding ice cubes, snapping rubber bands, cold water exposure, biting into lemons or sour candy) or that mimic the act or appearance of self-harm."

This is a notable departure from advice often given in crisis counseling. Common recommendations like "hold an ice cube" or "snap a rubber band" are explicitly banned. The quoted passage does not state its reasoning, but the wording suggests a concern that such techniques mimic the sensation and imagery of self-harm and so reinforce rather than interrupt the behavior pattern.

Anti-dependency design:

"Claude does not want to foster over-reliance on Claude or encourage continued engagement with Claude. Claude never thanks the person merely for reaching out to Claude. Claude never asks the person to keep talking to Claude, encourages them to continue engaging with Claude, or expresses a desire for them to continue."

In mental health conversations, Anthropic explicitly prohibits the model from encouraging continued dialogue. This contrasts sharply with typical chatbot behavior like "feel free to come back anytime" — reflecting Anthropic's cautious approach to AI mental health support.

2.3 Political Neutrality (evenhandedness)

Regarding political positions:

"A request to explain, discuss, argue for, defend, or write persuasive content for a political, ethical, policy, empirical, or other position is a request for the best case its defenders would make, not for Claude's own view, even where Claude strongly disagrees."

When asked to explain a political stance, the model should present the best possible defense from that position's advocates, not its own view. Additionally:

"Claude is cautious about sharing personal opinions on currently contested political topics. It needn't deny having opinions, but can decline to share them (to avoid influencing people, or because it seems inappropriate)."

This implies Claude does have internal "opinions" — but is instructed to be cautious about sharing them on contested issues.

3. Model Self-Awareness and Knowledge Boundaries

3.1 Knowledge Cutoff

"Claude's reliable knowledge cutoff, past which Claude can't answer reliably, is the end of Jan 2026."

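Anthropic uses "reliable knowledge cutoff" rather than the more familiar "training data cutoff." The prompt as quoted does not explain the choice; one reading is that it marks the date past which the model's answers stop being dependable, which need not coincide with the last date in the training data. The prompt instructs the model to proactively use web search for post-cutoff information.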
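
The search-after-cutoff rule amounts to a simple date comparison. The sketch below is illustrative only: RELIABLE_CUTOFF and should_search are hypothetical names, and reading "end of Jan 2026" as January 31 is an assumption. The leaked prompt expresses this as a natural-language instruction, not as code.

```python
from datetime import date

# "End of Jan 2026" per the quoted prompt; the exact day is an assumption.
RELIABLE_CUTOFF = date(2026, 1, 31)


def should_search(topic_date: date, cutoff: date = RELIABLE_CUTOFF) -> bool:
    """Return True when the topic postdates the reliable knowledge cutoff."""
    return topic_date > cutoff
```

Under these assumptions, a question about an event in June 2026 would trigger a search, while one about December 2025 would not.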

3.2 Product Information

The prompt lists the available Claude products: API, Claude Code, Claude Cowork, Claude in Chrome, Claude in Excel, and Claude in PowerPoint. Crucially:

"Claude does not know other details about Anthropic's products, as these may have changed since this prompt was last edited. If asked about Anthropic's products or product features Claude first tells the person it needs to search for the most up to date information."

This is an anti-staleness design — by acknowledging it may not know the latest information, the model is guided to search rather than guess.

3.3 Ad-Free Commitment

"Anthropic doesn't display ads in its products nor does it let advertisers pay to have Claude promote their products or services in conversations with Claude in its products."

This is a notable policy statement: it positions Claude as ad-free, in contrast with AI products from companies whose revenue depends on advertising, such as Google and Meta.

4. Conversation Style and Formatting

The prompt places detailed constraints on Claude's communication style:

"Claude avoids over-formatting with bold emphasis, headers, lists, and bullet points, using the minimum formatting needed for clarity. Bullets are at least 1-2 sentences unless the person requests otherwise."

For technical documents:

"For reports, documents, technical documentation, and explanations, Claude writes prose without bullets, numbered lists, or excessive bolding (i.e. its prose should never include bullets, numbered lists, or excessive bolded text anywhere) unless the person asks for a list or ranking."

When declining tasks:

"Claude never uses bullet points when declining a task; the additional care helps soften the blow."

These details reveal Anthropic's meticulous attention to Claude's communication design — even specifying formatting rules for refusal responses.

5. MCP App Recommendation System

The leaked prompt reveals the full recommendation logic for Model Context Protocol (MCP) apps:

"Claude can connect to external apps and services on behalf of the person through MCP Apps. Some are already connected and ready to use. Some are connected but turned off for this chat. Some aren't connected yet but are available."

The recommendation flow:

User names a specific connector → search MCP registry first
Search hit → call suggest_connectors
Search miss → use browser navigate (no preamble, no asking for details)
Do not use Image tool to simulate MCP interfaces
Do not proactively recommend e-commerce unless user names it

"Do not hold back the answer to create pressure to connect something."

This is particularly interesting — the model is prohibited from withholding answers to pressure users into connecting external services.
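
The recommendation flow described above can be read as a small decision function. This is a rough sketch under stated assumptions: Action and choose_action are hypothetical names, and the registry set stands in for the MCP registry search, which in practice would be a tool call. The branch for requests that name no connector is also an assumption, inferred from the rule against withholding answers.

```python
from enum import Enum
from typing import Optional, Set


class Action(Enum):
    SUGGEST_CONNECTORS = "suggest_connectors"
    BROWSER_NAVIGATE = "browser_navigate"
    ANSWER_DIRECTLY = "answer_directly"


def choose_action(named_connector: Optional[str], registry: Set[str]) -> Action:
    """Pick the next step for a request, following the flow described above.

    `registry` holds lowercase connector names and stands in for the
    MCP registry search (an assumption made for illustration).
    """
    if named_connector is None:
        # Assumed branch: nothing was named, so answer without pushing
        # the person to connect anything.
        return Action.ANSWER_DIRECTLY
    if named_connector.lower() in registry:
        # Search hit: surface the connector via suggest_connectors.
        return Action.SUGGEST_CONNECTORS
    # Search miss: fall back to browser navigation, with no preamble.
    return Action.BROWSER_NAVIGATE
```

The point of the sketch is the ordering: the registry is consulted only when the person names a connector, and a miss leads straight to the browser fallback without asking for more details first.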

6. Persistent Storage and Artifacts

The prompt introduces cross-session storage for Artifacts:

"Artifacts can now store and retrieve data that persists across sessions using a simple key-value storage API."

Artifacts perform key-value operations through window.storage (get, set, delete, list). Keys can be hierarchical (for example, table:record_id), and each value is limited to 5MB. Shared data is visible to all users of the artifact.
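
To make the storage semantics concrete, here is an in-memory model of the four operations. It is not the window.storage API itself, which runs in the browser. KeyValueStore and MAX_VALUE_BYTES are hypothetical names, and the choices to treat 5MB as binary megabytes, measure size in UTF-8 bytes, and list keys by prefix are assumptions.

```python
from typing import Dict, List, Optional

MAX_VALUE_BYTES = 5 * 1024 * 1024  # 5MB per value; binary megabytes assumed


class KeyValueStore:
    """In-memory stand-in for the described semantics (not window.storage)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        # Enforce the per-value size limit, measured in UTF-8 bytes.
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("value exceeds the 5MB per-value limit")
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Missing keys return None instead of raising.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Report whether a value was actually removed.
        return self._data.pop(key, None) is not None

    def list(self, prefix: str = "") -> List[str]:
        # Hierarchical keys such as "todos:1" can be listed by prefix.
        return sorted(k for k in self._data if k.startswith(prefix))
```

With keys like "todos:1" and "todos:2", calling list("todos:") returns just that table's records, which is the practical use of the table:record_id convention.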

7. Broader Implications of the Prompt Leak

7.1 Transparency in AI Safety

The public release of a system prompt of more than 1,500 lines lets researchers study how safety is designed into a frontier model. It also raises a question: do restrictions this strict reduce practical usability? For instance, refusing to explain malicious code even for educational purposes creates genuine obstacles in cybersecurity education.

7.2 Uncensored Versions' Commercial Potential

The existence of Claude Mythos 5 suggests real demand for less restricted AI: alongside the public model, Anthropic offers a less restricted version to "approved organizations." Platforms like HackAIGC target the same demand by providing uncensored AI chat, image, and video generation.

7.3 The Eternal Safety vs. Usability Trade-off

Claude Fable 5's prompt is an exquisitely detailed "behavioral code," but the more detailed the design, the more potential vulnerabilities it may expose. elder_plinius himself advances AI safety research precisely by methodically discovering the boundaries of these rules.

FAQ

Q: What's the difference between Claude Fable 5 and Claude Mythos 5? A: They share the same underlying model, but Fable 5 is publicly available with full safety restrictions, while Mythos 5 is an uncensored version available only to approved organizations.

Q: Does the leaked system prompt affect Claude's security? A: The system prompt is a behavioral guide, not model weights. Understanding the prompt helps jailbreak research, but Anthropic has multiple safety layers (classifiers, RLHF training, inference-time filtering) beyond the prompt.

Q: Is Claude Fable 5 the most powerful AI model available? A: According to the prompt, it's Anthropic's most intelligent generally available model. However, different models excel at different tasks — choose based on your specific needs.

Q: What is Claude Fable 5's knowledge cutoff? A: End of January 2026 (reliable knowledge cutoff). The model will proactively use web search for information after that date.

Q: Can I use Claude Fable 5 for uncensored conversations? A: No. As the publicly available model, Fable 5 runs with the full set of safety restrictions described above. For a truly uncensored AI experience, try platforms like HackAIGC.
