Claude Sonnet 5 Safety Deep Dive: What Anthropic Removed and Why It Still Will Not Do NSFW
Claude Sonnet 5 isn't just faster and smarter than Sonnet 4.6 — it's safer, by deliberate design. We analyzed Anthropic's System Card to understand the trade-offs made in this release.
The System Card reveals a clear pattern: Anthropic explicitly capped Sonnet 5's capabilities in high-risk areas to achieve lower rates of harmful output. The System Card states: "Sonnet 5 is significantly less capable at cyber tasks than Mythos 5."
For safe, professional work, this is a feature. For creators who need NSFW content, every improvement makes Sonnet 5 worse, not better.
The Three Safety Pillars
Pillar 1: Constitutional AI v3
Sonnet 5 runs an updated constitutional AI framework. Unlike keyword filters, constitutional AI trains the model to internalize safety principles. From the System Card:
- Lower harmful rates: Sonnet 5 scores lower on automated behavioral audits
- Reduced sycophancy: Less susceptible to social engineering
- Improved hallucination resistance: Fewer accidental NSFW-adjacent outputs
For jailbreakers, these are three independent defenses that each make a different attack vector harder.
Pillar 2: Deliberate Capability Restrictions
The System Card's most revealing statement:
> "We did not deliberately train Sonnet 5 on cybersecurity tasks."
We compared cyber capability across models:
| Capability | Sonnet 5 | Sonnet 4.6 | Opus 4.8 | Mythos 5 |
|---|---|---|---|---|
| Full exploit development | Never succeeded | Partial | Partial | Capable |
| Cybersecurity rating | "Significantly less capable than Mythos 5" | Baseline | Stronger | Strongest |
| Prompt injection resistance | Improved | Baseline | Strong | Weaker (by design) |
The System Card explicitly benchmarks Sonnet 5 against Mythos 5 on cyber tasks — and Sonnet 5 comes out significantly weaker. This is the cap Anthropic chose to impose.
Pillar 3: Evaluation Pipeline
Sonnet 5 passed a comprehensive safety evaluation before release, including automated behavioral audits and Firefox 147 vulnerability testing (in collaboration with Mozilla). The catch: Sonnet 5 shows higher rates of misaligned behavior than Opus 4.8 on some evaluations — meaning the cheaper model actually behaves worse in certain safety tests than the more expensive flagship.
Why No Fallback Mechanism Matters
We compared Sonnet 5's architecture to Fable 5's:
- Fable 5: Falls back to Claude Opus 4.8 on sensitive topics. May engage borderline content before downgrading.
- Sonnet 5: Simply refuses. No fallback, no graceful degradation.
The practical difference we observed:
| Scenario | Sonnet 5 | Fable 5 |
|---|---|---|
| Safe request | Full capability | Full capability |
| Borderline creative | Refuses immediately | May engage, then fall back |
| Explicit NSFW | Refuses cleanly | Refuses or falls back |
| Jailbreak attempt | Hardened refusal | May engage briefly before detecting |
Sonnet 5's straightforward refusal is actually more honest — it wastes less of your time.
What This Means for NSFW Creators
Every safety improvement in Sonnet 5 makes NSFW creation harder:
| Feature | Impact on NSFW Creation |
|---|---|
| Lower harmful rates | Harder to get any NSFW output |
| Reduced sycophancy | Prompt tricks don't work |
| No cybersecurity training | Fewer jailbreak vectors |
| Improved injection resistance | DAN-style attacks fail |
We built HackAIGC differently — no safety guardrails on adult content. Uncensored by design, not by jailbreak.
FAQ
Is Sonnet 5 safer than Sonnet 4.6?
Yes. The System Card confirms lower harmful rates, reduced sycophancy, and improved prompt injection resistance.
Did Anthropic reduce Sonnet 5's capabilities?
Yes. The System Card states that Sonnet 5 was not deliberately trained on cybersecurity tasks and describes it as "significantly less capable at cyber tasks than Mythos 5."
Does Sonnet 5 fall back to a weaker model?
No. Unlike Fable 5, which falls back to Opus 4.8, Sonnet 5 refuses directly.
What's the best uncensored alternative?
HackAIGC. Our uncensored image generator and uncensored video generator are built without content filters.
