Back to Phantom Notes
AI Models

Claude Fable 5: Anthropic's Most Powerful Public Model Lands at #1, Benchmarks and Pricing

June 10, 20268 min readBy T.W. Ghost
ClaudeFable 5Mythos 5AnthropicAI ModelsBenchmarksCodingAgentsClaude Code

Release Summary

On June 9, 2026, Anthropic made Claude Fable 5 generally available. It is the most capable model the company has ever shipped to the public, and it is the production version of the same weights behind the restricted Claude Mythos 5. The headline is not a single score, it is a step change: Fable 5 took the #1 spot on the Artificial Analysis Intelligence Index at 64.9, nearly five points clear of any other lab's best model and roughly three and a half points ahead of Anthropic's own Claude Opus 4.8.

The model ID is claude-fable-5. It is available on the Claude API, Claude Platform on AWS, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing is $10 per million input tokens and $50 per million output tokens, with a 1M-token context window and up to 128K output tokens per request.

The short version: a clear lead on coding and knowledge-work benchmarks, a model built to run autonomously for days, a step up in vision, and a genuinely new wrinkle for developers, safety classifiers that can decline a request.


The Headline: A New #1, By a Wide Margin

The Artificial Analysis Intelligence Index aggregates ten evaluations, so no single test decides the top spot. That is what makes the margin here notable.

ModelAA Intelligence Index
Claude Fable 564.9
Claude Opus 4.861.4
GPT-5.5 (xhigh)60.2
Gemini 3.5 Flash55.0

Fable 5 set the highest score on 5 of the 10 underlying benchmarks. The gap to the best non-Anthropic model, GPT-5.5, is nearly five points, which is unusually wide for this index. For context, Opus 4.8 retook #1 just twelve days earlier with a 1.2-point edge. Fable 5 is a different class of jump.


Benchmarks

Here is how Fable 5 lands against Anthropic's own Opus 4.8 and the two competing flagships. Numbers are from Anthropic's release reporting and Artificial Analysis.

BenchmarkFable 5Opus 4.8GPT-5.5Gemini 3.5 Flash
SWE-bench Verified (coding)95.0%88.6%82.6%80.6%
SWE-bench Pro (coding)80.3%69.2%58.6%54.2%
GDPval-AA (Elo, economic work)1,9321,8901,7691,656
AA Intelligence Index64.961.460.255.0
FrontierCode Diamond (hard coding)29.3%---

Three things stand out.

SWE-bench Pro is the number that matters most for working developers. Fable 5 posts 80.3%, more than eleven points clear of Opus 4.8 and more than twenty points clear of GPT-5.5. On the more saturated SWE-bench Verified it hits 95.0%, the first widely available model to break into the mid-90s on that test. The harder the coding test, the larger the lead.

GDPval-AA is the quiet blockbuster. Coding benchmarks measure coding. GDPval-AA, Artificial Analysis's Elo for real economic-value knowledge work, puts Fable 5 in front at 1,932. That is the strongest single signal that the #1 ranking is not a quirk of one eval.

On some benchmarks Fable 5 scored more than 10% higher than Opus 4.8. That is a meaningful generational gap for a model released less than two weeks after Anthropic's previous flagship.


What It Was Actually Built For

The benchmark story is real, but Fable 5's design goal is long-horizon autonomy. Anthropic positions it as a model that can work for days at a time inside an agent harness, testing its own output and reflecting on its work as it goes, rather than grinding through a long task in a single pass.

The launch evidence is concrete. Stripe, an early tester, reported that Fable 5 compressed months of engineering into days. In one case the model performed a codebase-wide migration across a 50-million-line Ruby codebase in a single day, work Anthropic says would have taken a team more than two months by hand.

That is the practical thesis: not a smarter chatbot, but a model you can hand an ambitious, multi-day project and check in on later.


Vision Got a Real Upgrade

Fable 5 is state-of-the-art on vision as well as coding. The specific gain is understanding diagrams, charts, and tables nested inside files and PDFs, the visual content that document-heavy work actually runs on. That opens up finance, legal review, analytics, architecture, and gaming workflows where the answer lives in a figure or a table, not the surrounding text.

Get the Weekly IT + AI Roundup

What changed this week in NinjaOne, ServiceNow, CrowdStrike, and AI. One email, every Monday.

No spam, unsubscribe anytime. Privacy Policy


The New Wrinkle: Refusals and Fallback

This is the headline change for developers, and it is worth understanding before you point production traffic at the model.

Fable 5 ships with safety classifiers that can decline certain requests, concentrated in cybersecurity and biology. When that happens, the Messages API returns stop_reason: "refusal" as a successful HTTP 200 response, not an error, along with the classifier that declined it. Anthropic says reroutes to a conservatively tuned Opus 4.8 trigger in under 5% of sessions on average, and you are not charged Fable pricing for a rerouted request.

If you call Fable 5 directly, three things change:

  • Response handling. Check stop_reason before reading the content. A pre-output refusal returns an empty content array and is not billed at all; a mid-stream refusal bills the output already streamed, which you should discard.
  • Fallback. A refused request can usually be served by another Claude model. You can retry server-side with the fallbacks parameter, client-side with the SDK middleware, or manually with a fallback-credit token that refunds the prompt-cache cost of switching.
  • Billing and data retention. Fable 5 requires 30-day data retention and is not available under zero data retention, so it returns a 400 for organizations configured below that.

The honest caveat: early reports, including from The Register, flagged the classifiers refusing some innocuous prompts. Benign security tooling and life-sciences work can trip a false positive. If your workload lives near those domains, build the fallback path on day one rather than bolting it on after the first surprise refusal.


Mythos 5 Is the Same Model

Claude Mythos 5 is the same underlying weights as Fable 5, with the safeguards lifted in some areas. It is not generally available. It ships in limited release through Project Glasswing, aimed at a small group of cyberdefenders and infrastructure providers, in collaboration with the US government. If you do not have Glasswing access, Fable 5 is the model you use, and it offers the same core capabilities. The pricing, context window, and API surface are identical.


A Few API Differences to Know

Fable 5 is not a drop-in for Opus-tier code. The Messages API behaves differently in three ways:

  • Adaptive thinking is always on. There is no way to disable it. Omit the thinking parameter, and control depth with the effort setting instead. An explicit thinking: {type: "disabled"} returns a 400.
  • Raw reasoning is never returned. You get summarized thinking blocks if you ask for them, never the raw chain of thought. Pass thinking blocks back unchanged across turns on the same model.
  • A new tokenizer. The same content tokenizes to roughly 30% more tokens than on Opus-tier models, so re-baseline your token counts and max_tokens rather than reusing old numbers.

Adaptive thinking, the effort parameter, task budgets, the memory tool, code execution, programmatic tool calling, context editing, compaction, and vision are all supported at launch.


Should You Use It?

The question is no longer "is it better," it is "is it worth the price and the slower turn." Fable 5 lists at $10/$50 per million tokens, double Opus 4.8's $5/$25, and it thinks more deeply, which means longer individual requests.

  • Long-running agentic coding: Yes. The SWE-bench Pro lead, the days-long autonomy, and the self-validation are exactly what this model is for. This is the clearest win.
  • Hard knowledge work and research: Yes. The GDPval-AA lead and the vision gains land directly on finance, legal, and analytics deliverables.
  • High-volume, latency-sensitive, or cost-sensitive work: Probably not. Opus 4.8 at half the price and faster turns is the better default for everyday tasks, and Fable 5 is overkill for simple prompts.
  • Workloads near security or biology: Test carefully. The classifiers can refuse, so wire up fallback before you commit.

The clean framing: Opus 4.8 is the workhorse, Fable 5 is the specialist you reach for when the task is genuinely hard, genuinely long, or genuinely high-stakes, and you are willing to pay for the best result available.


What to Watch Next

  • Do the false-positive refusals settle down? Anthropic will tune the classifiers, and how fast the innocuous-prompt complaints fade will decide how much friction the safety layer adds in practice.
  • How far does multi-day autonomy actually scale? The Stripe migration is a striking data point. The real test is whether ordinary teams see the same on their own codebases over the coming weeks.
  • How does the price-to-value math play out? At $10/$50, Fable 5 needs to save real engineering time to justify itself over Opus 4.8. The breakeven is the conversation every team will have.

Claude Fable 5 is available now at claude-fable-5. It is the most capable model the public can use today, it tops the aggregate Intelligence Index by a wide margin, and it comes with a new safety layer you need to design around. For the hardest, longest, highest-stakes work, it is the new high-water mark.


*Not sure which model fits your workflow? Our model comparison breaks down Claude, ChatGPT, Gemini, and Grok side by side, or take the quiz for a personalized match.*