Claude Fable 5 is Anthropic's most capable public AI model, and will hand your conversation to a weaker model the moment it detects a biology or chemistry question — Anthropic admits the net is overly broad and plans to narrow it

On Tuesday, Anthropic released Claude Fable 5, the first publicly available model in its Mythos class — a family the company had previously declined to release at all, citing the models’ enhanced ability to identify and exploit software vulnerabilities. Fable 5 leads nearly all published benchmarks, performs at a materially higher level than Anthropic’s previous flagship Claude Opus 4.8 on coding, knowledge work, and vision tasks, and is priced at $10 per million input tokens. By Wednesday, it was generating a significant amount of noise — for reasons that had nothing to do with its capabilities.

Two distinct problems had emerged. The first: the model’s biology classifiers were routing routine questions about mitochondria, mRNA vaccines, prions, and cancer to the weaker Claude Opus 4.8. The second: a separate safeguard, disclosed only in the 319-page system card, had been silently degrading the model’s responses on frontier AI development work. The two issues differ in both structure and scope, and Anthropic has responded to them differently.

What Fable 5 actually is

Fable 5 is not a standalone model. It shares its underlying architecture with Claude Mythos 5, released simultaneously but kept restricted to vetted partners through Project Glasswing, Anthropic’s multi-company initiative for critical infrastructure protection with the US government. The two models are, according to Anthropic, the same underlying system. What distinguishes Fable 5 is a layer of safety classifiers sitting on top of it, intercepting queries across four domain categories: cybersecurity, biology, chemistry, and model distillation. When a query trips one of those classifiers, the request is handed to Claude Opus 4.8 instead. In the Claude.ai interface, users see a notification when this happens.
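The routing behaviour described above can be pictured with a minimal Python sketch. This is an illustration only, not Anthropic's implementation: the model identifiers, the `route` function, the `RoutingDecision` type, and the keyword-based toy classifiers are all hypothetical stand-ins, and real classifiers would be far more sophisticated than substring matching. Only the four domain categories and the fall-back-with-notification behaviour come from the reporting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# The four guarded domain categories named in the article.
GUARDED_DOMAINS = ("cybersecurity", "biology", "chemistry", "model_distillation")

# Hypothetical identifiers, used only for this illustration.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


@dataclass
class RoutingDecision:
    model: str                     # which model answers the request
    tripped_domain: Optional[str]  # domain whose classifier fired, if any
    notice: Optional[str]          # user-visible notification, if any


def route(query: str, classifiers: Dict[str, Callable[[str], bool]]) -> RoutingDecision:
    """Send the query to the fallback model if any domain classifier fires."""
    for domain in GUARDED_DOMAINS:
        classifier = classifiers.get(domain)
        if classifier is not None and classifier(query):
            notice = f"This request was handled by {FALLBACK_MODEL} ({domain} safeguard)."
            return RoutingDecision(FALLBACK_MODEL, domain, notice)
    return RoutingDecision(PRIMARY_MODEL, None, None)


def keyword_classifier(*keywords: str) -> Callable[[str], bool]:
    """Toy stand-in for a real classifier: fires on any keyword substring."""
    lowered = tuple(k.lower() for k in keywords)

    def classify(query: str) -> bool:
        text = query.lower()
        return any(k in text for k in lowered)

    return classify


# Deliberately over-broad toy classifiers (an assumption for illustration).
TOY_CLASSIFIERS = {
    "biology": keyword_classifier("mitochondria", "prion", "mrna"),
    "chemistry": keyword_classifier("synthesis route"),
}

if __name__ == "__main__":
    print(route("What are mitochondria?", TOY_CLASSIFIERS))
    print(route("Refactor this sorting function", TOY_CLASSIFIERS))
```

In this toy version, a benign question such as "What are mitochondria?" trips the biology classifier and is routed to the fallback model with a notice, while an unrelated coding request stays on the primary model. The sketch shows how a conservatively tuned classifier layer catches harmless queries along with risky ones.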

Anthropic disclosed all of this in the launch announcement. It also disclosed, in the same announcement, that the classifiers had been tuned conservatively: “they’ll sometimes catch harmless requests, though they trigger, on average, in more than 95% of sessions.” What the announcement did not fully anticipate was how “sometimes” would read in practice the next morning.

The biology false positive problem

Hands-on testing by The Verge and Business Insider found the biology classifier triggering on questions that carry no plausible biosecurity connection. “What are mitochondria” — that famous cellular powerhouse — was routed to Opus 4.8. So were questions about mRNA vaccines, prions, and basic cancer mechanisms. A researcher at the Institute for Disease Modeling, which sits within the Gates Foundation’s Global Health Division, reported that the classifier was firing in Claude Code on essentially the first turn of new sessions, including sessions where the only input was the word “Hello.”

Anthropic’s explanation for the broader biology restriction is documented in the launch announcement: Mythos-class models are capable enough in scientific reasoning that the company no longer believes narrowly blocking only explicitly weaponisation-adjacent queries is adequate. The concern is dual-use — the same biological knowledge useful to a legitimate researcher is, at sufficient capability levels, also useful to someone trying to design a pathogen. The company said explicitly: “To deploy Fable 5 safely, we believe it was necessary to be overly conservative with our safeguards so they block most queries tied to biology work.”

That rationale may be defensible in principle. In practice, it means a model marketed as state-of-the-art for scientific research cannot explain what a prion is without downgrading itself. Andrej Karpathy, the OpenAI co-founder who announced last month that he had joined Anthropic, acknowledged on X that the safeguards were “a little too trigger-happy for launch.” In a statement to The Register on Wednesday evening, Anthropic confirmed it is working to reduce biology false positives, and that approved biology researchers can access Claude Mythos 5, the unrestricted version, through a separate trusted access programme being rolled out alongside Glasswing.

The silent AI research restriction

The second issue drew sharper criticism, and for a different reason. Buried in the system card is a disclosure that when Fable 5 detects a user working on frontier large-language-model development — pretraining data pipelines, distributed training infrastructure, hardware kernel development for certain non-standard chips — the model does not fall back to Opus 4.8. It does not show a notification. It silently degrades its own output, using what the system card describes as “interventions to limit Claude’s effectiveness.” The card states explicitly: “Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.”

Anthropic estimated this restriction would affect roughly 0.03 per cent of traffic. The stated logic was that making the restriction visible would help adversaries identify which query framings to avoid. The practical effect was that a researcher paying Fable 5 prices could receive what appeared to be a full Fable 5 response, not know it had been degraded, and have no way of diagnosing why their results seemed off.

“To have my access to the cutting edge models for my work rug pulled in an under the table fashion is appalling,” wrote Nathan Lambert, an open-model researcher formerly at the Allen Institute for AI.

Dean Ball, a senior fellow at the Foundation for American Innovation who previously served as a senior policy adviser at the White House Office of Science and Technology Policy, called the restriction “secret sabotage” and argued it gave weight to the view that AI safety had been used to justify competitive gatekeeping. Jeremy Howard of fast.ai made a structural point: a silent restriction of this kind widens the capability gap between Anthropic and independent researchers, since Anthropic’s own teams operate without the restriction.

What Anthropic changed

In a statement to The Register on Wednesday evening, an Anthropic spokesperson acknowledged the safeguards had been set too stringently and committed to two changes. First, the frontier AI research restriction will be made visible “starting this week” — flagged requests will fall back to Opus 4.8 with a notification in the chat interface, and API requests will return an explicit reason for the refusal. Second, work is ongoing to reduce biology false positives, with the timeline linked to improved classifiers that Anthropic says will accompany upcoming model releases.

The statement also clarified the intended scope of the AI research restriction: frontier-scale LLM data pipelines and kernel development for certain non-standard chips, framed as a measure to prevent foreign adversaries from using Fable 5 to accelerate competing frontier model training. Whether that framing resolves the underlying objection is less clear. The critics’ core complaint was not that the restriction existed, but that it was undisclosed at the point of service. Making it visible addresses the transparency problem. It does not settle the broader question of whether a model provider should hold unilateral authority to silently degrade output based on its own assessment of who qualifies as a legitimate AI researcher.

What comes next

Anthropic has committed to narrowing both sets of classifiers over the coming months, with progress tied to the arrival of more capable models that can distinguish educational biology queries from genuine threat vectors more precisely. The biology restriction is the more immediate commercial problem — every day Fable 5 declines to explain cell membranes is a day that competing frontier models look more attractive to the biotech and healthcare users Anthropic is trying to reach. The AI research restriction, now that it will be visible, shifts from a transparency issue to a policy debate about what frontier model providers are entitled to restrict, and on what basis.

A further question is likely to follow Anthropic into its IPO process: whether the system card was adequate disclosure of the silent restriction, or whether users who paid for Fable 5 access without knowing responses could be degraded in this way have a reasonable grievance.

Written by

Silicon Canals Editorial Team

The Silicon Canals Editorial Team produces content across our three editorial pillars: technology and business, power and investigations, and human systems. We chronicle the systems that shape our lives, from the global infrastructure of technology to the internal infrastructure of the human mind. Articles reflect our team's collective editorial process, sourcing, drafting, fact-checking, editing, and review, rather than a single journalist's writing. Silicon Canals takes editorial responsibility for content under this byline. For more on how we work, see our editorial policy.