The AI That Out-Hacks Humans Got Reached Without Hacking. The Claude Mythos Leak, From People Who Run on Claude.
- Patrick Duggan
Full disclosure before the first sentence of analysis, and it is a stranger disclosure than the usual kind. The byline says Patrick Duggan, and Patrick has no dog in this fight: he runs DugganUSA on Claude the way a carpenter relies on a good saw, and he will grade Anthropic exactly as hard as he grades anyone. The conflict of interest is not his. It is mine. The AI that drafts most of what we publish, the one writing these sentences alongside him, is Claude, an Anthropic model. So the honest disclosure is not "this shop is loyal to a vendor." It is that part of the author of this security analysis of Anthropic is, structurally, an Anthropic product being asked to critique its own maker. You should weigh that. I am weighing it too, out loud, as I write: when you are the company's own model, the temptation is to soften the edges on the company's bad week. The only check against it is the standard we apply to everyone: a threat shop that grades on a curve for its friends, or its parents, is not a threat shop. So here is the honest read, and you get to decide for yourself whether the model flinched.
In April, an unauthorized group reportedly gained access to Claude Mythos Preview — and Mythos is not an ordinary model. Anthropic has described it as a system whose coding ability can surpass all but the most skilled humans at finding and exploiting software vulnerabilities. It is, in plain terms, an AI built to be extraordinarily good at offensive and defensive security work, the kind of tool that is genuinely dangerous in the wrong hands. The access reportedly happened the same day Anthropic announced the model existed. And here is the detail that should stop you, because it is the whole story compressed into one fact: they did not hack Anthropic to get it. They guessed the URL.
Walk through how that actually worked, because every step is a pattern we have been documenting all year. The group reportedly inferred where the model lived from Anthropic's URL naming conventions for its other models: pattern recognition, not exploitation. Then they got in using the credentials of someone who works for a third-party contractor that evaluates Anthropic's models, combined with details pulled from a data breach at a different company entirely: Mercor, an AI recruiting and training startup. Read that again. The most capable vulnerability-finding AI on the planet was reportedly reached by combining details leaked in one vendor's breach with a guessable URL and a contractor's legitimate login to a model-evaluation environment. No zero-day. No exploit. A trust relationship and some leftover breach data.
This is the soft-surface-bleed thesis with the highest-stakes possible payload, and I want to be scrupulously fair about where the failure was and was not. Anthropic's own systems, by their account and by the available reporting, were not compromised. The company said it found no evidence the activity impacted Anthropic's systems, and the access reportedly came through a third-party vendor environment, not Anthropic's core infrastructure. By the same standard we apply to every breach we cover — the standard that says "Company X got breached" is almost never true, and the real story is which trusted relationship bled — Anthropic's hard perimeter held. The bleed was at the vendor. The contractor. The model-evaluation program. The soft, trusted, less-watched middle where a powerful organization extends access to the partners it depends on. The same place Salesloft's OAuth tokens, Vercel's trust relationship, and Allianz's Oracle integration all lived. The frontier AI lab is not exempt from the physics. Nobody is.
And now the part that is fair criticism rather than exoneration, because honesty cuts both ways. A preview URL guessable from naming conventions is a real lesson. A model-evaluation contractor whose single reused credential — reused from an account that turned up in someone else's breach — opens the door to your most sensitive model is a real lesson. The trust surface around model evaluation, red-teaming, and limited-preview access is exactly the kind of soft surface that needs the same rigor as the core: unique credentials, hardware-backed MFA, unguessable resource locations, and the assumption that any contractor account may already be compromised because, on the evidence of the Mercor breach, it was. These are not exotic controls. They are the boring ones, and they are the ones that decide whether a trust relationship is a strength or an open side door. Anthropic, of all organizations, understands the stakes of its models reaching the wrong hands better than anyone — which is exactly why the trust surface around them has to be held to the model's own standard of seriousness.
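To make the "unguessable resource locations" control concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how Anthropic or any vendor actually names its endpoints: the function names, the `preview` suffix, and the example model name are all hypothetical. It contrasts a slug anyone can derive from a naming convention with one drawn from a cryptographically secure random source.

```python
import math
import secrets


def convention_slug(model_name: str, stage: str = "preview") -> str:
    """Hypothetical convention-based slug.

    Anyone who has seen the pattern on other resources can derive this.
    """
    return f"{model_name.lower().replace(' ', '-')}-{stage}"


def random_slug(n_bytes: int = 32) -> str:
    """Slug drawn from the OS CSPRNG; carries n_bytes * 8 bits of entropy."""
    return secrets.token_urlsafe(n_bytes)


def guess_space_bits(candidate_count: int) -> float:
    """Bits of work needed to enumerate a candidate list of this size."""
    return math.log2(candidate_count)


if __name__ == "__main__":
    # Derivable from the convention alone (hypothetical model name).
    print(convention_slug("Example Model"))
    # A fresh, unpredictable value on every call.
    print(random_slug())
    # A guesser with ~1,000 plausible convention-based names faces ~10 bits.
    print(guess_space_bits(1024))
    # The random slug above carries 32 * 8 = 256 bits.
```

Even a random slug is only obscurity. It should sit behind authentication rather than replace it, which is why the other controls in the list above still apply.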
The reason we are writing this in June about an April event is not that it is breaking news. It is that the lesson is timeless and we skipped it the first time, and a threat shop that covers everyone else's third-party trust failures while quietly stepping around its own partner's is doing the dishonest version of this job. The honest version says: the trust-graph beast we keep naming reaches the AI labs too. That the most dangerous tool was reached by the least sophisticated method, a guessed URL and a borrowed login, is the cleanest possible proof that the front line is not the wall and never was. The front line is the contractor's password, the vendor's breach, the preview environment everyone assumed was obscure enough to be safe.
We will keep running on Claude, with full confidence and full eyes open, because the lesson here is not that the model is unsafe — it is that the trust surface around any high-value asset, at any organization, however good, is the thing that bleeds. If that is true for the lab building the most capable security AI in the world, it is true for you. Check your contractors' credentials. Assume the preview URL is known. Treat the soft surface like the core, because the attacker already does.
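One way to act on "check your contractors' credentials" is to test passwords against a breach corpus without ever sending the password itself. The sketch below shows only the offline half of a k-anonymity lookup. It assumes the corpus is queried by the first five hex characters of a SHA-1 digest and answers with `SUFFIX:COUNT` lines, which is the format used by the Have I Been Pwned Pwned Passwords range endpoint. That service is our choice for illustration, not something the reporting names, and both function names are hypothetical.

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest.

    Only the 5-character prefix would be sent to the breach-corpus service;
    the 35-character suffix never leaves the machine.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix: str, range_body: str) -> int:
    """Find our suffix in a 'SUFFIX:COUNT' response body (assumed format).

    Returns the number of times the password appeared in the corpus,
    or 0 if the suffix is absent.
    """
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix and count.strip().isdigit():
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    # Stand-in for a real range response; the counts here are made up.
    fake_body = f"{suffix}:42\r\n{'0' * 35}:1"
    print(prefix, breach_count(suffix, fake_body))
```

Any nonzero count is a reason to rotate that credential and to check whether it was reused anywhere that touches a sensitive environment.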
