We Ran the Numbers Against ThreatFox. 75% of Our Supply-Chain and Research IOCs Aren't There.

Patrick Duggan
Jun 17

We ran a cross-reference this week: we pulled ThreatFox's seven-day IOC batch and compared it against our own corpus, source by source. Not to pick a fight with ThreatFox. They are very good at what they do. The point was to find out honestly where the overlap lives and, more importantly, where it doesn't. The answer was cleaner than we expected.

ThreatFox is a community feed built around command-and-control network indicators: malicious IPs, domains, and URLs tagged to malware families. When we ran their 3,141-IOC seven-day batch against our corpus, 81 percent of it was already in our index, because we ingest ThreatFox as a source, and so does most of the community. The network-IOC layer is well covered across the ecosystem, and we are consumers of that layer alongside everyone else.

The more interesting direction is the reverse: what do we have that ThreatFox doesn't carry? We sampled 265 of our independently sourced IOCs across six categories and checked each one against ThreatFox's search API. Overall, 75 percent of the sample was absent from ThreatFox. But the per-category breakdown tells a more precise story than the headline number.
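The per-category tally can be sketched as follows. This is a minimal illustration, not the script from our repo: the function names and the sample values are hypothetical, and the ThreatFox lookup is passed in as a plain callable so the counting logic can run without network access. In a live run that callable would wrap a query to ThreatFox's search API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def uniqueness_by_category(
    sample: Iterable[Tuple[str, str]],
    in_threatfox: Callable[[str], bool],
) -> Dict[str, Tuple[int, int]]:
    """Return {category: (absent_from_threatfox, total_checked)}.

    `sample` yields (category, ioc) pairs. `in_threatfox` is a hypothetical
    lookup that returns True when the feed already carries the IOC.
    """
    counts = defaultdict(lambda: [0, 0])
    for category, ioc in sample:
        counts[category][1] += 1          # total checked in this category
        if not in_threatfox(ioc):
            counts[category][0] += 1      # absent from the feed, so unique to us
    return {cat: (absent, total) for cat, (absent, total) in counts.items()}


def overall_unique_share(result: Dict[str, Tuple[int, int]]) -> float:
    """Pooled share of sampled IOCs that were absent from the feed."""
    absent = sum(a for a, _ in result.values())
    total = sum(t for _, t in result.values())
    return absent / total if total else 0.0


# Toy data only: documentation-range IPs and placeholder domains.
stub_feed = {"198.51.100.9"}
toy_sample = [
    ("honeypot", "203.0.113.5"),
    ("honeypot", "198.51.100.9"),
    ("github", "https://github.com/example/placeholder-repo"),
]
toy_result = uniqueness_by_category(toy_sample, lambda ioc: ioc in stub_feed)
```

Keeping the lookup injectable also makes it easy to rerun the same tally against a cached copy of the feed, which avoids hammering a community API while iterating on the sample.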

Where the 100% lives

GitHub supply-chain hunt (50/50, 100% unique). Every IOC our daily GitHub hunt produces — malware-staging repositories, token-stealers, RAT dropper repos, the binding.gyp command-execution signature we documented in the Phantom Gyp write-up — is absent from ThreatFox. This is not surprising once you understand what ThreatFox indexes: network indicators associated with deployed malware families. GitHub repository URLs are a pre-deployment signal. The campaign is still staging when we catch it. ThreatFox sees the same campaign after the payload deploys and starts phoning home. We are seeing it while the actor is still pushing commits. That is not a knock on ThreatFox — it is a structural difference in what each system is designed to detect.
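To make the pre-deployment idea concrete, here is a rough sketch of one way to flag command execution in a binding.gyp file. It is not the signature from the Phantom Gyp write-up. It relies only on the fact that GYP runs the command inside `<!(...)` and `<!@(...)` expansions at configure time, and the token list is a hypothetical starting point chosen for illustration.

```python
import re
from typing import List

# GYP evaluates "<!(cmd)" and "<!@(cmd)" by running cmd at configure time.
# The capture stops at the first ")", which is enough for a coarse triage pass.
COMMAND_EXPANSION = re.compile(r"<!@?\(([^)]*)\)")

# Hypothetical tokens that rarely belong in a native-addon build step.
SUSPICIOUS_TOKENS = ("curl", "wget", "powershell", "base64", "/dev/tcp")


def suspicious_gyp_commands(gyp_text: str) -> List[str]:
    """Return command expansions that contain any suspicious token."""
    hits = []
    for match in COMMAND_EXPANSION.finditer(gyp_text):
        command = match.group(1)
        lowered = command.lower()
        if any(token in lowered for token in SUSPICIOUS_TOKENS):
            hits.append(command.strip())
    return hits
```

Plenty of legitimate addons use command expansion to locate headers, so a hit from a check like this is a reason to look at the repository, not a verdict.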

Research imports: curated adversary campaign data (40/40, 100% unique). These are the IOCs that come from reading vendor threat reports, attribution research, named campaign documentation, and our own Pattern 38-49 supply-chain tracking. Forest Blizzard infrastructure. LiteLLM C2s. Gentlemen ransomware endpoints. The Mastra easy-day-js C2 pair we indexed today. None of them were in ThreatFox at the time of the check. ThreatFox relies on community submissions tagged to known malware families; named-actor campaign infrastructure from vendor research tends not to flow through that pipeline at the same speed or with the same fidelity.

Shitlist — bulletproof ASN CIDRs (23/23, 100% unique). ThreatFox is an IP-and-domain feed, not a CIDR-based ASN-level blocking tool. Our shitlist carries the /24s and /16s of the truly bulletproof operators — BUCKLOG (AS211590), DMZHOST, STORMINDUSTRIES, 1337 Services, Church of Cyberology — derived from abuse-per-IP density analysis against RIPEstat footprint data. This is a different abstraction than a named indicator. We block the subnet because the business model of the ASN is abuse, not because a specific IP sent a specific malware family's C2 traffic. ThreatFox doesn't operate at this layer.
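The difference between matching a named indicator and blocking a subnet is easy to show in code. The sketch below uses Python's standard ipaddress module. The CIDRs are placeholders from reserved documentation and benchmarking ranges, not real shitlist entries, and the function name is hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved documentation/benchmarking space), standing in
# for the /24s and /16s a real CIDR blocklist would carry.
BLOCKED_CIDRS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.18.0.0/16"),
]


def is_blocked(ip: str, networks=BLOCKED_CIDRS) -> bool:
    """True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

With these placeholder ranges, `is_blocked("192.0.2.77")` is true even though that exact address appears on no list. The decision is made at the subnet level, which is the whole point of blocking by ASN footprint.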

Honeypot catches (22/22, 100% unique). These are IPs our Cloudflare edge honeypots caught in the act against our own infrastructure — scanning for exposed .env files, WordPress admin paths, actuator endpoints, fake MySQL dumps. Every catch is first-party: it happened to us, we recorded it, it went into the index. ThreatFox by construction indexes community-reported indicators from external-facing malware campaigns, not from individual operators' live honeypots. These will never appear there.
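As an illustration of what a first-party catch looks like, the following sketch classifies a request path against the bait categories named above. The patterns and labels are assumptions made for this example and do not describe our actual edge rules.

```python
import re
from typing import Optional

# Hypothetical bait patterns, one per probe type mentioned in the text.
BAIT_PATTERNS = {
    "env_file": re.compile(r"(^|/)\.env(\.|$)"),
    "wordpress_admin": re.compile(r"^/wp-(admin|login\.php)"),
    "actuator": re.compile(r"^/actuator(/|$)"),
    "sql_dump": re.compile(r"\.sql(\.gz)?$"),
}


def classify_probe(path: str) -> Optional[str]:
    """Return the bait label a request path matches, or None."""
    bare_path = path.split("?", 1)[0]  # ignore any query string
    for label, pattern in BAIT_PATTERNS.items():
        if pattern.search(bare_path):
            return label
    return None
```

No legitimate visitor has a reason to request these paths on a site that never served them, which is why a match can be recorded as an indicator with little further analysis.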

OSV malicious packages (structurally 100% unique). ThreatFox has no package feed. Our 225,000-plus known-malicious npm and PyPI packages from OSV, including the ones that compromised the Mastra framework today, the Red Hat Miasma packages from last week, and the Phantom Gyp wave from the week before, live entirely outside anything ThreatFox indexes. If your CI pipeline is checking an IP blocklist to protect against supply-chain attacks, it is checking the wrong thing. Package names are the indicator of compromise in these campaigns, and package names are not a network observable.
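A CI check keyed on package names rather than IPs can be as small as the sketch below. It assumes you already have a local set of known-malicious (ecosystem, name) pairs, for example exported from OSV. The function names and package names are hypothetical, and the PyPI normalization follows the PEP 503 convention of lowercasing and collapsing runs of ".", "-", and "_".

```python
import re
from typing import Iterable, List, Set, Tuple

Package = Tuple[str, str]  # (ecosystem, name)


def normalize(ecosystem: str, name: str) -> Package:
    """Canonical key for a package; PyPI names are normalized per PEP 503."""
    eco = ecosystem.lower()
    if eco == "pypi":
        return (eco, re.sub(r"[-_.]+", "-", name).lower())
    return (eco, name)


def flag_malicious(
    dependencies: Iterable[Package],
    known_malicious: Set[Package],
) -> List[Package]:
    """Return the dependencies whose normalized key is in the malicious set."""
    flagged = []
    for ecosystem, name in dependencies:
        key = normalize(ecosystem, name)
        if key in known_malicious:
            flagged.append(key)
    return flagged
```

A pipeline would fail the build when `flag_malicious` returns anything. The check runs before install, so it does not depend on the package ever making a network connection.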

Where there's meaningful overlap

SSLBL / JA3 fingerprints (10% unique, 90% in ThreatFox). The SSLBL JA3 fingerprint feed and ThreatFox share significant overlap because both draw from the same abuse.ch infrastructure. This is expected, and it tells us we're catching roughly the same malicious TLS traffic from the same sources as the broader community. When two feeds agree on an indicator, confidence goes up and uniqueness goes down.

OTX pulses (23% unique). We publish to OTX and ThreatFox pulls from it, so overlap here is partly a function of our own dissemination. The 23% that ThreatFox doesn't carry tends to be the more obscure infrastructure — smaller campaigns, early-stage staging, indicators we indexed from vendor research that didn't make it into OTX before the ThreatFox pull window.

What the honest comparison actually says

If you are running a SIEM or XDR and you want C2 network indicators matched to malware families with good tagging, ThreatFox is excellent and you should be using it. We consume it ourselves.

If you want to know which GitHub repositories are staging malware before the campaign deploys, or which npm packages OSV has flagged as malicious, or which bulletproof CIDR blocks your edge should never accept traffic from, or which IPs hit your honeypot surfaces in the last hour, ThreatFox structurally cannot give you that, and we can.

The 75% number is real. It's based on 265 sampled IOCs checked live against ThreatFox's search API. The methodology and the raw script are in our repo. We cap our certainty at 95 percent because the sample is a sample, and the numbers will shift depending on which sources happen to have recent data. But the structural gaps (supply-chain repos, package feeds, CIDR-level ASN shitlisting, first-party honeypot intelligence) are not going to close, because they represent categories ThreatFox was not designed to cover and, to its credit, does not claim to.
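One way to put a range around a sampled proportion is a Wilson score interval. The sketch below is an illustration under stated assumptions, not the calculation from our script: it reads the 95 percent figure as a 95 percent confidence level, and it uses 199 of 265 as the count behind the rounded 75 percent headline.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed count: 199 of 265 sampled IOCs absent from ThreatFox (about 75%).
low, high = wilson_interval(199, 265)
```

The interval only captures sampling noise on the pooled number. It says nothing about the mix of sources in the sample, which is the larger source of drift.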

The right frame is not "better than ThreatFox." It is "orthogonal to ThreatFox in the categories where the 2026 attack surface has moved." The supply chain is the frontier. The package is the indicator. The GitHub repo is the staging ground. The bulletproof ASN is the infrastructure. None of that shows up on a community C2 feed, by design.

The threat feed this post is built on

1.14M+ IOCs, STIX 2.1, precursor signals, supply-chain detection. Free API key in 30 seconds.

Get your free key → analytics.dugganusa.com/stix/register
