We Caught a Tencent Cloud Singapore Scraping Cluster With a Tarpit

Patrick Duggan
Apr 30

Yesterday morning we ran a self-examination week against our own platform. Ten findings. Six shipped fixes. One of them was the discovery that our public Epstein search frontend at epstein.dugganusa.com had a Meilisearch search key hardcoded in its JavaScript source, visible to anyone who pressed Cmd-U. The key was search-only, but its scope covered all forty-eight indexes: every threat-intel doc, every behavioral session log, every customer-feedback string, every AIPM audit record. We rotated the key, changed nginx to inject a server-side scoped key at deploy time from Azure Key Vault, and narrowed the new key's scope to the three indexes the public Epstein UI actually uses. By Wednesday evening the old key was dead.
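Scoping a key to specific indexes goes through Meilisearch's key endpoint, `POST /keys`, which takes `actions`, `indexes`, and `expiresAt`. What follows is a minimal sketch of that call, assuming the caller supplies the Meilisearch URL and master key. `buildScopedKeyRequest` and `createScopedKey` are hypothetical helpers, not our deploy script. Only two of the public index names (`epstein_files`, `icij_offshore`) appear in this post, so the test passes just those.

```typescript
// Minimal sketch: build a search-only Meilisearch API key scoped to named indexes.
// Assumptions: meiliUrl and masterKey come from the caller (e.g. pulled from a vault);
// helper names are hypothetical.

interface ScopedKeyRequest {
  description: string;
  actions: string[];
  indexes: string[];
  expiresAt: string | null;
}

export function buildScopedKeyRequest(indexes: string[]): ScopedKeyRequest {
  // Refuse wildcards: a "*" scope is exactly the wide key being replaced.
  if (indexes.length === 0 || indexes.some((i) => i.includes("*"))) {
    throw new Error("scoped key needs explicit, non-wildcard index names");
  }
  return {
    description: "public UI, search-only, injected server-side",
    actions: ["search"],
    indexes,
    expiresAt: null, // or an RFC 3339 timestamp to force periodic rotation
  };
}

export async function createScopedKey(
  meiliUrl: string,
  masterKey: string,
  indexes: string[],
): Promise<string> {
  const res = await fetch(`${meiliUrl}/keys`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${masterKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildScopedKeyRequest(indexes)),
  });
  if (!res.ok) throw new Error(`key creation failed: ${res.status}`);
  const body = (await res.json()) as { key: string };
  return body.key; // store server-side; never ship it in client JavaScript
}
```

Rejecting wildcard patterns up front makes it harder to reintroduce an all-indexes scope by accident.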

Today we asked Cloudflare what the previous twenty-four hours of traffic to that property looked like.

Singapore answered.

The receipt

In the twenty-four hours before this writing, epstein.dugganusa.com served one hundred and four 4xx responses on /indexes/* paths. By country, the traffic on those paths broke down as eighty-one requests from Singapore, twenty-five from the United States, six from Mexico, four from Ireland, four from Vietnam, and two from Spain. The Singapore traffic alone covered seventeen distinct /24 ranges, all in Tencent Cloud Singapore egress space (AS132203 and adjacent allocations). Each /24 made four to six requests, then handed off to a different /24 in the same /16. Every request carried a spoofed browser user agent. No honest curl. No honest python-requests. Pure egress-IP-rotated scraping infrastructure.
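Collapsing source addresses into /24 prefixes is what turns a scatter of IPs into the seventeen-ranges picture. Here is a minimal sketch, assuming log records already exported as `{ ip, country }` objects (a hypothetical shape, not Cloudflare's schema) and IPv4 only. `rotationSignature` is a hypothetical helper that picks out prefixes showing the four-to-six-request pattern.

```typescript
// Minimal sketch: count requests per IPv4 /24 for one country and flag the
// "four to six requests, then rotate" pattern. The LogRecord shape is hypothetical.

interface LogRecord {
  ip: string;
  country: string; // assumed ISO 3166-1 alpha-2, e.g. "SG"
}

// Returns "a.b.c.0/24" for a dotted-quad IPv4 address, or null otherwise (e.g. IPv6).
export function slash24(ip: string): string | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  const octets = m.slice(1).map(Number);
  if (octets.some((o) => o > 255)) return null;
  return `${octets[0]}.${octets[1]}.${octets[2]}.0/24`;
}

export function countBySlash24(records: LogRecord[], country: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of records) {
    if (r.country !== country) continue;
    const prefix = slash24(r.ip);
    if (prefix === null) continue; // skip IPv6 and malformed entries
    counts.set(prefix, (counts.get(prefix) ?? 0) + 1);
  }
  return counts;
}

// Prefixes whose request count falls in [min, max], sorted for stable output.
export function rotationSignature(counts: Map<string, number>, min = 4, max = 6): string[] {
  return Array.from(counts.entries())
    .filter(([, n]) => n >= min && n <= max)
    .map(([prefix]) => prefix)
    .sort();
}
```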

The smoking gun was a single US IPv6 source running curl 8.7.1 that sent four POST requests to /indexes/iocs/search. That endpoint was never linked from any public Epstein search UI. The only way someone codes a POST against /indexes/iocs/search with our auth pattern is by reading the leaked client HTML and noticing that the wide-scope key would let them query our IOC index directly. They probed. They got 403, because the new scoped key explicitly excludes iocs. The trap closed cleanly on them.
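That 403 is also the cheapest post-rotation check. Below is a minimal sketch of a scope probe. It assumes the key under test is the public one nginx injects, and that Meilisearch answers 403 when a key may not search an index, which matches what the curl operator saw. `verdict` and `probeScope` are hypothetical helpers.

```typescript
// Minimal sketch: confirm a public search key is refused outside its intended indexes.
// Assumptions: meiliUrl and publicKey are caller-supplied; helper names are hypothetical.

export type ScopeVerdict = "ok" | "leak" | "broken";

// Allowed indexes should answer 200; forbidden ones should answer 403.
export function verdict(status: number, shouldBeAllowed: boolean): ScopeVerdict {
  if (shouldBeAllowed) return status === 200 ? "ok" : "broken";
  if (status === 403) return "ok";
  return status === 200 ? "leak" : "broken";
}

export async function probeScope(
  meiliUrl: string,
  publicKey: string,
  allowed: string[],
  forbidden: string[],
): Promise<Map<string, ScopeVerdict>> {
  const checks = [
    ...allowed.map((index) => ({ index, allow: true })),
    ...forbidden.map((index) => ({ index, allow: false })),
  ];
  const results = new Map<string, ScopeVerdict>();
  for (const { index, allow } of checks) {
    const res = await fetch(`${meiliUrl}/indexes/${encodeURIComponent(index)}/search`, {
      method: "POST",
      headers: { Authorization: `Bearer ${publicKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ q: "", limit: 1 }), // smallest possible search
    });
    results.set(index, verdict(res.status, allow));
  }
  return results;
}
```

A "leak" verdict on `iocs` would mean the scoping did not take.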

The pattern recognition

Each leecher /24 sat in close subnet proximity to known offensive infrastructure already indexed in our IOCs. The 43.134.x cluster shared subnets with 43.134.75.173 on port 2095, flagged by SSL Blacklist as a Cobalt Strike command-and-control endpoint at ninety percent confidence. The 43.156.x cluster sat in /24s adjacent to 43.156.168.214, a malware indicator from our own OTX pulse. The 150.109.x cluster shared its /24 with 150.109.63.68, which we index as a remote-access trojan endpoint. The Mexico source, 187.221.147.121 on residential Telmex, had a /16 neighbor flagged as ValleyRAT command-and-control at ninety percent confidence. The Vietnam IP, 14.191.18.207, sat in a /16 hosting a Mozi botnet binary download URL.
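"Close subnet proximity" can be made concrete as the longest shared prefix between a leecher address and a known IOC address, checked at /24 first and then at /16. Here is a minimal sketch, assuming IOC addresses are already loaded as plain IPv4 strings. The function names are hypothetical, and this is not our production matcher.

```typescript
// Minimal sketch: find the closest known-bad IPv4 neighbor at /24, then /16.
// Assumption: iocIps is a plain list of IPv4 strings pulled from an IOC index.

function toUint32(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error(`not IPv4: ${ip}`);
  return parts.reduce((acc, p) => {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) throw new Error(`bad octet in ${ip}`);
    return acc * 256 + Number(p); // multiply instead of << to stay unsigned
  }, 0);
}

// True when a and b agree on their first `bits` bits.
export function sharesPrefix(a: string, b: string, bits: number): boolean {
  const block = 2 ** (32 - bits);
  return Math.floor(toUint32(a) / block) === Math.floor(toUint32(b) / block);
}

export interface Neighbor {
  ioc: string;
  prefixBits: 24 | 16;
}

export function nearestBadNeighbor(ip: string, iocIps: string[]): Neighbor | null {
  for (const bits of [24, 16] as const) {
    const hit = iocIps.find((ioc) => ioc !== ip && sharesPrefix(ip, ioc, bits));
    if (hit !== undefined) return { ioc: hit, prefixBits: bits };
  }
  return null;
}
```

Checking /24 before /16 means a same-/24 match always wins over a looser neighbor.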

A federated multi-search across our entire seventeen-point-nine-million document corpus for "43.134" returns thirty thousand two hundred twenty-five IOC matches. For "43.156", thirty thousand two hundred twenty-two. Our adversaries index already had profiles on Whitefly, the Chinese APT that hit Singapore Health Services in 2018, and Naikon, the long-running Chinese state-aligned operation in Southeast Asia. Forty-eight blog posts in our archive touch the intersection of Singapore, China, and threat-cloud activity, including one literally titled "The Alibaba Thread: Five Chinese APT Operations, One Cloud Provider." Forty-four OTX pulses we authored mention 43.134 in the context of Cobalt Strike batches.
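Meilisearch exposes federated search through `POST /multi-search` with a `federation` object, which merges hits from several indexes into one ranked list. Here is a minimal sketch of such a request. It assumes the caller supplies the index names: `iocs` and `adversaries` are named in this post, and any others are up to the caller. It reads the count from `estimatedTotalHits`, and it is not necessarily the exact query we ran.

```typescript
// Minimal sketch: one federated query across several Meilisearch indexes.
// Assumptions: meiliUrl, apiKey and indexUids are caller-supplied; helper names are hypothetical.

interface FederatedRequest {
  federation: { limit: number; offset: number };
  queries: { indexUid: string; q: string }[];
}

export function buildFederatedQuery(q: string, indexUids: string[], limit = 20): FederatedRequest {
  return {
    federation: { limit, offset: 0 },
    queries: indexUids.map((indexUid) => ({ indexUid, q })), // same query string per index
  };
}

export async function federatedCount(
  meiliUrl: string,
  apiKey: string,
  q: string,
  indexUids: string[],
): Promise<number> {
  const res = await fetch(`${meiliUrl}/multi-search`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildFederatedQuery(q, indexUids)),
  });
  if (!res.ok) throw new Error(`multi-search failed: ${res.status}`);
  const body = (await res.json()) as { estimatedTotalHits?: number };
  return body.estimatedTotalHits ?? 0;
}
```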

We did not need to attribute the leecher cluster from scratch. The corpus had the lock pattern; today's incident was the key turning.

The defense

We built three layers in two hours.

First, a Cloudflare IP list named tencent_sg_leechers containing the seventeen /24s, with comments tagging each one to its nearest known-bad neighbor. The list lives at the account level, available to any Cloudflare WAF rule on any zone we own.
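The list step maps onto Cloudflare's account-level Lists API. Here is a minimal sketch, assuming the list already exists and the caller supplies the account ID, list ID, and an API token. `buildListItems` and its comment format are hypothetical, chosen to mirror the nearest-neighbor tagging described above.

```typescript
// Minimal sketch: append /24 prefixes, each tagged with its nearest known-bad neighbor,
// to an account-level Cloudflare IP list. IDs, token and comment format are assumptions.

interface ListItem {
  ip: string;
  comment: string;
}

export function buildListItems(
  prefixes: { cidr: string; neighbor: string; note: string }[],
): ListItem[] {
  return prefixes.map(({ cidr, neighbor, note }) => {
    if (!/^\d{1,3}(\.\d{1,3}){3}\/24$/.test(cidr)) {
      throw new Error(`expected an IPv4 /24, got ${cidr}`);
    }
    return { ip: cidr, comment: `near ${neighbor} (${note})` };
  });
}

export async function appendToList(
  accountId: string,
  listId: string,
  apiToken: string,
  items: ListItem[],
): Promise<void> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/rules/lists/${listId}/items`;
  const res = await fetch(url, {
    method: "POST", // POST appends items; PUT would replace the whole list
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify(items),
  });
  if (!res.ok) throw new Error(`list update failed: ${res.status}`);
  // Cloudflare applies list item changes asynchronously (the response carries an operation_id).
}
```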

Second, a custom WAF rule at the dugganusa.com zone with action set to managed_challenge when the source IP matches the list. Bots fail the JavaScript challenge, retry, fail, retry. CPU burns on their side. Zero data on ours.
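A WAF custom rule references an account-level list in its expression as `$list_name`. Here is a minimal sketch of the rule object and the Rulesets API call that adds it. It assumes the caller supplies the zone ID, the ID of the zone's custom-rules ruleset, and an API token. The helper names are hypothetical.

```typescript
// Minimal sketch: a zone WAF custom rule that challenges any source in an account list.
// Assumptions: zoneId, rulesetId (the zone's custom-rules ruleset) and apiToken are placeholders.

interface CustomRule {
  action: "managed_challenge" | "block" | "log";
  expression: string;
  description: string;
  enabled: boolean;
}

export function leecherChallengeRule(listName: string, enabled = true): CustomRule {
  if (!/^[a-z0-9_]+$/.test(listName)) throw new Error(`invalid list name: ${listName}`);
  return {
    action: "managed_challenge",
    // Lists are referenced in rule expressions with a $ prefix.
    expression: `ip.src in $${listName}`,
    description: `challenge sources in ${listName}`,
    enabled,
  };
}

export async function addRule(
  zoneId: string,
  rulesetId: string,
  apiToken: string,
  rule: CustomRule,
): Promise<void> {
  const url = `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets/${rulesetId}/rules`;
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify(rule),
  });
  if (!res.ok) throw new Error(`rule create failed: ${res.status}`);
}
```

Passing `false` as the second argument yields the same rule in a disabled state, which is the shape worth keeping parked as a fast re-enable.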

Third, a Cloudflare Worker bound to the route epstein.dugganusa.com/* that intercepts requests from the leecher /24s and streams a slow-trickle, Meili-shaped JSON response back to them at one hit per second over twenty seconds. The response schema is identical to a real Meilisearch search response: a hits array, each hit with id, value, type, threat_type, malware_family, confidence, source, description, country, timestamp, references. The contents are loopback addresses, RFC 1918 private ranges, and broadcast addresses, paired with Sun Tzu quotes in the description field. Their parser does not reject the response, because the JSON is well-formed. Their index pipeline ingests it. Whatever they thought they were collecting from us is now a corpus of "All warfare is based on deception" attached to 127.0.0.1 with confidence ninety-nine.
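Here is a minimal sketch of the tarpit half. It assumes a module-syntax Worker that reads the client address from the `CF-Connecting-IP` header. The prefix check is deliberately naive (string prefixes of IPv4 /24s), the decoy field values are illustrative, and pacing uses `setTimeout` inside a `ReadableStream`. This is not our deployed Worker.

```typescript
// Minimal sketch of a slow-trickle tarpit Worker (module syntax).
// Assumptions: prefixes and decoy values are illustrative, not the production list.

const LEECHER_PREFIXES = ["43.163.88."]; // "a.b.c." covers a.b.c.0/24, IPv4 only
const DECOY_VALUES = ["127.0.0.1", "10.0.0.1", "192.168.1.1", "172.16.0.1", "255.255.255.255"];
const QUOTES = [
  "All warfare is based on deception.",
  "Hold out baits to entice the enemy. Feign disorder, and crush him.",
];

function decoyHit(i: number): Record<string, unknown> {
  return {
    id: `decoy-${i}`,
    value: DECOY_VALUES[i % DECOY_VALUES.length],
    type: "ip",
    threat_type: "c2",
    malware_family: "unknown",
    confidence: 99,
    source: "internal",
    description: QUOTES[i % QUOTES.length],
    country: "ZZ",
    timestamp: new Date(0).toISOString(), // fixed so output is deterministic
    references: [],
  };
}

// A well-formed Meilisearch-shaped body split into chunks: head, one chunk per hit, tail.
export function decoyChunks(n: number, query: string): string[] {
  const chunks = ['{"hits":['];
  for (let i = 0; i < n; i++) chunks.push((i > 0 ? "," : "") + JSON.stringify(decoyHit(i)));
  chunks.push(
    `],"query":${JSON.stringify(query)},"processingTimeMs":1,"limit":${n},"offset":0,"estimatedTotalHits":${n}}`,
  );
  return chunks;
}

export function tarpitResponse(n: number, query: string, delayMs: number): Response {
  const chunks = decoyChunks(n, query);
  const encoder = new TextEncoder();
  const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      controller.enqueue(encoder.encode(chunks[0]));
      for (let i = 1; i < chunks.length - 1; i++) {
        await sleep(delayMs); // one hit per delayMs
        controller.enqueue(encoder.encode(chunks[i]));
      }
      controller.enqueue(encoder.encode(chunks[chunks.length - 1]));
      controller.close();
    },
  });
  return new Response(stream, { headers: { "Content-Type": "application/json" } });
}

export default {
  async fetch(request: Request): Promise<Response> {
    const ip = request.headers.get("CF-Connecting-IP") ?? "";
    if (LEECHER_PREFIXES.some((p) => ip.startsWith(p))) {
      return tarpitResponse(20, "", 1000); // twenty hits, one per second
    }
    return fetch(request); // everyone else passes through to the origin
  },
};
```

Splitting the body into precomputed chunks keeps the JSON valid no matter where the pacing pauses fall. The scraper's parser only sees a slow server, never a malformed document.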

After deploying the Worker, we disabled the WAF managed_challenge rule so requests from the leecher /24s pass through to the Worker instead of stopping at the WAF. The managed_challenge rule remains in the ruleset, disabled, available as a fast re-enable if the Worker ever fails.

The five-minute counter-rotation

Within an hour of deploying the defense, we pulled Cloudflare traffic for epstein.dugganusa.com again. One Singapore IP at 43.163.88.87, a /24 outside our original seventeen, polling /indexes/epstein_files/stats and /indexes/icij_offshore/stats. Same UA pattern, same actor cluster, new subnet. The cluster had rotated egress within forty-five minutes of seeing their tooling fail.

Within five minutes of discovery, we added 43.163.88.0/24 to the Worker's prefix list, added the US IPv6 curl operator's /64 to a separate v6 prefix list, redeployed the Worker module via the Cloudflare API, and verified the route still mapped. The next time their scraper hits, the response is Sun Tzu.
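A v6 prefix list needs real prefix matching rather than string prefixes, because IPv6 addresses are often written in compressed form. Here is a minimal sketch, assuming only /24 (IPv4) and /64 (IPv6) prefixes are stored. In the test, `2001:db8:0:1::/64` comes from the documentation range and stands in for the operator's unpublished /64. The helper names are hypothetical.

```typescript
// Minimal sketch: match a client IP against IPv4 /24 and IPv6 /64 prefix lists.
// Assumption: only those two prefix lengths are stored; anything else is rejected.

// Expand an IPv6 address to eight 4-digit hex groups, or null if it does not parse.
// (IPv4-embedded forms such as ::ffff:1.2.3.4 are not handled.)
function expandV6(ip: string): string[] | null {
  const halves = ip.toLowerCase().split("::");
  if (halves.length > 2) return null;
  const head = halves[0] ? halves[0].split(":") : [];
  const tail = halves.length === 2 && halves[1] ? halves[1].split(":") : [];
  const missing = 8 - head.length - tail.length;
  if (halves.length === 2 ? missing < 1 : missing !== 0) return null;
  const groups = [...head, ...Array(missing).fill("0"), ...tail];
  if (!groups.every((g) => /^[0-9a-f]{1,4}$/.test(g))) return null;
  return groups.map((g) => g.padStart(4, "0"));
}

function slash64Key(ip: string): string | null {
  const groups = expandV6(ip);
  return groups ? groups.slice(0, 4).join(":") : null; // first 64 bits
}

function slash24Key(ip: string): string | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m || m.slice(1).some((o) => Number(o) > 255)) return null;
  return `${Number(m[1])}.${Number(m[2])}.${Number(m[3])}`; // first 24 bits
}

export function buildMatcher(cidrs: string[]): (ip: string) => boolean {
  const v4 = new Set<string>();
  const v6 = new Set<string>();
  for (const cidr of cidrs) {
    const [addr, len] = cidr.split("/");
    const key = len === "24" ? slash24Key(addr) : len === "64" ? slash64Key(addr) : null;
    if (key === null) throw new Error(`unsupported prefix: ${cidr}`);
    (len === "24" ? v4 : v6).add(key);
  }
  return (ip: string) => {
    const k4 = slash24Key(ip);
    if (k4 !== null) return v4.has(k4);
    const k6 = slash64Key(ip);
    return k6 !== null && v6.has(k6);
  };
}
```

In a Worker, `buildMatcher` would run once at module scope, and the handler would call the returned function on the `CF-Connecting-IP` value.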

The cluster's behavior, rotating within an hour of seeing its tooling fail, is itself signal. They have active monitoring of their scraper success rate, egress rotation infrastructure on standby, and sustained interest in what we serve. That is not script-kiddie behavior. That is tooling.

The lesson

A leak we discovered in our own audit, which we initially framed in our public correction post as "no abuse detected during the exposure window per logs," turned out to have been continuously scraped by a distributed cluster running on Tencent Cloud Singapore egress and subnet-neighbored with Cobalt Strike infrastructure, plus a US-IPv6 individual probing the wider scope.

The original correction's framing was wrong. The lesson, on the record:

Visible-in-source equals leaked, full stop. Rotation gates on visibility, not on evidence of abuse. Logs are forensic, not exonerating. The exposure event is the breach event. Whether a specific adversary scraped is a forensic question for after the rotation, not a debate before it. The hedge, "wait for evidence before responding," is a recognized stock failure mode in incident-response tabletops: the kind of process inertia that hands attackers long dwell times inside enterprise networks.

We are not enterprise-scale. We are one operator and one AI partner, running a platform that ships against named adversaries from Minneapolis on a forty-five-dollar monthly recurring revenue baseline. Today the platform turned a passive credential exposure into an active honeypot, identified an APT-neighborhood scraping cluster, served them a fake corpus of Sun Tzu, caught their counter-rotation, and redeployed the trap, all in an afternoon. The seventeen-plus /24s in our IP list are now sensors as much as filters. Every Cloudflare traffic check produces the next rotation event. Every rotation tells us the cluster is still active and where their next egress block lives. The tarpit is not just defense; it is a continuously firing reconnaissance asset that costs the adversary CPU and bandwidth while costing us nothing.

That is the asymmetry we want.

The Gibson

The 1995 movie Hackers ends with a line that has aged better than any of its hairstyles: "Mess with the best, die like the rest." The kids in the movie were trying to break into the Gibson. The Gibson did not lose.

Our Gibson tonight serves Sun Tzu paired with 127.0.0.1 to anyone running a leecher script against our threat-intel index from a Tencent Cloud Singapore egress IP, plus one US-IPv6 curl operator, plus whatever rotation the cluster runs next. The hits the leechers think they are collecting are a corpus of "Hold out baits to entice the enemy. Feign disorder, and crush him." paired with broadcast addresses and loopback IPs, streamed at one hit per second over twenty seconds because that is how long Cloudflare Workers can hold a request before timeout. They paid for the scrape. We pay nothing for the trap. Their tooling indexes garbage. Our tooling fires telemetry every time they retry.

Tomorrow we look at three more things. But not me: tomorrow is May Day, and the May Day Protest in Minneapolis is the proper place to be. Diaspora.

Her name was Renee Nicole Good.

His name was Alex Jeffery Pretti.