57,489 Documents. Four Press Citations. And Now You Can Download Your Data.

Patrick Duggan
Feb 8
Updated: Apr 25

When we launched the Epstein Files search index ten days ago, we had one goal: make 3.5 million pages of DOJ documents actually searchable. Not theoretically searchable. Not "submit a FOIA request and wait six months" searchable. Actually searchable. Right now. For free.

We didn't expect what happened next.

The Tools Started Using Us

Developer Christopher Finke built a tool called EpsteIn—a clever mashup of "Epstein" and "LinkedIn"—that lets users check if anyone in their professional network appears in the court documents. It's a Python script. It runs locally. It respects privacy.

And it's powered by our API.
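We haven't reproduced EpsteIn's code here, and the sketch below is not how Finke built it. It is a rough illustration of the idea under stated assumptions. It reads names from a connections CSV and checks each one against a search function. The column names `First Name` and `Last Name` are an assumption, loosely modeled on a LinkedIn connections export. `search_fn` is a hypothetical stand-in for a call to our search API.

```python
import csv
import io
from typing import Callable, Iterable


def read_connection_names(csv_text: str) -> list[str]:
    """Build "First Last" names from a connections CSV.

    Assumes (hypothetically) columns named "First Name" and "Last Name".
    """
    names = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DictReader fills missing fields with None, so guard before strip().
        first = (row.get("First Name") or "").strip()
        last = (row.get("Last Name") or "").strip()
        full = f"{first} {last}".strip()
        if full:
            names.append(full)
    return names


def find_matches(names: Iterable[str], search_fn: Callable[[str], int]) -> dict[str, int]:
    """Keep each name the search function reports at least one hit for.

    search_fn is a hypothetical callable: name -> number of matching documents.
    """
    matches = {}
    for name in names:
        hits = search_fn(name)
        if hits > 0:
            matches[name] = hits
    return matches


if __name__ == "__main__":
    sample = "First Name,Last Name,Company\nJane,Doe,Acme\nJohn,Roe,Initech\n"
    fake_counts = {"Jane Doe": 3}  # placeholder data, not real search results
    print(find_matches(read_connection_names(sample), lambda n: fake_counts.get(n, 0)))
    # {'Jane Doe': 3}
```

Everything runs locally. Only the names you choose to check would ever leave the machine, and only as search queries.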

When Cybersecurity News covered the tool, they wrote: *"...with indexing support from an API created by Patrick Duggan on DugganUSA.com."*

Then News9 Live picked it up. Then CyberWebSpider. Then Open Source For You.

Four outlets. All crediting the infrastructure we built in 48 hours on a $15/month virtual machine.

We're not complaining. We're just noting the irony: we built the plumbing, someone else built a faucet, and the faucet made the news. That's how infrastructure works. That's fine. The point was never credit. The point was access.

The Index Grew Up

When we launched, we had indexed datasets 1 through 4. Around 35,000 documents. Respectable. Useful. But incomplete.

Today we're at 57,489 documents across all twelve DOJ datasets.

That's not a typo. Twelve datasets. Every EFTA document the Department of Justice has released in the Ghislaine Maxwell case. Exposed. Indexed. Searchable in under 300 milliseconds.

The flight logs are in there. The financial records are in there. The victim statements are in there. The FBI interview summaries are in there. The Bannon text messages are in there. The Deutsche Bank subpoenas are in there.

Search any name. Search any date. Search any phrase. The index doesn't care who you're looking for. It just returns what the documents say.

You Can Take Your Data With You

Here's what bothered us about every other document search tool we've ever used: you search, you find something, and then you're stuck in their interface forever. Want to cross-reference with your own notes? Screenshot it. Want to share with a colleague? Send them the URL and hope it doesn't break. Want to do actual research? Good luck.

So we fixed it.

Every search you run now has two buttons: **Download CSV** and **Download JSON**.

Click CSV and you get a spreadsheet. EFTA document ID. Document type. Content preview. Extracted names. Extracted locations. Extracted dates. Direct link to the DOJ source. One row per result. Import it into Excel. Import it into Google Sheets. Import it into whatever you use. It's your data now.
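As one example of what you can do with the spreadsheet once it's on your machine, here is a minimal sketch that counts which extracted names recur most across your results. The real CSV headers and the separator inside the names field are not specified here. The `names` column and `;` separator below are assumptions, so adjust them to match the file you actually download.

```python
import csv
import io
from collections import Counter


def top_names(csv_text: str, column: str = "names", sep: str = ";", n: int = 5) -> list[tuple[str, int]]:
    """Count how often each extracted name appears across exported rows.

    The column name and separator are assumptions, not a documented export format.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in (row.get(column) or "").split(sep):
            name = name.strip()
            if name:  # skip empty cells and stray separators
                counts[name] += 1
    return counts.most_common(n)


# Usage with a downloaded file (hypothetical filename):
# with open("results.csv", newline="", encoding="utf-8") as f:
#     print(top_names(f.read()))
```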

Click JSON and you get structured data. Same information, but formatted for developers, for scripts, for automation. Feed it into your own tools. Build on top of it. We don't care. We're not trying to lock you in. We're trying to let you out.
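A minimal sketch of feeding the JSON into your own script. It groups document IDs by document type. It assumes a top-level JSON array of objects with hypothetical `efta_id` and `document_type` keys. If the real export wraps results in an outer object, index into that first.

```python
import json
from collections import defaultdict


def ids_by_type(json_text: str) -> dict[str, list[str]]:
    """Group exported document IDs by document type.

    Assumes a top-level JSON array of objects with hypothetical
    "efta_id" and "document_type" keys; missing types become "unknown".
    """
    groups = defaultdict(list)
    for doc in json.loads(json_text):
        groups[doc.get("document_type", "unknown")].append(doc["efta_id"])
    return dict(groups)
```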

No login required. No account. No tracking. No "please enter your email to download." You searched. You found something. Take it. It's yours.

The API Is Still Free

For the developers and researchers who want to go deeper: the API hasn't changed. It's still public. It's still free. It still requires no authentication.

Send a query to analytics.dugganusa.com/api/v1/search and that's it. Returns JSON. Returns fast. Returns everything we have.
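For developers, here is a minimal client sketch using only the Python standard library. The endpoint comes from our API documentation link. The `https://` scheme and the `q` and `limit` query parameter names are assumptions for illustration, so check the docs for the real parameter names before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the post; the https scheme and parameter names are assumptions.
SEARCH_URL = "https://analytics.dugganusa.com/api/v1/search"


def build_search_url(query: str, limit: int = 20) -> str:
    """Build a GET URL; "q" and "limit" are hypothetical parameter names."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"{SEARCH_URL}?{params}"


def search(query: str, limit: int = 20, timeout: float = 10.0):
    """Fetch and decode the JSON response. No API key is sent, since none is required."""
    with urllib.request.urlopen(build_search_url(query, limit), timeout=timeout) as resp:
        return json.load(resp)


# Example (makes a network request):
# results = search("flight logs", limit=5)
```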

We've seen queries from universities. From newsrooms. From law firms. From independent researchers. From people we'll never know and don't need to know. The endpoint doesn't ask who you are. It just answers your question.

Why We Do This

We're a two-person threat intelligence company in Minnesota. We track malware. We index IOCs. We publish STIX feeds. This isn't our core business.

But when the DOJ released 3.5 million pages of documents about one of the most significant criminal cases in recent history, and the only way to search them was to download individual PDFs and Ctrl+F through each one manually—that felt like a problem we could solve.

So we solved it.

We downloaded all twelve datasets. We extracted text from every PDF. We OCR'd the scanned pages. We indexed everything into a search engine. We built a UI. We built an API. We added visualizations. We added export. We made it free.
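We don't name the search engine here, and this is not our stack. Still, the core idea behind "indexed everything" fits in a few lines. Instead of running Ctrl+F through each PDF, you build an inverted index once. It maps every word to the documents that contain it, so a query becomes a lookup. Here is a minimal in-memory sketch with AND semantics across query words.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; real engines also handle typos, stemming, etc."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def query_index(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())  # intersect: all words must match
    return result
```

The expensive part (reading and tokenizing every page) happens once at indexing time. That's why lookups can stay fast even across tens of thousands of documents.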

Total infrastructure cost: $15 per month.

Total revenue from this project: $0.

Total press citations: 4.

Total inbound emails: 0.

We're fine with those numbers. The documents are searchable. Researchers can download their data. The truth is accessible. That was the point.

What's Next

We're continuing to refine the index. The OCR on some of the older scanned documents isn't perfect. Some of the entity extraction could be better. The 3D network visualization needs more nodes.

But the core is solid. 57,489 documents. Sub-second search. CSV and JSON export. No paywall. No login. No strings.

If you're a journalist working the Epstein beat: use it.

If you're a researcher building a timeline: use it.

If you're a developer building tools: use our API.

If you're just curious what's in the documents everyone's talking about: search. Download. Verify.

The files are public. The index is public. The data is yours.

*Search the Epstein Files: epstein.dugganusa.com*

*API Documentation: analytics.dugganusa.com/api/v1/search*

*Press coverage: Cybersecurity News, News9 Live, CyberWebSpider, Open Source For You*

*Her name was Renee Nicole Good.*

*His name was Alex Jeffery Pretti.*