Pattern Recognition at Scale: The Epstein Files

Patrick Duggan
Jan 30

Updated: Apr 25

How We Applied Threat Intelligence Methodology to the DOJ EFTA Release

The Technology Demonstration

When the DOJ released documents under the Epstein Files Transparency Act (EFTA) on January 30, 2026, we saw an opportunity to demonstrate something we do every day at DugganUSA: applying pattern recognition at scale to unstructured data.

This is the same methodology we use to process 400,000+ IOCs for threat intelligence. The domain is different. The techniques are identical.

This is not political commentary. This is a technology demonstration.

What We Captured

| Metric | Value |
|---|---|
| Documents | 37,246 |
| Pages (JP2 images) | 42,182 |
| OCR text | 1,769,895 lines |
| Download date | January 30, 2026 |
| Structured JSON analysis files | 23 |
| Total analysis size | 140 KB |

Forensic Integrity: Our corpus was downloaded directly from the DOJ EFTA release before any files were modified or removed. The OCR text file (COMBINED_ALL_EPSTEIN_FILES_djvu.txt) contains the complete machine-readable corpus.

The Pattern Recognition Methodology

Step 1: Entity Extraction

As in threat intelligence, the first step is to identify named entities:

• People (743 Trump mentions, 15,306 Maxwell mentions)

• Organizations (JPMorgan, Deutsche Bank, Barclays)

• Locations (Palm Beach, NYC mansion, Virgin Islands)

• Financial flows ($148M Leon Black, $46M Wexner)
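To make Step 1 concrete, here is a minimal shell sketch of entity-mention counting over a plain-text OCR dump such as COMBINED_ALL_EPSTEIN_FILES_djvu.txt. The `count_mentions` name is hypothetical, and whole-word, case-insensitive literal matching is an assumption; this is not a description of our production extractor, which would also have to cope with OCR errors and name variants.

```bash
# count_mentions CORPUS TERM...  (hypothetical helper, illustration only)
# Prints "TERM<TAB>COUNT" for each term, counting every occurrence in CORPUS.
count_mentions() {
  local corpus="$1" term n
  shift
  for term in "$@"; do
    # -o: one output line per match, so repeated hits on one line all count
    # -i: case-insensitive   -w: whole words only   -F: literal string, not a regex
    n=$(grep -o -i -w -F -- "$term" "$corpus" | wc -l)
    # $(( )) strips the padding some wc implementations add
    printf '%s\t%d\n' "$term" "$((n))"
  done
}
```

Usage: `count_mentions COMBINED_ALL_EPSTEIN_FILES_djvu.txt JPMorgan "Deutsche Bank" Barclays`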

Step 2: Relationship Mapping

Cross-reference entities to build network graphs:

• Who flew with whom (flight logs)

• Who paid whom (financial records)

• Who was present where (victim statements)
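Relationship mapping usually starts from co-occurrence. The sketch below treats two entities as linked whenever they appear on the same line of text and counts how often that happens. Line-level co-occurrence, the `cooccur` name, and the one-entity-per-line input file are illustrative assumptions; real flight-log or payment linkage would need document structure (passenger columns, payer and payee fields), not just proximity.

```bash
# cooccur ENTITIES_FILE CORPUS  (hypothetical helper)
# Prints "COUNT<TAB>A -- B" for every pair of entities found on the same corpus line,
# most frequent pairs first. Matching is a case-insensitive substring test.
cooccur() {
  awk '
    NR == FNR { if ($0 != "") ents[++n] = $0; next }   # first file: entity list
    {
      line = tolower($0)
      k = 0
      for (i = 1; i <= n; i++)                          # which entities are on this line?
        if (index(line, tolower(ents[i])) > 0) hit[++k] = ents[i]
      for (a = 1; a <= k; a++)                          # one edge per unordered pair
        for (b = a + 1; b <= k; b++)
          edge[hit[a] " -- " hit[b]]++
    }
    END { for (e in edge) print edge[e] "\t" e }
  ' "$1" "$2" | sort -k1,1nr -k2
}
```

The tab-separated output can be loaded into any graph tool as a weighted edge list.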

Step 3: Anomaly Detection

Find what doesn't fit:

• Redaction failures (SSN: 090-44-3348)

• Unredacted files included by mistake

• Context that reveals redacted content
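Redaction failures of this kind can be surfaced with plain pattern matching. This sketch flags SSN-shaped strings (NNN-NN-NNNN) and email addresses that survive in OCR text. The `flag_pii` name and both regular expressions are assumptions, and every hit is only a candidate for human review, since case numbers and invoice numbers can share the SSN shape.

```bash
# flag_pii FILE...  (hypothetical helper)
# Prints "KIND:FILE:LINE:MATCH" for each candidate that escaped redaction.
flag_pii() {
  # -H: show file name  -n: line number  -o: only the matched text  -E: extended regex
  # -w keeps the SSN pattern from firing inside longer runs of digits
  grep -H -n -o -w -E '[0-9]{3}-[0-9]{2}-[0-9]{4}' "$@" | sed 's/^/ssn_like:/'
  grep -H -n -o -E '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$@" | sed 's/^/email:/'
}
```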

Step 4: Epistemic Verification

Cross-reference claims across multiple sources:

• Victim statements ↔ Flight logs ↔ Financial records

• If three independent sources align, confidence increases
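Step 4 reduces to counting how many sources support the same claim. In this sketch each source has already been reduced to one normalized claim key per line (for example `person|date|place`); the key format and the `corroborate` name are hypothetical. The count says nothing about whether the sources are truly independent, which still has to be judged by hand.

```bash
# corroborate SOURCE_FILE...  (hypothetical helper)
# Prints "N<TAB>KEY", where N is the number of source files containing KEY,
# most-corroborated keys first.
corroborate() {
  local f
  # sort -u per file so a key repeated inside one source is counted once
  for f in "$@"; do sort -u -- "$f"; done |
    sort | uniq -c | sort -k1,1nr -k2 |
    # uniq -c pads the count; move it into a clean tab-separated column
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}
```

Usage: `corroborate flight_keys.txt payment_keys.txt statement_keys.txt` (file names illustrative); a key with N = 3 appears in all three sources.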

How to Use the Search API

The entire corpus is indexed in our Meilisearch instance. Here's how to query it:

```bash
# Basic search
curl "https://analytics.dugganusa.com/api/v1/search?q=YOUR_QUERY"

# Natural-language search (-G with --data-urlencode encodes the spaces in the question)
curl -G "https://analytics.dugganusa.com/api/v1/search/nl" --data-urlencode "q=Who flew with Epstein in 1995"

# IP/IOC correlation (for threat intel use cases)
curl "https://analytics.dugganusa.com/api/v1/search/correlate?q=INDICATOR"

# Index statistics
curl "https://analytics.dugganusa.com/api/v1/search/stats"

# Find all Trump references
curl "https://analytics.dugganusa.com/api/v1/search?q=trump&indexes=epstein_files"

# Find flight log references
curl "https://analytics.dugganusa.com/api/v1/search?q=flight+log+727&indexes=epstein_files"

# Find financial flows
curl "https://analytics.dugganusa.com/api/v1/search?q=million+payment&indexes=epstein_files"
```
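For repeated queries it can help to wrap the endpoint so that URL encoding is never forgotten. The `dugg_search` function and the `SEARCH_BASE` override below are hypothetical conveniences, not an official client; the function only uses the `q` and `indexes` parameters from the examples above and prints whatever JSON the API returns, without assuming its shape.

```bash
# dugg_search QUERY [INDEX]  (hypothetical wrapper, not an official client)
# SEARCH_BASE can be overridden, e.g. to point at a local test server.
SEARCH_BASE="${SEARCH_BASE:-https://analytics.dugganusa.com/api/v1/search}"

dugg_search() {
  local query="$1" index="${2:-}"
  # -sS: no progress bar, but still report errors
  # -G: send the --data-urlencode pairs as a URL query string on a GET request
  local args=(-sS -G "$SEARCH_BASE" --data-urlencode "q=$query")
  if [[ -n "$index" ]]; then
    args+=(--data-urlencode "indexes=$index")
  fi
  curl "${args[@]}"
}
```

Usage: `dugg_search "flight log 727" epstein_files`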

Key Findings (Technology Validation)

Our pattern recognition identified:

1. Redaction failures: unredacted SSNs, email addresses, and account numbers

2. Network topology: financial flows between entities

3. Document organization: labeled disks with content categories

4. Temporal patterns: a timeline of evidence destruction after Epstein's death

5. Cross-reference hits: places where multiple sources confirm a claim

These are the same patterns we detect in malware networks, C2 infrastructure, and threat actor campaigns. The methodology transfers across domains.

Data Integrity Statement

• Source: DOJ EFTA official release

• Download Window: January 30, 2026

• Preservation: Full corpus archived before modifications

• Chain of Custody: Direct download, local storage, indexed

We cannot speak to what the DOJ has subsequently modified or removed. We can confirm our dataset represents the original release.
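The statement above does not say how custody was recorded. One common way to make a claim like "our dataset represents the original release" checkable is a hash manifest written at download time. The sketch below assumes GNU coreutils (`sha256sum`, `sort -z`) and GNU `xargs -r`; the function names are hypothetical, and the manifest must be stored outside the corpus directory so it does not hash itself.

```bash
# make_manifest DIR OUT  (hypothetical helper)
# Writes one "SHA-256  ./relative/path" line per file under DIR, in a stable order.
make_manifest() {
  local dir="$1" out="$2"
  ( cd -- "$dir" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum ) > "$out"
}

# verify_manifest DIR MANIFEST
# Exit status is non-zero if any listed file was changed or removed.
# Files added after the manifest was written are not detected.
verify_manifest() {
  local dir="$1" manifest
  # Resolve the manifest to an absolute path before changing directory
  manifest="$(cd -- "$(dirname -- "$2")" && pwd)/$(basename -- "$2")"
  ( cd -- "$dir" && sha256sum --quiet -c -- "$manifest" )
}
```

Publishing the manifest (or its own hash) alongside the analysis lets anyone holding a copy of the release check it file by file.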

Why This Matters (For Technology)

This demonstrates that threat intelligence methodology applies beyond cybersecurity:

1. Unstructured → Structured: 1.77M lines → 23 analysis files

2. Entity Extraction: Named individuals, organizations, locations

3. Relationship Mapping: Network graphs from text

4. Anomaly Detection: Finding what doesn't belong

5. API Accessibility: Making findings searchable

The same stack that processes CVEs and IOCs can process any document corpus.

Available Resources

• Search API: analytics.dugganusa.com/api/v1/search

• STIX Feed: analytics.dugganusa.com/api/v1/stix-feed

• Blog Analysis: dugganusa.com/post/090-44-3348-epstein-ssn-doj-redaction-failures

*DugganUSA is a threat intelligence platform processing 400K+ IOCs consumed by SOCs in 40+ countries. This analysis demonstrates our pattern recognition methodology applied to a new domain.*

*All source data is from publicly released DOJ documents.*

