Pattern Recognition at Scale: The Epstein Files
- Patrick Duggan
- Jan 30
How We Applied Threat Intelligence Methodology to the DOJ EFTA Release
The Technology Demonstration
When the DOJ released the Epstein Files Transparency Act (EFTA) documents on January 30, 2026, we saw an opportunity to demonstrate something we do every day at DugganUSA: apply pattern recognition at scale to unstructured data.
This is the same methodology we use to process 400,000+ indicators of compromise (IOCs) for threat intelligence. The domain is different; the techniques are identical.
This is not political commentary. This is a technology demonstration.
What We Captured
| Metric | Value |
| --- | --- |
| Documents | 37,246 |
| Pages | 42,182 JP2 images |
| OCR Text | 1,769,895 lines |
| Download Date | January 30, 2026 |
| Analysis Files Created | 23 structured JSON |
| Total Analysis Size | 140 KB |
Forensic Integrity: Our corpus was downloaded directly from the DOJ EFTA release before any files were modified or removed. The OCR text file (COMBINED_ALL_EPSTEIN_FILES_djvu.txt) contains the complete machine-readable corpus.
The Pattern Recognition Methodology
Step 1: Entity Extraction
As in threat intelligence, the first step is to identify named entities:
• People (743 Trump mentions, 15,306 Maxwell mentions)
• Organizations (JPMorgan, Deutsche Bank, Barclays)
• Locations (Palm Beach, NYC mansion, Virgin Islands)
• Financial flows ($148M Leon Black, $46M Wexner)
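To make the entity-extraction step concrete, here is a minimal sketch of counting mentions against a watchlist of regular expressions. It is not our production pipeline: the `ENTITY_PATTERNS` watchlist, the `count_mentions` helper, and the sample lines are all hypothetical placeholders, and a real run would stream the OCR text file rather than an in-memory list.

```python
import re
from collections import Counter

# Hypothetical watchlist: canonical entity name -> regex for its surface forms.
# The names are placeholders, not entries from the real corpus.
ENTITY_PATTERNS = {
    "Person A": re.compile(r"\bPerson\s+A\b", re.IGNORECASE),
    "Example Bank": re.compile(r"\bExample\s+Bank\b", re.IGNORECASE),
    "Sample Island": re.compile(r"\bSample\s+Island\b", re.IGNORECASE),
}

def count_mentions(lines):
    """Count how many times each watchlist entity appears across OCR lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in ENTITY_PATTERNS.items():
            counts[name] += len(pattern.findall(line))
    return counts

# Synthetic sample lines, written only for this illustration.
sample = [
    "Person A met Person B at Sample Island.",
    "Wire from Example Bank to PERSON A.",
    "No watchlist entities here.",
]
mention_counts = count_mentions(sample)
```

Matching is case-insensitive because OCR output is inconsistent about capitalization; the word-boundary anchors keep "Person A" from matching inside longer tokens.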
Step 2: Relationship Mapping
Cross-reference entities to build network graphs:
• Who flew with whom (flight logs)
• Who paid whom (financial records)
• Who was present where (victim statements)
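One simple way to turn cross-references into a network graph is to count co-occurrences: every time two entities appear in the same record, the edge between them gains weight. The sketch below assumes each record has already been reduced to a list of entity names; `build_cooccurrence_graph` and the sample records are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(records):
    """Return {(a, b): weight}, where weight is the number of records naming both."""
    edges = defaultdict(int)
    for entities in records:
        # Deduplicate and sort so each undirected edge has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Synthetic records: each list is the set of entities named in one document.
records = [
    ["Person A", "Person B"],              # e.g. one flight-log entry
    ["Person A", "Person B", "Person C"],  # e.g. a second flight-log entry
    ["Person C", "Example Bank"],          # e.g. one financial record
]
graph = build_cooccurrence_graph(records)
```

Edge weights give a first-pass ranking of which relationships recur most often; a directed variant (payer to payee) would suit financial records better than this undirected one.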
Step 3: Anomaly Detection
Find what doesn't fit:
• Redaction failures (SSN: 090-44-3348)
• Unredacted files included by mistake
• Context that reveals redacted content
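A redaction failure of the first kind can be flagged with a plain pattern scan. The sketch below looks for strings shaped like a Social Security number; the `SSN_SHAPE` regex and `find_redaction_candidates` helper are illustrative assumptions, and the sample value is synthetic. A shape match is only a candidate for human review, not proof of a leak.

```python
import re

# Three digits, two digits, four digits, not embedded in a longer digit run.
SSN_SHAPE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")

def find_redaction_candidates(lines):
    """Yield (line_number, matched_text) for each SSN-shaped string found."""
    for lineno, line in enumerate(lines, start=1):
        for match in SSN_SHAPE.finditer(line):
            yield lineno, match.group()

# Synthetic lines written for this illustration.
sample = [
    "Subject DOB [REDACTED], SSN [REDACTED]",
    "Subject SSN: 000-00-0000",  # synthetic value, not a real SSN
    "Phone 555-0100 ext 12",
]
hits = list(find_redaction_candidates(sample))
```

The same scan-and-flag structure extends to email addresses or account numbers by swapping in a different pattern.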
Step 4: Epistemic Verification
Cross-reference claims across multiple sources:
• Victim statements ↔ Flight logs ↔ Financial records
• If three independent sources align, confidence increases
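The corroboration rule can be sketched as counting how many distinct source types support a claim. The thresholds and labels below are assumptions chosen for illustration; the only rule stated above is that agreement across three independent sources raises confidence.

```python
from collections import defaultdict

def corroboration(claims):
    """Map each claim to the set of distinct source types that support it."""
    support = defaultdict(set)
    for claim, source_type in claims:
        support[claim].add(source_type)
    return dict(support)

def confidence_label(source_types):
    """Assumed scale: 3+ independent source types is high, 2 is medium, else low."""
    n = len(source_types)
    if n >= 3:
        return "high"
    if n == 2:
        return "medium"
    return "low"

# Synthetic (claim, source type) pairs.
claims = [
    ("claim-1", "victim_statement"),
    ("claim-1", "flight_log"),
    ("claim-1", "financial_record"),
    ("claim-2", "victim_statement"),
    ("claim-2", "victim_statement"),  # repeated source type adds no independence
]
support = corroboration(claims)
labels = {claim: confidence_label(types) for claim, types in support.items()}
```

Using a set means two documents of the same type count once, so repetition within a single source type does not inflate confidence.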
How to Use the Search API
The entire corpus is indexed in our Meilisearch instance. Here's how to query it:
# Basic Search
# Natural Language Search
curl -G "https://analytics.dugganusa.com/api/v1/search/nl" --data-urlencode "q=Who flew with Epstein in 1995"
# IP/IOC Correlation (for threat intel use cases)
# Index Statistics
# Find all Trump references
# Find flight log references
# Find financial flows
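For scripted access, queries need to be URL-encoded before they are sent. The sketch below only builds the request URL and makes no network call. The `/nl` path and the `q` parameter come from the natural-language example above; that the basic search endpoint accepts the same `q` parameter is an assumption, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://analytics.dugganusa.com/api/v1/search"

def build_search_url(query, natural_language=False):
    """Return a search URL with the query string safely encoded."""
    # Assumption: the basic endpoint takes the same `q` parameter as /nl.
    path = BASE_URL + ("/nl" if natural_language else "")
    return f"{path}?{urlencode({'q': query})}"

url = build_search_url("Who flew with Epstein in 1995", natural_language=True)
```

The resulting string can be passed to any HTTP client; encoding the query first avoids the malformed-URL errors that raw spaces cause.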
Key Findings (Technology Validation)
Our pattern recognition identified:
1. Redaction Failures - Including an unredacted SSN, email addresses, and account numbers
2. Network Topology - Financial flows between entities
3. Document Organization - Labeled disks with content categories
4. Temporal Patterns - Evidence destruction timeline post-death
5. Cross-Reference Hits - Where multiple sources confirm claims
These are the same patterns we detect in malware networks, C2 infrastructure, and threat actor campaigns. The methodology transfers across domains.
Data Integrity Statement
• Source: DOJ EFTA official release
• Download Window: January 30, 2026
• Preservation: Full corpus archived before modifications
• Chain of Custody: Direct download, local storage, indexed
We cannot speak to what the DOJ has subsequently modified or removed. We can confirm our dataset represents the original release.
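One common way to back a preservation claim like this is to record a cryptographic digest of each file at download time, so that any later copy can be compared against it. The sketch below is an illustration of that general technique, not a description of our archive tooling; `sha256_of_file` is a hypothetical helper.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat even for a multi-gigabyte OCR text file. Storing the digest alongside the download timestamp gives a verifiable fingerprint of the corpus as it was captured.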
Why This Matters (For Technology)
This demonstrates that threat intelligence methodology applies beyond cybersecurity:
1. Unstructured → Structured: 1.77M lines → 23 analysis files
2. Entity Extraction: Named individuals, organizations, locations
3. Relationship Mapping: Network graphs from text
4. Anomaly Detection: Finding what doesn't belong
5. API Accessibility: Making findings searchable
The same stack that processes CVEs and IOCs can process any document corpus.
Available Resources
• Search API: analytics.dugganusa.com/api/v1/search
• STIX Feed: analytics.dugganusa.com/api/v1/stix-feed
• Blog Analysis: dugganusa.com/post/090-44-3348-epstein-ssn-doj-redaction-failures
*DugganUSA is a threat intelligence platform processing 400K+ IOCs consumed by SOCs in 40+ countries. This analysis demonstrates our pattern recognition methodology applied to a new domain.*
*All source data is from publicly released DOJ documents.*



