
Pattern Recognition at Scale: The Epstein Files

  • Writer: Patrick Duggan
  • Jan 30

How We Applied Threat Intelligence Methodology to the DOJ EFTA Release




The Technology Demonstration


When the DOJ released documents under the Epstein Files Transparency Act (EFTA) on January 30, 2026, we saw an opportunity to demonstrate something we do every day at DugganUSA: apply pattern recognition at scale to unstructured data.


This is the same methodology we use to process 400,000+ IOCs for threat intelligence. The domain is different. The techniques are identical.


This is not political commentary. This is a technology demonstration.



What We Captured


| Metric | Value |
|---|---|
| Documents | 37,246 |
| Pages | 42,182 (JP2 images) |
| OCR Text | 1,769,895 lines |
| Download Date | January 30, 2026 |
| Analysis Files Created | 23 (structured JSON) |
| Total Analysis Size | 140 KB |


Forensic Integrity: Our corpus was downloaded directly from the DOJ EFTA release before any files were modified or removed. The OCR text file (COMBINED_ALL_EPSTEIN_FILES_djvu.txt) contains the complete machine-readable corpus.



The Pattern Recognition Methodology


Step 1: Entity Extraction

As in threat intelligence, we identify named entities:

• People (743 Trump mentions, 15,306 Maxwell mentions)

• Organizations (JPMorgan, Deutsche Bank, Barclays)

• Locations (Palm Beach, NYC mansion, Virgin Islands)

• Financial flows ($148M Leon Black, $46M Wexner)
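At its simplest, Step 1 can be approximated by counting case-insensitive, whole-word matches for a watchlist of known names over the OCR text. The sketch below assumes exactly that; it is not DugganUSA's pipeline, `count_mentions` and the watchlist are hypothetical, and real named-entity recognition would also have to handle aliases, OCR misreads, and names broken across lines.

```python
import re
from collections import Counter
from typing import Iterable


def count_mentions(lines: Iterable[str], entities: list[str]) -> Counter:
    """Count case-insensitive, whole-word mentions of each entity name.

    Hypothetical helper for illustration: a fixed watchlist of names,
    not a trained named-entity recognizer.
    """
    # One precompiled pattern per entity; \b keeps "Black" from matching "Blackstone"
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in entities
    }
    # Start every entity at zero so absent names still appear in the result
    counts: Counter = Counter({name: 0 for name in entities})
    for line in lines:
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(line))
    return counts


if __name__ == "__main__":
    sample = [
        "Wire transfer via JPMorgan on 03/14.",
        "Meeting at Palm Beach; JPMORGAN statement attached.",
        "Deutsche Bank account opened.",
    ]
    print(count_mentions(sample, ["JPMorgan", "Deutsche Bank", "Palm Beach"]))
```

Because `lines` is any iterable, an open file handle can be passed directly (`with open(path) as f: count_mentions(f, names)`), so a 1.77M-line file is scanned one line at a time instead of being loaded into memory.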


Step 2: Relationship Mapping

Cross-reference entities to build network graphs:

• Who flew with whom (flight logs)

• Who paid whom (financial records)

• Who was present where (victim statements)
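A minimal sketch of Step 2, assuming each source record (for example, one flight-log entry or one payment) has already been reduced to the set of entities it names. Every pair of entities in the same record gets an edge, weighted by how many records they share. The record format and the `build_cooccurrence_graph` name are hypothetical, and co-occurrence only shows that two names appear in the same record, not what their relationship was.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable


def build_cooccurrence_graph(
    records: Iterable[set[str]],
) -> dict[str, dict[str, int]]:
    """Undirected weighted graph: graph[a][b] = number of records naming both a and b."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for entities in records:
        # sorted() gives a deterministic pair order; combinations skips self-pairs
        for a, b in combinations(sorted(entities), 2):
            graph[a][b] += 1
            graph[b][a] += 1  # store both directions so lookups work either way
    # Convert to plain dicts so missing keys raise instead of silently creating entries
    return {node: dict(edges) for node, edges in graph.items()}


if __name__ == "__main__":
    # Hypothetical records; each set is the entities named in one log entry
    flights = [
        {"Person A", "Person B", "Person C"},
        {"Person A", "Person B"},
        {"Person C", "Person D"},
    ]
    graph = build_cooccurrence_graph(flights)
    print(graph["Person A"])  # {'Person B': 2, 'Person C': 1}
```

Storing both directions doubles the memory used per edge, but it makes "who is connected to X" a single dictionary lookup.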


Step 3: Anomaly Detection

Find what doesn't fit:

• Redaction failures (SSN: 090-44-3348)

• Unredacted files included by mistake

• Context that reveals redacted content


Step 4: Epistemic Verification

Cross-reference claims across multiple sources:

• Victim statements ↔ Flight logs ↔ Financial records

• If three independent sources align, confidence increases
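The Step 4 rule, that agreement among independent sources raises confidence, can be expressed as a tally of distinct source types per claim. This is a hypothetical sketch: the source-type labels, the `corroboration_level` function, and the thresholds (one source type is low, two is medium, three or more is high) are illustrative assumptions. Counting source *types* rather than documents is what keeps ten copies of the same record from looking like ten confirmations.

```python
from collections import defaultdict
from typing import Iterable


def corroboration_level(evidence: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map each claim to a confidence label from its number of distinct source types.

    `evidence` is (claim_id, source_type) pairs; repeats of the same source
    type count once, since duplicate copies are not independent confirmation.
    """
    sources: dict[str, set[str]] = defaultdict(set)
    for claim_id, source_type in evidence:
        sources[claim_id].add(source_type)

    def label(n: int) -> str:
        # Illustrative thresholds, not a calibrated model
        if n >= 3:
            return "high"
        if n == 2:
            return "medium"
        return "low"

    return {claim: label(len(types)) for claim, types in sources.items()}


if __name__ == "__main__":
    evidence = [
        ("claim-1", "victim_statement"),
        ("claim-1", "flight_log"),
        ("claim-1", "financial_record"),
        ("claim-2", "flight_log"),
        ("claim-2", "flight_log"),  # duplicate type: still one source
    ]
    print(corroboration_level(evidence))  # {'claim-1': 'high', 'claim-2': 'low'}
```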



How to Use the Search API


The entire corpus is indexed in our Meilisearch instance. Here's how to query it:


# Basic Search


  # Natural Language Search

  curl -G "https://analytics.dugganusa.com/api/v1/search/nl" --data-urlencode "q=Who flew with Epstein in 1995"


  # IP/IOC Correlation (for threat intel use cases)


  # Index Statistics


  # Find all Trump references


  # Find flight log references


  # Find financial flows

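The same natural-language query can be issued from Python using only the standard library. This is a sketch under assumptions: the endpoint and the `q` parameter come from the curl example, but the response format is not documented here, so `nl_search` only assumes the body is JSON and does not rely on any particular schema. Both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and "q" parameter taken from the curl example
NL_SEARCH_URL = "https://analytics.dugganusa.com/api/v1/search/nl"


def build_nl_search_url(question: str, base: str = NL_SEARCH_URL) -> str:
    """Percent-encode the question into the query string (spaces become '+')."""
    return base + "?" + urllib.parse.urlencode({"q": question})


def nl_search(question: str, timeout: float = 10.0):
    """GET the search endpoint and decode the body as JSON.

    Assumes the API returns JSON; the shape of the result is not documented here.
    """
    with urllib.request.urlopen(build_nl_search_url(question), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the encoded URL only; call nl_search() to actually hit the API
    print(build_nl_search_url("Who flew with Epstein in 1995"))
```

Encoding the query with `urlencode` avoids the malformed request that a raw space in the URL would produce.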


Key Findings (Technology Validation)


Our pattern recognition identified:


1. Redaction Failures - Including an unredacted SSN, email addresses, and account numbers

2. Network Topology - Financial flows between entities

3. Document Organization - Labeled disks with content categories

4. Temporal Patterns - A timeline of evidence destruction after Epstein's death

5. Cross-Reference Hits - Where multiple sources confirm claims


These are the same patterns we detect in malware networks, C2 infrastructure, and threat actor campaigns. The methodology transfers across domains.



Data Integrity Statement


• Source: DOJ EFTA official release

• Download Window: January 30, 2026

• Preservation: Full corpus archived before modifications

• Chain of Custody: Direct download, local storage, indexed


We cannot speak to what the DOJ has subsequently modified or removed. We can confirm our dataset represents the original release.



Why This Matters (For Technology)


This demonstrates that threat intelligence methodology applies beyond cybersecurity:


1. Unstructured → Structured: 1.77M lines → 23 analysis files

2. Entity Extraction: Named individuals, organizations, locations

3. Relationship Mapping: Network graphs from text

4. Anomaly Detection: Finding what doesn't belong

5. API Accessibility: Making findings searchable


The same stack that processes CVEs and IOCs can process any document corpus.



Available Resources


• Search API: analytics.dugganusa.com/api/v1/search

• STIX Feed: analytics.dugganusa.com/api/v1/stix-feed

• Blog Analysis: dugganusa.com/post/090-44-3348-epstein-ssn-doj-redaction-failures



*DugganUSA is a threat intelligence platform processing 400K+ IOCs consumed by SOCs in 40+ countries. This analysis demonstrates our pattern recognition methodology applied to a new domain.*


*All source data is from publicly released DOJ documents.*

