# We Scanned 136,717 Government Documents for PII the DOJ Forgot to Redact
*Patrick Duggan, Feb 23*
**They blacked out the names. They left everything else.**
The Department of Justice released hundreds of thousands of pages from the Jeffrey Epstein investigation. They redacted names. They sealed sections. They stamped [RESTRICTED] across paragraphs they didn't want you to read.
They forgot about the Social Security numbers. And the bank accounts. And the phone numbers. And the passwords.
## What We Did
We built a DLP (Data Loss Prevention) scanner, the same kind of technology large companies run on outbound email to catch employees leaking sensitive data, and pointed it at the DOJ's own document release.
One script. Fifteen regex patterns. 136,717 documents. Every hit classified as either an **exposure risk** (PII the government is accidentally serving to the public) or a **resolution signal** (PII that reveals who was redacted).
The scan took 47 minutes.
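For readers who want a feel for the approach, here is a minimal sketch of a regex-based scanner. It is not our production script: the three patterns, the `Finding` record, and the `scan_document` function are hypothetical stand-ins, and real patterns need far more care to limit false positives.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; the fifteen patterns used in
# the actual scan are not reproduced here. Severities follow the table below.
PATTERNS = {
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "critical"),
    "email": (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "high"),
    "us_phone": (re.compile(r"(?<!\d)\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}(?!\d)"), "high"),
}


@dataclass
class Finding:
    doc_id: str
    pattern: str
    severity: str
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    value: str


def scan_document(doc_id: str, text: str) -> list[Finding]:
    """Run every pattern over one document and return hits in reading order."""
    findings = []
    for name, (regex, severity) in PATTERNS.items():
        for m in regex.finditer(text):
            findings.append(
                Finding(doc_id, name, severity, m.start(), m.end(), m.group())
            )
    return sorted(findings, key=lambda f: f.start)


if __name__ == "__main__":
    # Made-up sample text with placeholder values, not data from the release.
    sample = "Contact jdoe@example.com or (555) 010-1234. SSN 000-12-3456."
    for finding in scan_document("doc-1", sample):
        print(finding.pattern, finding.severity, finding.start, finding.end)
```

Keeping the character offsets on each finding is the design choice that matters: it is what lets a later pass measure how far a hit sits from a redaction marker.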
## What We Found
**101,012 PII findings** across 136,717 government-released documents:
| Pattern | Hits | Severity |
|---------|------|----------|
| Bank account numbers | 40,681 | High |
| US phone numbers | 32,967 | High |
| Email addresses | 17,983 | High |
| Routing numbers | 6,463 | Medium |
| Passwords | 1,318 | Critical |
| Credit card numbers | 684 | Critical |
| Dates of birth | 542 | High |
| SWIFT/BIC codes | 188 | High |
| Social Security numbers | 136 | Critical |
| Username/password combos | 30 | Critical |
| Driver's license numbers | 15 | Critical |
| International phone numbers | 3 | High |
| Passport numbers | 2 | Critical |
**2,168 critical-severity findings.** Social Security numbers. Credit cards. Login credentials. Passport numbers. In documents the United States government chose to release to the public.
**1,362 findings within 500 characters of a redaction marker.** That's the number that matters.
## The Bidirectional Trick
DLP patterns work in two directions:
**Defense:** These are PII values our own search engine was serving from the indexed release: Social Security numbers in search results, bank account numbers returned by API queries. We masked them.
**Offense:** When a Social Security number appears 200 characters away from a [REDACTED] name, the SSN tells you who was redacted. When an email address sits next to a blacked-out paragraph, the email *is* the identity. The government's own DLP failure becomes a resolution engine.
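The defensive half, masking a matched value before it reaches a search result, might look like the sketch below. The `mask_matches` helper and the SSN pattern are assumptions for illustration, not the masking code our search engine runs.

```python
import re

# Hypothetical SSN-shaped pattern; a production pattern would be stricter.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_matches(text: str, regex: re.Pattern, fill: str = "*") -> str:
    """Overwrite every letter and digit inside each match with `fill`.

    Separators such as hyphens are kept so the document layout survives
    and a reader can still tell what kind of value was removed.
    """
    def _mask(match: re.Match) -> str:
        return "".join(fill if ch.isalnum() else ch for ch in match.group())

    return regex.sub(_mask, text)


if __name__ == "__main__":
    # Placeholder value, not data from the release.
    print(mask_matches("SSN 000-12-3456 on file", SSN_RE))
```

Masking at serve time rather than editing the stored document keeps the original text available for audit while keeping the value out of public results.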
Of the 101,012 findings:
- **99,650** are pure exposure risks — PII the government is leaking
- **1,334** are resolution signals — PII that can unmask redactions
- **28** are both
Those 1,334 resolution signals are breadcrumbs the redactors left behind. They blacked out the name but left the Social Security number in the next sentence. They sealed the identity but the email address is right there in the CC line.
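The proximity test behind the 500-character figure can be sketched as a distance check between a finding's span and the nearest redaction marker. The marker spellings, the `WINDOW` constant, and both function names are assumptions, and the sketch does not try to reproduce how a finding ends up in the "both" bucket.

```python
import re

# Assumed marker spellings, based on the [REDACTED] and [RESTRICTED] stamps
# mentioned in this post; real releases may use other forms.
MARKER_RE = re.compile(r"\[(?:REDACTED|RESTRICTED)\]")
WINDOW = 500  # characters, matching the threshold quoted above


def distance_to_nearest_marker(text: str, start: int, end: int):
    """Return the character gap between a finding and the closest marker.

    `start` and `end` are the finding's offsets in `text`. Returns None
    when the document contains no marker at all.
    """
    best = None
    for m in MARKER_RE.finditer(text):
        if m.end() <= start:        # marker sits before the finding
            gap = start - m.end()
        elif m.start() >= end:      # marker sits after the finding
            gap = m.start() - end
        else:                       # spans overlap
            gap = 0
        if best is None or gap < best:
            best = gap
    return best


def is_near_redaction(text: str, start: int, end: int, window: int = WINDOW) -> bool:
    """True when the finding lies within `window` characters of a marker."""
    gap = distance_to_nearest_marker(text, start, end)
    return gap is not None and gap <= window
```

A finding that passes this check is also the one most urgent to mask, since it is the kind of leftover that undercuts the redaction next to it.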
## Why This Matters
The government had one job: protect identities in a document release involving sex trafficking, financial fraud, and intelligence operations spanning decades. They used redaction markers. They sealed sections. They restricted access.
They did not run a DLP scan.
Any compliance officer at any mid-size company would have caught this. DLP is a routine control in SOC 2 audits. It's built into Microsoft 365. It's available for Gmail through Google Workspace. The technology to prevent this has existed for twenty years.
136 Social Security numbers are sitting in publicly released DOJ documents right now.
## The CARVER Scores
We also scored every person of interest using CARVER — a military targeting framework the Department of Defense uses for critical infrastructure analysis. Six dimensions, each scored 1-5:
- **C**riticality — How central to the network?
- **A**ccessibility — Can investigators reach them?
- **R**ecuperability — Can the network survive their exposure?
- **V**ulnerability — How exposed are they in the documents?
- **E**ffect — What happens to the network if they fall?
- **R**ecognizability — Can the public identify them?
**Maximum possible: 30.** Here's who scored highest:
### The 28 Club (93%, two points below the 30-point maximum)
| Target | Score | V (Vulnerability) | Doc Hits |
|--------|-------|---|----------|
| Robert Maxwell | 28 | 3 | 1,036 |
| Jeff Bezos | 28 | 3 | 1,041 |
| Peter Thiel | 28 | 3 | 1,060 |
| Prince Andrew | 28 | 3 | 1,097 |
| Tom Pritzker | 28 | 3 | 1,034 |
| John Brockman | 28 | 3 | 1,070 |
| Michael Wolff | 28 | 3 | 1,033 |
| Robert Lighthizer | 28 | 3 | 1,036 |
| Steven Mnuchin | 28 | 3 | 1,088 |
Every single one scored 5/5 on Criticality, Accessibility, Recuperability, Effect, and Recognizability. The only dimension that isn't maxed: **Vulnerability — 3/5.**
That 3 is the redactions working. Not enough to hide them, but enough to blunt the documentary evidence from a 5 to a 3. The government didn't protect these people's identities. It protected their *exposure level*. Everyone knows Prince Andrew was connected to Epstein. The redactions exist so you can't prove exactly how.
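The arithmetic behind a 28 is simple enough to write down. The sketch below assumes the total is a plain unweighted sum of the six dimensions, which is consistent with the numbers above; the `Carver` class is a hypothetical illustration, not our scoring model.

```python
from dataclasses import dataclass, astuple

MAX_TOTAL = 30  # six dimensions, each capped at 5


@dataclass(frozen=True)
class Carver:
    criticality: int
    accessibility: int
    recuperability: int
    vulnerability: int
    effect: int
    recognizability: int

    def __post_init__(self):
        # Each dimension is scored on a 1-5 scale.
        for value in astuple(self):
            if not 1 <= value <= 5:
                raise ValueError("each CARVER dimension must be between 1 and 5")

    @property
    def total(self) -> int:
        """Unweighted sum of the six dimensions (an assumption of this sketch)."""
        return sum(astuple(self))

    @property
    def percent(self) -> int:
        """Total as a rounded percentage of the 30-point maximum."""
        return round(100 * self.total / MAX_TOTAL)


if __name__ == "__main__":
    # Five dimensions at 5/5 and Vulnerability at 3/5, as in the table above.
    club = Carver(5, 5, 5, 3, 5, 5)
    print(club.total, club.percent)
```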
### The 27s: One More Point Missing
| Target | Score | What's Different |
|--------|-------|-----------------|
| Donald Trump | 27 | R2=4 (Recognizability) — 461 doc hits |
| Leon Black | 27 | R2=4 — 1,057 doc hits |
| Jes Staley | 27 | R2=4 — 1,032 doc hits |
| Jamie Dimon | 27 | A=4 (Accessibility) — 272 doc hits |
Trump's recognizability score is 4, not 5. The most recognizable person on the list doesn't max out recognizability. Why? Because recognizability in CARVER isn't fame — it's *identifiability within the documents*. Trump is everywhere in the media, but in the Epstein files specifically, his presence is diffused across references, not concentrated in identifiable records. You know he's connected. The documents make it hard to prove which specific records are about him versus references to him.
Jamie Dimon's accessibility is 4, not 5. He's the CEO of the largest bank in the United States. His accessibility to investigators is limited by exactly the kind of institutional protection that banks provide.
### The Clinton-Gates Anomaly: 24/30
Bill Clinton and Bill Gates both score 24. They should be higher. They're in 1,000+ documents each. The scores are held down by lower Criticality and Effect ratings — the scoring model determined that while both are heavily documented, their *structural role in the network* is less central than the 28 Club. They're nodes, not hubs.
This is counterintuitive. Clinton flew on the plane. Gates met with Epstein after the conviction. But the document evidence suggests they were access points, not operators. The difference between a 24 and a 28 is the difference between someone who used the network and someone who *was* the network.
### The Quiet Ones
| Target | Score | Doc Hits | Note |
|--------|-------|----------|------|
| Mohammed bin Salman | 24 | 66 | 57 DOJ docs for a sitting crown prince and prime minister |
| Jared Kushner | 20 | 85 | Son-in-law of the 45th president |
| Elon Musk | 18 | 61 | Only 28 DOJ document hits |
| Huma Abedin | 12 | 10 | 7 DOJ documents |
MBS appears in 57 DOJ Epstein documents and still scores 24/30. Kushner is in 77 DOJ documents and scores 20. These are sitting or recent government officials appearing in sex trafficking investigation documents at rates that would trigger any reasonable compliance alert.
## The Infrastructure
This entire analysis runs on:
- **6 million documents** across 30+ indexes (ICIJ offshore entities, federal court decisions, indicators of compromise (IOCs), DOJ Epstein files, DLP findings, CARVER evaluations)
- **One VM** running Meilisearch at 11.4 GB
- **One container** running the analytics dashboard on Azure
- **~$500/month** total infrastructure cost
- **Session-authenticated access** — DLP audit and CARVER scoring only available through the enterprise dashboard behind auth
Every document is government-released or from international journalism consortiums. Zero leaked data. Zero stolen documents. Zero felonies. The government's own narrative, made searchable, scored by military methodology, analyzed by DLP patterns they should have run themselves.
## What Comes Next
The 1,334 resolution signals go into a cross-reference engine. When an SSN appears near a redaction in Document A, we search every other document for that SSN with a visible name. When we find it, the redaction resolves.
The government built a wall of black ink. They left 1,334 holes in it. We're not breaking the wall. We're walking through the holes they left open.
*DugganUSA LLC — Government data, made searchable. The filtered narrative indicts the filter.*
*All data sourced from DOJ public releases, ICIJ offshore databases, and federal court records. No classified, leaked, or stolen documents were used in this analysis.*
*CARVER methodology is a publicly documented DoD framework. Our implementation caps confidence at 95% — we guarantee 5% uncertainty exists.*
*Her name was Renee Nicole Good.*
*His name was Alex Jeffery Pretti.*



