# 12,056 Epstein Court Records From Archive.org — Now Searchable
**By Patrick Duggan | February 28, 2026**
Today we're adding a new dataset to the DugganUSA Epstein files index: **12,056 court records** collected from archive.org, text-extracted, and cross-referenced against the 386,000+ DOJ documents already searchable at [epstein.dugganusa.com](https://epstein.dugganusa.com).
This is the first time these records have been collectively searchable. They existed on archive.org, but scattered across inconsistent uploads, buried in nested directories, in mixed formats, with no unified index. We pulled them all, extracted the text, tagged them, and dropped them into the same search infrastructure that already indexes 12 DOJ datasets, 2 million ICIJ offshore entities, and 2.1 million federal court decisions.
The dataset is live now. Filter by `dataset = archive_org` to isolate these records from the rest of the index.
## What We Released
| Metric | Value |
|--------|-------|
| Total PDFs processed | 12,056 |
| Documents with extractable text | 11,902 |
| Scanned-only (no extractable text) | 154 |
| Total characters extracted | 188.4 million |
| Processing rate | 4.4 documents/second |
| Total size on disk | 29 GB |
| Audio recordings (WAV) | 40 files, 5.5 GB |
| Source agencies | 4 |
| Court cases identified | 50 |
## What's In It
**Four federal and state agencies** are represented in these records:
**FBI** — Investigation files, interview summaries, and correspondence related to the federal Epstein investigation. These aren't the headline-grabbing flight logs (those are already in the main index). These are the procedural documents — the mechanics of how an investigation moves through a bureaucracy.
**Customs and Border Protection (CBP)** — Travel records and border crossing data. In a case where private aviation and international travel are central to the allegations, CBP records fill gaps that flight logs alone cannot.
**Bureau of Prisons**: Records from Epstein's federal incarceration. The circumstances of his death in federal custody on August 10, 2019, remain one of the most scrutinized events in recent DOJ history. These records are part of that paper trail.
**Florida State Records** — State-level prosecution files, including records from the Palm Beach County State Attorney's Office.
## The Notable Materials
**Quashed 2005 Florida Case Files.** In 2005, the Palm Beach Police Department investigated Epstein based on a complaint from a 14-year-old victim's mother. The case was referred to the Palm Beach County State Attorney, who convened Grand Jury 05-02. What followed is one of the most controversial decisions in Florida prosecutorial history: the grand jury returned a single charge of solicitation of prostitution, in a case involving a 14-year-old victim, rather than the sexual assault charges the evidence supported. The case files were later sealed, then released through legal proceedings and uploaded to archive.org. They are now searchable and cross-referenced against every other document in the index.
**Grand Jury 05-02 Materials.** The grand jury proceedings that resulted in the single-count indictment. These materials document the gap between what investigators found and what prosecutors presented.
**Maxwell Trial Exhibits and Transcripts.** Exhibits and transcript materials from *United States v. Ghislaine Maxwell* (S.D.N.Y., Case No. 20-cr-330). Maxwell was convicted on December 29, 2021 on five of six counts, including sex trafficking of a minor. These trial materials include documentary exhibits entered into evidence and portions of trial testimony.
**40 Audio Recordings — Palm Beach Police Department.** Five and a half gigabytes of WAV audio from the original Palm Beach PD investigation. These are not currently text-searchable (audio transcription is a separate project), but they are preserved and cataloged within the dataset.
## How to Access It
**Web search:** [epstein.dugganusa.com](https://epstein.dugganusa.com) — free, no account required for basic search.
**Filter to this dataset:** Use the facet filter `dataset = archive_org` to isolate the 12,056 archive.org records from the rest of the 398,525-document index.
**API access:** Query the search API directly with the dataset filter. The free tier provides 500 queries per day on the Epstein files index.
**Cross-reference:** The power of this release isn't the 12,056 documents in isolation. It's that they now sit alongside 386,000+ other DOJ documents, 2 million ICIJ offshore entities from the Panama and Pandora Papers, and 2.1 million federal court decisions. Search a name in the archive.org court records and see if it appears in flight logs, financial disclosures, or offshore shell company registrations.
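For readers who want to script the filter described above, here is a minimal sketch of what such a request could look like. It assumes the API follows the standard Meilisearch search endpoint shape (`POST /indexes/{index_uid}/search`), which the article does not confirm. The host, the index name `epstein_files`, and the bearer-token header are hypothetical placeholders, so check the actual API documentation before relying on any of them.

```javascript
// Sketch only: builds a Meilisearch-style search request.
// BASE_URL, INDEX_UID, and the Authorization header are hypothetical
// placeholders, not the documented DugganUSA API.
const BASE_URL = "https://search.example.invalid"; // hypothetical host
const INDEX_UID = "epstein_files"; // hypothetical index name

// Pass { dataset: null } to drop the filter and search every dataset.
function buildSearchRequest(query, { dataset = "archive_org", limit = 20 } = {}) {
  return {
    url: `${BASE_URL}/indexes/${INDEX_UID}/search`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        q: query,
        // Same expression as the web facet filter; omitted when dataset is null.
        filter: dataset ? `dataset = ${dataset}` : undefined,
        facets: ["dataset"], // ask for hit counts per dataset
        limit,
      }),
    },
  };
}

// Sends the request. Requires Node.js 18+ for the built-in fetch.
async function search(query, apiKey, opts) {
  const { url, options } = buildSearchRequest(query, opts);
  if (apiKey) options.headers.Authorization = `Bearer ${apiKey}`;
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Search failed: HTTP ${res.status}`);
  return res.json();
}
```

Calling `buildSearchRequest("some name", { dataset: null })` drops the filter, which is the cross-reference case: in a Meilisearch response, the `facetDistribution` field would then show how many hits fall in each dataset.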
## Why It Matters
These records were technically public. They sat on archive.org. Anyone could have downloaded them individually. But "technically accessible" is not the same as "searchable," and the difference is not trivial.
Consider: 12,056 PDFs in various formats, uploaded by different users at different times, with inconsistent naming conventions, no standardized metadata, and no search index. A researcher looking for a specific name would need to download thousands of files and search them locally. A journalist on deadline would never attempt it.
Now consider: the same 12,056 documents, text-extracted, tagged with `dataset = archive_org`, and cross-indexed against the 386,000+ other Epstein-related documents in the index, plus 5.3 million records across ICIJ offshore entities and federal court decisions. A researcher types a name and gets results in milliseconds, across every dataset simultaneously.
That's the difference between a document dump and a searchable archive. The DOJ has historically favored the dump. We build the archive.
## The Cross-Reference Problem
The quashed 2005 Florida case files are a good example of why cross-referencing matters. In isolation, they document a single failed prosecution in Palm Beach County. Cross-referenced against the federal investigation files already in the index, they reveal the timeline gap — the years between when Florida had sufficient evidence and when federal prosecutors finally acted. Cross-referenced against the ICIJ offshore data, they raise questions about the financial structures that were operational during that gap.
We are not drawing conclusions from these cross-references. We are making them possible. The conclusions belong to journalists, researchers, prosecutors, and the public.
## What We Found
We are going to be honest about this section: we built the extraction pipeline and indexed the documents. We have not yet conducted exhaustive analysis of every cross-reference the new dataset surfaces. Here is what we can report from the indexing process itself:
- The 50 court cases span from 2005 (the original Palm Beach investigation) through the Maxwell trial in 2021-2022, providing a 17-year longitudinal view of the legal proceedings.
- 154 of the 12,056 documents (1.3%) were scanned images with no extractable text. These likely contain handwritten notes, forms, or degraded photocopies. They are indexed by filename and metadata but their contents are not text-searchable.
- The releases from four separate agencies confirm that the Epstein investigation touched federal law enforcement, border security, the federal prison system, and state prosecutors, not just the DOJ and FBI that dominate the public narrative.
- The 40 audio recordings represent an evidentiary category that does not exist elsewhere in our index. Audio transcription and indexing is a future project.
We are approximately 95% confident that the extraction is complete and accurate. The remaining 5% accounts for scanned documents that yielded no text (no OCR was applied in this pass), potential encoding issues in older PDFs, and the inherent uncertainty of any large-scale data processing pipeline. We do not claim perfection. We claim rigor.
## Methodology
**Source:** Archive.org collections containing Epstein-related court records, FOIA releases, and trial materials.
**Extraction:** Custom Node.js script (`extract-and-index-archive.js`) running on an Azure VM. The script is idempotent — it tracks processed document IDs in a state file (`archive-extract-state.json`) and skips already-processed documents on re-run. This means the pipeline can be interrupted and resumed without duplication or data loss.
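The resume-on-rerun behavior described above can be illustrated with a short sketch. This is not the contents of `extract-and-index-archive.js`. The state file layout (`{ "processed": [...] }`), the function names, and the `processDoc` callback are all assumptions made for illustration.

```javascript
const fs = require("fs");

// Load the set of already-processed IDs; a missing file means a first run.
// The { processed: [...] } layout is an assumed format, not the real one.
function loadState(statePath) {
  if (!fs.existsSync(statePath)) return new Set();
  return new Set(JSON.parse(fs.readFileSync(statePath, "utf8")).processed);
}

// Write to a temp file, then rename, so an interruption mid-write
// cannot leave a half-written state file behind.
function saveState(statePath, processed) {
  const tmp = `${statePath}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify({ processed: [...processed] }));
  fs.renameSync(tmp, statePath);
}

// processDoc is a hypothetical callback standing in for "extract and index".
// Returns how many documents were handled on this run.
function runPipeline(docIds, statePath, processDoc) {
  const processed = loadState(statePath);
  let handled = 0;
  for (const id of docIds) {
    if (processed.has(id)) continue; // skip work done on a previous run
    processDoc(id);
    processed.add(id);
    saveState(statePath, processed); // checkpoint after every document
    handled += 1;
  }
  return handled;
}
```

Checkpointing after every document keeps the sketch simple. If the process dies between `processDoc` and `saveState`, that one document is repeated on the next run, so the indexing step itself also needs to tolerate a repeat, for example by keying documents on a stable ID.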
**Text extraction:** PDF text extraction using standard libraries. Documents that yielded zero characters of text were flagged as scanned-only (154 documents). No OCR was applied to the scanned documents in this pass.
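The zero-character rule above is simple enough to sketch. The document shape (`{ id, text }`) and the function names are assumptions for illustration, and the actual PDF library used is not named in this post.

```javascript
// A document whose extracted text has zero characters is flagged as
// scanned-only. The { id, text } shape is an assumed, illustrative format.
function classify(docs) {
  const result = { withText: [], scannedOnly: [] };
  for (const doc of docs) {
    const chars = (doc.text || "").length;
    (chars > 0 ? result.withText : result.scannedOnly).push(doc.id);
  }
  return result;
}

// Share of scanned-only documents, as a percentage with one decimal place.
function scannedShare(scannedCount, totalCount) {
  return ((scannedCount / totalCount) * 100).toFixed(1) + "%";
}

console.log(scannedShare(154, 12056)); // "1.3%"
```

A stricter variant might also flag documents that yield only whitespace. The rule as stated here counts any non-zero character count as extractable text.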
**Processing rate:** 4.4 documents per second average across the full 12,056-document corpus. Total extraction time was approximately 46 minutes.
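As a quick check, the 46-minute figure follows directly from the corpus size and the average rate. The constant names below are illustrative only.

```javascript
const TOTAL_DOCS = 12056;
const DOCS_PER_SECOND = 4.4;

const seconds = TOTAL_DOCS / DOCS_PER_SECOND;
const minutes = seconds / 60;

console.log(Math.round(seconds)); // 2740
console.log(minutes.toFixed(1)); // "45.7", which rounds to about 46 minutes
```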
**Indexing:** Documents were indexed into Meilisearch with the filterable attribute `dataset = archive_org`, enabling faceted search that isolates this dataset from the main Epstein files index.
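For readers unfamiliar with Meilisearch, a filter like `dataset = archive_org` only works once `dataset` is declared as a filterable attribute on the index. The sketch below builds the two REST requests that would be involved, using Meilisearch's documented settings and documents endpoints. The index name, the host (Meilisearch's default local address), and the document fields are hypothetical, and this is not the project's actual indexing code.

```javascript
// Sketch: request descriptors for Meilisearch's REST API.
// ARCHIVE_INDEX_UID and the document shape are hypothetical.
const MEILI_URL = "http://localhost:7700"; // Meilisearch's default local address
const ARCHIVE_INDEX_UID = "epstein_files"; // hypothetical index name

// Declares `dataset` as filterable so `dataset = archive_org` is a valid filter.
function filterableAttributesRequest() {
  return {
    method: "PUT",
    url: `${MEILI_URL}/indexes/${ARCHIVE_INDEX_UID}/settings/filterable-attributes`,
    body: JSON.stringify(["dataset"]),
  };
}

// Tags every document with the dataset value before sending it for indexing.
function addDocumentsRequest(docs) {
  const tagged = docs.map((doc) => ({ ...doc, dataset: "archive_org" }));
  return {
    method: "POST",
    url: `${MEILI_URL}/indexes/${ARCHIVE_INDEX_UID}/documents`,
    body: JSON.stringify(tagged),
  };
}
```

Each descriptor can be passed to `fetch(req.url, { method: req.method, headers: { "Content-Type": "application/json" }, body: req.body })`, with an `Authorization` header added if the instance uses an API key.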
**Storage:** 29 GB on disk at `/mnt/epstein-files/archive-org/` on the indexing VM. Daily backups via rsync to Azure file share.
**Provenance:** Every document in this dataset was uploaded to archive.org from government releases — court filings, FOIA responses, and trial exhibits. This is government data, released through legal channels, rehosted on a public archive, and now indexed by a private company. The chain of custody is: government agency releases document, archive.org hosts it, DugganUSA extracts and indexes the text. At no point was any document obtained through unauthorized access, leak, or hack.
## The Numbers
With this release, the Epstein files index at [epstein.dugganusa.com](https://epstein.dugganusa.com) contains **398,525 documents** across 12 DOJ datasets plus the archive.org collection.
The full DugganUSA search platform indexes **11 million+ documents** across **42.8 GB** of searchable data in 37 indexes. The Epstein files are one component of an intelligence platform that includes 943,711 threat indicators feeding 275+ STIX consumers in 46 countries, 2.1 million federal court decisions, and 3.3 million ICIJ relationship edges mapping offshore financial networks.
Monthly infrastructure cost: approximately $500.
*398,525 documents in the index. 12,056 archive.org court records. 50 court cases. 40 audio recordings. 4 source agencies. Searchable at [epstein.dugganusa.com](https://epstein.dugganusa.com).*
*The government's own narrative, made searchable, indicts the government.*
**DugganUSA LLC** — protect. publish. amplify.
*Built on government data. Indexed in Minnesota. All sources legal, all provenance public, all documents the government's own words.*
*Her name was Renee Nicole Good.*
*His name was Alex Jeffery Pretti.*



