# State of the Corpus: February 2026


**By Patrick Duggan | February 28, 2026**


*The first monthly delta report from DugganUSA. What changed, what's new, what patterns emerged. Government data, made searchable, updated continuously.*




## The Numbers



| Metric | Value |
|--------|-------|
| Total documents indexed | 11,064,949 |
| Database size | 42.8 GB |
| Active indexes | 37 |
| Monthly infrastructure cost | ~$500 |
| STIX feed consumers | 275+ in 46 countries |


Three months ago, DugganUSA launched with approximately $76/month in Azure infrastructure and a thesis: government-released data, properly indexed, would speak for itself. By the end of February 2026, the corpus has grown to 11 million documents across 42.8 GB of searchable data. The infrastructure cost scaled to roughly $500/month — still less than what most organizations spend on a single SaaS license.


Here are 13 of the corpus's 37 indexes as of February 28, 2026:


| Index | Documents | What It Is |
|-------|-----------|------------|
| icij_relationships | 3,339,267 | ICIJ relationship edges (Panama + Pandora Papers) |
| oz_decisions | 2,166,635 | Federal court decisions |
| icij_offshore | 2,016,524 | Offshore entities (Panama + Pandora Papers) |
| block_events | 1,115,062 | Firewall block events |
| iocs | 943,711 | Threat intelligence indicators |
| search_queries | 501,752 | Tracked search queries |
| epstein_files | 398,525 | DOJ Epstein documents |
| phishing | 18,838 | Phishing indicators |
| pulses | 16,366 | OTX threat intelligence pulses |
| paranormal | 3,383 | UAP/paranormal sightings |
| cisa_kev | 1,513 | CISA Known Exploited Vulnerabilities |
| blog | 1,285 | DugganUSA blog posts |
| adversaries | 350 | Named threat actors |


Every index is live, searchable, and cross-referenced. A name that appears in the Epstein files can be searched simultaneously against ICIJ offshore entities, federal court decisions, and threat intelligence indicators. That cross-referencing is the product — not any single dataset in isolation.
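As a rough illustration of that fan-out, the sketch below builds one search URL per index for the same name. It reuses the `filter=_index=...` form from the example queries later in this post. Whether that filter form works for every index is an assumption, and `build_cross_reference_urls` is a hypothetical helper, not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://analytics.dugganusa.com/api/v1/search"

# Index names as listed in the table above.
CROSS_REFERENCE_INDEXES = ["epstein_files", "icij_offshore", "oz_decisions", "iocs"]


def build_cross_reference_urls(name, indexes=CROSS_REFERENCE_INDEXES):
    """Return a dict mapping each index name to a search URL for `name`."""
    urls = {}
    for index in indexes:
        # Assumption: the `_index=<name>` filter is accepted for every index.
        query = urlencode({"q": name, "filter": f"_index={index}"})
        urls[index] = f"{BASE_URL}?{query}"
    return urls


if __name__ == "__main__":
    for index, url in build_cross_reference_urls("Maxwell").items():
        print(index, url)
```

Note that `urlencode` percent-encodes the `=` inside the filter value, which is the safe form for a query string.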




## What Changed in February 2026



This section is the delta. What was added, what grew, what shifted.


### Archive.org Court Records: +12,056 Documents



The largest single-month addition to the Epstein files index. We extracted, text-processed, and indexed 12,056 court records from archive.org that were previously scattered across inconsistent uploads with no unified search.


- 50 court cases from 4 sources (FBI, CBP, Bureau of Prisons, Florida state records)

- 188.4 million characters of searchable text extracted

- 11,902 documents with extractable text, 154 scanned-only

- 40 WAV audio recordings from the Palm Beach Police Department investigation (5.5 GB)

- Quashed 2005 Florida case files, Grand Jury 05-02 materials, Maxwell trial exhibits and transcripts

- FOIA releases from 4 agencies


Filterable in search using `dataset = archive_org`.


### ICIJ Integration: +5.3 Million Records



The Panama Papers and Pandora Papers joined the corpus in February. This added 2,016,524 offshore entities and 3,339,267 relationship edges — the corporate ownership and directorship networks that map how money moves through shell companies, trust structures, and offshore jurisdictions.


This is not a novelty feature. When a name appears in Epstein flight logs and also appears as a director of a BVI shell company in the Panama Papers, that cross-reference tells a story that neither dataset tells alone.


### IOC Growth: 59K to 943K+



Threat intelligence indicators grew roughly 16x in February, from about 59,000 to 943,711. The IOC index now holds IP addresses, domains, hashes, URLs, and other artifacts associated with known threat actors. This feeds the STIX intelligence feed consumed by 275+ organizations in 46 countries.


### Federal Court Decisions: Crossed 2 Million



The oz_decisions index reached 2,166,635 federal court decisions. This is the quiet workhorse of the corpus — the index that lets you search for any individual named in Epstein documents and see what other federal litigation they appear in.


### OPNsense Threat Intel Gateway: Launched



A new product this month. OPNsense-native firewall feeds delivering IP blocklists, Suricata IDS/IPS rules, and DNS sinkhole feeds directly to network security infrastructure. This is Zscaler-equivalent threat intel at approximately 5% of the cost, derived from the same IOC index that feeds the STIX pipeline.
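To make the blocklist idea concrete, here is a minimal sketch of turning indicator records into a plain one-address-per-line list. The record shape (`type` and `value` keys) is an assumption for illustration and not the actual IOC schema, and the sample addresses come from the reserved documentation ranges rather than real threat data.

```python
import ipaddress


def build_ip_blocklist(indicators):
    """Return valid IP indicators as a sorted, de-duplicated list of strings."""
    seen = set()
    for indicator in indicators:
        # Hypothetical record shape: {"type": "...", "value": "..."}.
        if indicator.get("type") != "ip":
            continue
        try:
            address = ipaddress.ip_address(indicator["value"].strip())
        except ValueError:
            continue  # skip values that are not valid IPv4/IPv6 addresses
        seen.add(address)
    # Sort IPv4 before IPv6, then numerically within each version.
    ordered = sorted(seen, key=lambda a: (a.version, int(a)))
    return [str(address) for address in ordered]


if __name__ == "__main__":
    sample = [
        {"type": "ip", "value": "203.0.113.7"},
        {"type": "ip", "value": "198.51.100.20 "},
        {"type": "domain", "value": "example.invalid"},
        {"type": "ip", "value": "203.0.113.7"},
        {"type": "ip", "value": "not-an-ip"},
    ]
    print("\n".join(build_ip_blocklist(sample)))
```

Validating with `ipaddress` before emitting keeps malformed indicators out of the feed, which matters because a firewall will typically act on a blocklist line as written.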


### Framework Analysis API: Launched



A unified endpoint that runs CARVER, DREAD, Diamond Model, ACH (Analysis of Competing Hypotheses), and Social Graph analysis on any named target in a single API call. The endpoint searches 9 indexes weighted by criticality tier and returns a structured assessment. Bulk mode handles up to 25 targets per request.
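For callers with more than 25 names, one way to respect the bulk limit is to split the list client-side. The sketch below does that. The `{"targets": [...]}` body shape is a guess for illustration only, since this report documents only the single-target body.

```python
import json

MAX_TARGETS_PER_REQUEST = 25  # bulk limit stated in this report


def batch_targets(targets, batch_size=MAX_TARGETS_PER_REQUEST):
    """Split a list of target names into chunks of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [targets[i:i + batch_size] for i in range(0, len(targets), batch_size)]


def build_bulk_bodies(targets):
    """Return one JSON request body per batch of targets."""
    # Assumption: bulk mode accepts {"targets": [...]}; the real field name may differ.
    return [json.dumps({"targets": batch}) for batch in batch_targets(targets)]


if __name__ == "__main__":
    names = [f"Target {n}" for n in range(60)]
    for body in build_bulk_bodies(names):
        print(len(json.loads(body)["targets"]), "targets in this request")
```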


### Blog: Reached 1,285 Posts



The blog at www.dugganusa.com now contains 1,285 posts — original analysis, threat intelligence reports, compliance documentation, and investigative pieces. The blog index is searchable through the same API as every other dataset.




## Anomalies and Notable Findings



### Search Query Volume: Epstein Spike



Search query volume for Epstein-related terms spiked in February, correlating with two events: ongoing congressional hearings and the EpsteIn search tool going viral on GitHub. The tool has accumulated 617 stars on GitHub and is driving measurable API traffic to the search infrastructure.


501,752 search queries are now tracked in the search_queries index. The query patterns themselves are informative — they reveal what journalists and researchers are looking for, which names are generating sustained interest, and where the investigative pressure is concentrating.


### Press Coverage: 14 Articles in 4 Countries



The Epstein files index was covered in 14 press articles across 4 countries in February. ONTIC.AI (a Goldman Sachs-backed threat intelligence firm) referenced the platform. The index reached 83 points on Hacker News. Julie K. Brown — the Miami Herald reporter whose original Epstein investigation forced the federal case — engaged with the work.


When government data is properly indexed and freely searchable, journalists use it. That is the thesis, and February confirmed it.


### Cross-Reference Discovery: Overlapping Witness Lists



This is the finding that matters most from a research perspective. Cross-referencing the newly-indexed archive.org court records against the existing DOJ datasets reveals overlapping witness lists and timeline correlations that were not visible when the files were scattered across different archives, different formats, and different hosting platforms.


Witnesses who appear in the quashed 2005 Florida case files also appear in federal investigation documents from years later. The timeline between the failed Florida prosecution and the eventual federal action is now traceable through primary documents — not through media accounts of those documents, but through the documents themselves, searchable side by side.
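At its simplest, finding an overlap like this is a set intersection over normalized names. The sketch below shows that idea with placeholder names. It is not the matching logic the corpus uses, and real name matching would need to handle aliases, initials, and OCR errors.

```python
def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def overlapping_names(list_a, list_b):
    """Return the normalized names that appear in both lists, sorted."""
    names_a = {normalize(name) for name in list_a}
    names_b = {normalize(name) for name in list_b}
    return sorted(names_a & names_b)


if __name__ == "__main__":
    # Placeholder names only; no real witness data is used here.
    florida_case = ["Witness A", "witness  B", "Witness C"]
    federal_docs = ["WITNESS B", "Witness D", "witness a"]
    print(overlapping_names(florida_case, federal_docs))
```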


We are not drawing conclusions from these overlaps. We are making them findable.


### JP2 OCR Processing: 47% Complete



19,950 of 42,182 JP2 images in the corpus have been OCR-processed, extracting text from scanned court exhibits that were previously image-only. This is a slow process — scanned documents from the early 2000s are frequently degraded, handwritten, or low-resolution. The remaining 22,232 images will continue processing through March.
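The progress figures follow directly from the two counts. A small sketch of the arithmetic, with `ocr_progress` as a hypothetical helper name:

```python
def ocr_progress(processed, total):
    """Return (remaining_images, percent_complete) for an OCR backlog."""
    if total <= 0 or not 0 <= processed <= total:
        raise ValueError("processed must be between 0 and total, and total must be positive")
    remaining = total - processed
    percent = 100 * processed / total
    return remaining, percent


if __name__ == "__main__":
    remaining, percent = ocr_progress(processed=19_950, total=42_182)
    print(f"{remaining:,} images remaining, {percent:.1f}% complete")
```

Run with the counts above, this reproduces the 22,232 remaining images and the roughly 47% completion figure.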




## New Capabilities



What you can do now that you could not do in January 2026:


**Full-text search across 398,525 Epstein documents.** Every DOJ dataset plus the archive.org court records, searchable in under 2 seconds. Web interface at [epstein.dugganusa.com](https://epstein.dugganusa.com), API access via key.


**Cross-reference Epstein files against ICIJ offshore entities.** Search a name from a flight log and see if it appears in Panama Papers corporate registrations or Pandora Papers trust structures. 5.3 million ICIJ records are now in the same search infrastructure.


**Framework analysis on any named target.** The CARVER/DREAD/Diamond/ACH/Social Graph endpoint evaluates any individual or organization against the full corpus — 9 indexes, weighted scoring, structured output. Available at the Professional+ API tier.


**OPNsense-native firewall feeds.** IP blocklists, Suricata rules, and DNS sinkhole feeds derived from 943K+ threat indicators. Drop them into any OPNsense deployment. No agent, no SaaS contract, no per-seat licensing.


**40 Palm Beach PD audio recordings cataloged.** 5.5 GB of WAV audio from the original investigation. Not yet transcribed, but preserved and accessible within the archive.org dataset.




## What's Coming in March 2026



- **JP2 OCR completion.** The remaining 22,232 scanned images will be processed, bringing the OCR coverage from 47% toward completion. Every page of every scanned court exhibit will be text-searchable.

- **Face detection analysis.** Automated face detection across document scans — identifying and cataloging individuals who appear in photographic evidence within the corpus. Initial pilot on House Oversight materials produced 66 face crops at 70-98% confidence.

- **Additional FOIA integrations.** As agencies release new documents in response to pending FOIA requests and congressional inquiries, they will be indexed into the corpus. The pipeline is built — new datasets drop in within hours of release.

- **Expanded social graph analysis.** The co-occurrence engine that powers the Social Graph framework will be extended with additional relationship types and deeper cross-index correlation.
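For readers unfamiliar with co-occurrence analysis, the core idea can be sketched in a few lines: count how often two names appear in the same document. This is a generic illustration with placeholder names, not the engine behind the Social Graph framework, and it assumes names have already been extracted per document.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each pair of names, the number of documents containing both.

    `documents` is an iterable where each item is the list of names
    found in one document.
    """
    counts = Counter()
    for names in documents:
        # De-duplicate within a document so each pair counts once per document.
        unique_names = sorted(set(names))
        for pair in combinations(unique_names, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    docs = [
        ["Person A", "Person B", "Person C"],
        ["Person B", "Person A"],
        ["Person C"],
    ]
    for pair, count in cooccurrence_counts(docs).most_common():
        print(pair, count)
```

Pairs with high counts become candidate edges in a graph. Additional relationship types would add edge labels beyond simple co-occurrence.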




## How to Access



**Web search:** [epstein.dugganusa.com](https://epstein.dugganusa.com) — free, no account required for basic search.


**API access:** Register for a free API key at [analytics.dugganusa.com/stix/register](https://analytics.dugganusa.com/stix/register). The API covers all 37 indexes, not just Epstein files.


**STIX feed:** Threat intelligence in STIX 2.1 format, consumed by 275+ organizations in 46 countries. Registration includes feed access.
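For consumers new to STIX 2.1, a feed is typically delivered as a bundle of objects, with each indicator carrying a `pattern` string. The sketch below pulls the patterns out of a bundle. The sample bundle is hand-written for illustration (placeholder IDs, a documentation-range IP) and is not taken from the DugganUSA feed.

```python
import json


def indicator_patterns(bundle_json):
    """Return the `pattern` of every indicator object in a STIX bundle."""
    bundle = json.loads(bundle_json)
    if bundle.get("type") != "bundle":
        raise ValueError("expected a STIX bundle")
    return [
        obj["pattern"]
        for obj in bundle.get("objects", [])
        if obj.get("type") == "indicator" and "pattern" in obj
    ]


if __name__ == "__main__":
    # Illustrative bundle only; IDs and values are placeholders.
    sample_bundle = json.dumps({
        "type": "bundle",
        "id": "bundle--00000000-0000-4000-8000-000000000000",
        "objects": [
            {
                "type": "indicator",
                "spec_version": "2.1",
                "id": "indicator--00000000-0000-4000-8000-000000000001",
                "created": "2026-02-28T00:00:00.000Z",
                "modified": "2026-02-28T00:00:00.000Z",
                "pattern": "[ipv4-addr:value = '203.0.113.7']",
                "pattern_type": "stix",
                "valid_from": "2026-02-28T00:00:00Z",
            },
            {
                "type": "identity",
                "spec_version": "2.1",
                "id": "identity--00000000-0000-4000-8000-000000000002",
                "name": "Example Producer",
            },
        ],
    })
    print(indicator_patterns(sample_bundle))
```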


**Example queries:**


> GET https://analytics.dugganusa.com/api/v1/search?q=Ruemmler&filter=_index=epstein_files

> Header: Authorization: Bearer [your-key]


> GET https://analytics.dugganusa.com/api/v1/search/nl?q=offshore%20entities%20connected%20to%20Maxwell


> POST https://analytics.dugganusa.com/api/v1/framework-analysis/evaluate

> Body: { "target": "Les Wexner" }
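The same requests can be prepared from Python's standard library. The sketch below only builds the request objects and does not send them. It assumes the `Authorization: Bearer` header shown for the first query applies to the other endpoints too, and that the POST body is sent as JSON. Neither is confirmed here, and the function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_ROOT = "https://analytics.dugganusa.com/api/v1"


def natural_language_search(question, api_key):
    """Build a GET request for the natural-language search endpoint."""
    # Spaces and other reserved characters must be percent-encoded in the URL.
    url = f"{API_ROOT}/search/nl?q={quote(question)}"
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


def framework_analysis(target, api_key):
    """Build a POST request for the framework analysis endpoint."""
    body = json.dumps({"target": target}).encode("utf-8")
    return Request(
        f"{API_ROOT}/framework-analysis/evaluate",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed to apply here as well
            "Content-Type": "application/json",    # assumed content type
        },
    )


if __name__ == "__main__":
    get_request = natural_language_search("offshore entities connected to Maxwell", "your-key")
    post_request = framework_analysis("Example Target", "your-key")
    print(get_request.get_method(), get_request.full_url)
    print(post_request.get_method(), post_request.full_url)
```

Passing either object to `urllib.request.urlopen` would send it. Keeping construction separate makes the encoding easy to inspect first.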




## Methodology Note



Every document in the DugganUSA corpus comes from a public release: DOJ releases, FOIA responses, court filings, CISA advisories, and ICIJ consortium publications. No stolen material, no leaks, no classified data. The chain of custody on every document is public and auditable.


We cap all accuracy claims at 95%. OCR introduces errors. Handwritten documents are imperfectly transcribed. Some scanned pages are too degraded for reliable text extraction. We are transparent about these limitations because the alternative — claiming perfection — is either dishonest or delusional.


The infrastructure runs on Azure at approximately $500/month. Three container apps, one VM, one search engine. The corpus grows daily. This report will be published monthly.


If you find an error, a mis-indexed document, or a result that doesn't match the source material — tell us. We fix it.




*11,064,949 documents. 42.8 GB. 37 indexes. $500/month. Government data, made searchable.*


*The government's own narrative, made searchable, indicts the government.*




**DugganUSA LLC** — protect. publish. amplify.


*State of the Corpus is published monthly. Next report: March 31, 2026.*


*For citation guidance: [dugganusa.com/post/how-to-cite-dugganusa](https://www.dugganusa.com/post/how-to-cite-dugganusa)*





*Her name was Renee Nicole Good.*


*His name was Alex Jeffery Pretti.*

 
 
 
