
Knowledge Told Us to Embed Everything. Wisdom Was Measuring That It Collapsed. A Cure for Dunning-Kruger.

  • Writer: Patrick Duggan

Knowledge is the cheapest thing in security. Everybody has the same blog posts, the same CVE feeds, the same vendor decks. Knowledge is what you can look up. Wisdom is knowing the edges of what you looked up — and the gap between those two is exactly where Dunning and Kruger built their famous little hill. This is a story about a day we walked up that hill, confident, and got measured back down it by our own system. That measurement is the closest thing to a cure for Dunning-Kruger we have found, and it is buildable.


Here is the knowledge. Vector embeddings make search better. This is true, well-documented, and the kind of thing you can say in a meeting and watch everyone nod. Semantic search finds the document that means what you asked even when it does not use your words. So the obvious move, the confident move, is to embed everything — every index, every record, turn the whole corpus into vectors and let semantic search work its magic. We knew this. We had read the papers. We could have written the meeting deck.


Then we measured.


We pulled the raw embedding vectors off our own threat-intelligence indexes and computed how the space was actually shaped — not how we assumed it was shaped, how it measurably was. The number that matters is the average cosine similarity between random pairs of documents. In a healthy semantic space, unrelated documents point in different directions; that number is low, and the field has structure. We found our IOC and threat-pulse indexes sitting at 0.77 to 0.84. That is not a search index. That is a collapsed star. Seventy-nine percent of the mass had fallen into a single cluster, and when we read the cluster, it was not a threat at all — it was our own pipeline's boilerplate. "Novel IOC detected by PreCog Sweep. First seen at" — the same sentence, a million times, with the timestamp as the only thing the embedder could find to distinguish one indicator from the next.
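For readers who want to run the same check, here is a minimal sketch of that measurement. It assumes the embeddings are already available as plain lists of floats; the function names, the toy vectors, and the sample size are illustrative, not our production code. It uses only the standard library for portability, though at real scale a numerical library would be the usual choice.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as having no direction
    return dot / (norm_a * norm_b)

def mean_random_pair_similarity(vectors, n_pairs=1000, seed=0):
    """Estimate the average cosine similarity between random distinct pairs."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(vectors)), 2)  # two different documents
        total += cosine(vectors[i], vectors[j])
    return total / n_pairs

# Toy data (made up for illustration):
# "collapsed" vectors share one large common component plus small noise,
# "spread" vectors are independent random directions.
rng = random.Random(42)
DIM = 64
shared = [rng.gauss(0, 1) for _ in range(DIM)]
collapsed = [[s + 0.3 * rng.gauss(0, 1) for s in shared] for _ in range(200)]
spread = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]

collapsed_score = mean_random_pair_similarity(collapsed)
spread_score = mean_random_pair_similarity(spread)
print(f"collapsed: {collapsed_score:.2f}  spread: {spread_score:.2f}")
```

A score near zero suggests unrelated documents really do point in different directions. A score close to one, like the collapsed toy set here, suggests most of the corpus is sitting in the same place.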


We had been about to spend roughly thirty hours of compute re-embedding 1.18 million records to make that better. The measurement is what stopped us. The data was telling us something our knowledge could not: you cannot embed your way out of boilerplate. If the text is machine-generated and identical, there is no meaning for a meaning-machine to find. Vectorizing it does not add intelligence; it just spends electricity arranging noise.
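One way to catch this before spending any compute is to check how much of an index is a single template. The sketch below is an assumption about how such a check could look, not a description of our pipeline: it masks timestamps and numbers, then reports the share of records that collapse to the most common template. The regular expressions, the function names, and the sample records (including their timestamps) are hypothetical.

```python
import re
from collections import Counter

# Patterns for the variable parts of machine-generated text.
# Illustrative only; a real pipeline would need its own list.
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")
_NUMBER = re.compile(r"\d+")

def template_of(text):
    """Mask timestamps and numbers so records that differ only there match."""
    text = _TIMESTAMP.sub("<TS>", text)
    text = _NUMBER.sub("<N>", text)
    return " ".join(text.split())  # normalize whitespace

def dominant_template_share(texts):
    """Return (most common template, fraction of records that use it)."""
    if not texts:
        raise ValueError("no records to inspect")
    counts = Counter(template_of(t) for t in texts)
    template, n = counts.most_common(1)[0]
    return template, n / len(texts)

# Made-up sample: four boilerplate records and one with real content.
records = [
    "Novel IOC detected by PreCog Sweep. First seen at 2025-01-01T00:00:00Z",
    "Novel IOC detected by PreCog Sweep. First seen at 2025-01-01T00:05:12Z",
    "Novel IOC detected by PreCog Sweep. First seen at 2025-01-02T13:40:00Z",
    "Novel IOC detected by PreCog Sweep. First seen at 2025-01-03T08:15:30Z",
    "Memory overread in a gateway appliance discloses session tokens",
]

template, share = dominant_template_share(records)
print(f"{share:.0%} of records share the template: {template!r}")
```

If one template accounts for most of an index, embedding it is unlikely to add anything the template itself does not already say.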


That sentence — you cannot embed your way out of boilerplate — is wisdom, not knowledge. You cannot look it up. It only exists at the boundary where the confident, correct-sounding general rule ("embeddings improve search") meets the specific, stubborn texture of your actual data. Dunning-Kruger is the disease of never reaching that boundary, because confidence feels exactly the same whether or not you have measured. The hill is built from the part of the map you cannot see, and the cruelty of it is that from the top of the hill, the map looks complete.


So what is the cure? Not humility as a personality trait. Humility you can perform on a podcast and abandon by lunch. The cure is structural: a system that returns the full universe of evidence and lets it contradict you, out loud, before you have committed.


Ours did it twice in one day. The first time was the collapsed embedding space described above. The second came when we asked our own search tool, the one our customers and AI agents actually use, to find "Citrix NetScaler memory overread." It returned Citrix phishing URLs, not the critical NetScaler vulnerability we had published about that morning. Phishing pages, because the search was matching the word "Citrix" like a sorting machine, not understanding the concept like a colleague. We thought the feature worked. The measurement said it did not. There is no amount of confidence that survives watching your own tool hand a security team garbage in front of you. That humiliation is the medicine. It is the thing that drags you off the hill.
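That kind of contradiction can be made routine with a small set of "golden queries": searches whose right answer is known in advance. The sketch below is a toy, not our search tool. The `keyword_search` ranker, the document IDs, and the document texts are all invented to reproduce the failure mode: a lexical matcher that latches onto one shared word and never surfaces the record that describes the same concept in different vocabulary.

```python
def keyword_search(query, docs, k=3):
    """Toy lexical ranker: score is the number of query tokens in the doc."""
    q_tokens = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(q_tokens & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by doc_id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def check_golden_queries(search, docs, golden, k=3):
    """Return (query, expected_id, got) for every golden query that failed."""
    failures = []
    for query, expected_id in golden:
        got = search(query, docs, k)
        if expected_id not in got:
            failures.append((query, expected_id, got))
    return failures

# Hypothetical corpus: three phishing records and one vulnerability record
# that describes the bug without using any of the query's words.
docs = {
    "phish-1": "citrix login phishing url on lookalike domain",
    "phish-2": "citrix credential phishing page",
    "phish-3": "citrix workspace phishing kit",
    "vuln-1": "out-of-bounds read in adc gateway leaks session tokens",
}
golden = [("Citrix NetScaler memory overread", "vuln-1")]

failures = check_golden_queries(keyword_search, docs, golden)
for query, expected, got in failures:
    print(f"FAIL {query!r}: expected {expected}, got {got}")
```

Run on every change, a check like this turns "we thought the feature worked" into a failing test instead of a surprise.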


And the fix, once measured, was the opposite of the confident move. Not embed everything — embed almost nothing. We vectorized two small, high-signal indexes that had no embeddings at all: the catalog of known-exploited vulnerabilities, and the profiles of named threat actors. About two thousand records. Three minutes. And suddenly a blog post about a NetScaler memory bug could pull the exact NetScaler vulnerability and the actors who exploit that class of flaw out of entirely separate indexes, at ninety-four percent semantic confidence. The big, noisy, confident project would have produced a collapsed star. The small, measured, humble one produced the thing we actually wanted.
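The cross-index lookup can be sketched in a few lines. Everything specific here is an assumption for illustration: the index names, the record IDs, and the three-dimensional vectors are hand-made stand-ins for what an embedding model would produce, and the ranking is plain cosine similarity rather than whatever scoring a production search engine applies.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search_indexes(query_vec, indexes, k=2):
    """Rank records from several named indexes by similarity to the query."""
    hits = []
    for index_name, records in indexes.items():
        for record_id, vec in records.items():
            hits.append((cosine(query_vec, vec), index_name, record_id))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:k]

# Hypothetical small, high-signal indexes with toy 3-d "embeddings".
indexes = {
    "kev": {
        "netscaler-overread": [0.9, 0.1, 0.0],
        "mail-server-rce": [0.0, 0.2, 0.9],
    },
    "actors": {
        "actor-edge-devices": [0.8, 0.3, 0.1],
        "actor-phishing": [0.1, 0.9, 0.1],
    },
}

# Stand-in for the embedding of a blog post about a NetScaler memory bug.
query = [1.0, 0.2, 0.0]

top = search_indexes(query, indexes, k=2)
for score, index_name, record_id in top:
    print(f"{index_name}/{record_id}: {score:.2f}")
```

The point of the toy is the shape of the result: one query, two separate indexes, and the top hits come from both.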


This is the whole discipline, and it is teachable. Query the detector for the full universe of indicators, not the three your conversation happened to mention. Measure the shape of the data before you write the narrative, because hypothesis-led research finds whatever it went looking for, while shape-led research can come back and say "no signal" — which is the most valuable sentence a measurement can produce. Name your blind spots in public. And cap your certainty at ninety-five percent on principle, because the missing five percent is precisely the part of the map that builds the hill, and the only honest thing to do with a part of the map you cannot see is to admit it is there.
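Two of those rules are simple enough to encode directly. The sketch below is one hypothetical way to do it: a result type that always carries its blind spots, allows "no signal" as a first-class verdict, and never reports more than ninety-five percent certainty. The names `Finding`, `summarize`, and `CERTAINTY_CAP` are invented for this example.

```python
from dataclasses import dataclass

CERTAINTY_CAP = 0.95  # never report more certainty than this

@dataclass
class Finding:
    verdict: str        # "signal" or "no signal"
    confidence: float   # confidence in the verdict, capped
    blind_spots: list   # what the measurement could not see

def summarize(matches, raw_confidence, blind_spots):
    """Turn raw results into a finding with capped certainty."""
    if not 0.0 <= raw_confidence <= 1.0:
        raise ValueError("raw_confidence must be between 0 and 1")
    verdict = "signal" if matches else "no signal"
    return Finding(
        verdict=verdict,
        confidence=min(raw_confidence, CERTAINTY_CAP),
        blind_spots=list(blind_spots),
    )

# Made-up inputs for illustration.
strong = summarize(["indicator-a"], 0.99, ["only two indexes were embedded"])
empty = summarize([], 0.80, ["boilerplate indexes were excluded"])
print(strong)
print(empty)
```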


A system that does this for you is not a search engine. It is an epistemics engine. It turns "I am sure" into "let me check," automatically, every time, before the confidence can calcify into a deployed mistake. We did not set out to cure Dunning-Kruger. We set out to make search better and got measured into wisdom against our will. That is the best kind. The cure for thinking you know more than you do is not thinking less of yourself. It is building something that keeps quietly, relentlessly showing you the rest of the mountain.




The threat feed this post is built on

1.14M+ IOCs, STIX 2.1, precursor signals, supply-chain detection. Free API key in 30 seconds.

