The CSP That Wasn't Where We Thought It Was: A 23-Minute SRE War Story Across Three Repos

Patrick Duggan
Apr 7
10 min read

Updated: Apr 25

Last night I filed a GitHub issue against pduggusa/security-dugganusa asking the team to harden the Content Security Policy on security.dugganusa.com. The site was scoring B+ on securityheaders.com. The CSP was the only thing standing between B+ and A+. The fix was a half-hour of Express middleware.

The team turned it around in 23 minutes. I want to be clear about what 23 minutes means in 2026: they read the issue, hardened the middleware to use per-request nonces with 'strict-dynamic', added frame-ancestors, base-uri, form-action, and report-uri, deployed to production, and replied on the issue to report a problem. The middleware was correct. The browser was still seeing the old, loose CSP. Something at the edge was overwriting it.
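For readers who haven't written one, here is a minimal sketch of the kind of Express middleware described above. It is not the security-dugganusa team's code. The directive list is illustrative, the /csp-report endpoint and the res.locals.cspNonce name are hypothetical, and small structural types stand in for Express's own so the block has no dependencies beyond Node.

```typescript
import { randomBytes } from "node:crypto";

// Minimal stand-ins for Express's Response and NextFunction types.
interface Res {
  locals: Record<string, unknown>;
  setHeader(name: string, value: string): void;
}
type Next = () => void;

// 16 random bytes, base64-encoded: a fresh, unguessable value for each request.
function makeNonce(): string {
  return randomBytes(16).toString("base64");
}

// Build a strict policy around one nonce. Illustrative directives only;
// "/csp-report" is a hypothetical violation-report endpoint.
function buildCsp(nonce: string, reportUri = "/csp-report"): string {
  const directives: Array<[string, string[]]> = [
    ["default-src", ["'self'"]],
    // Browsers that support 'strict-dynamic' ignore 'self', https: and
    // 'unsafe-inline' in this directive; those entries are fallbacks for
    // older browsers. Scripts carrying the nonce may load further scripts.
    ["script-src", ["'self'", `'nonce-${nonce}'`, "'strict-dynamic'", "https:", "'unsafe-inline'"]],
    ["object-src", ["'none'"]],
    ["base-uri", ["'self'"]],
    ["form-action", ["'self'"]],
    ["frame-ancestors", ["'none'"]],
    ["report-uri", [reportUri]],
  ];
  return directives.map(([name, values]) => `${name} ${values.join(" ")}`).join("; ");
}

// Express-style middleware: one nonce per request, exposed to templates via
// res.locals and written into the header. Register with app.use(cspMiddleware).
function cspMiddleware(_req: unknown, res: Res, next: Next): void {
  const nonce = makeNonce();
  res.locals.cspNonce = nonce;
  res.setHeader("Content-Security-Policy", buildCsp(nonce));
  next();
}
```

Templates then add a nonce attribute carrying res.locals.cspNonce to each script tag they render; in browsers that support nonces, inline scripts without the matching value are blocked.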

The respondent's diagnosis pointed at our Cloudflare Worker, dugganusa-edge-shield, which I had touched earlier the same evening to add Schema.org JSON-LD injection. Their logic was reasonable: the worker is the most recently changed edge component, the worker uses HTMLRewriter to modify responses, ergo the worker must be setting the headers. They asked, very politely, whether I wanted to file the follow-up against the edge-shield repo or handle it directly.

This is the kind of moment where the wrong answer is "yes, you're right, I'll fix the worker." The worker wasn't the problem. It was the prime suspect because it was the most recent edit, but reading its source confirmed in 30 seconds that it touches X-CF- geo headers, X-DugganUSA-Shield attribution, and HTML body injection, and nothing else. It does not set CSP. It does not set Permissions-Policy. It does not set any of the six security headers the respondent had observed at the browser.

So the worker was off the suspect list and the actual source was somewhere else in the Cloudflare stack. The Cloudflare stack has a lot of "somewhere else." This is a story about finding it.

The Anatomy Of An Edge Override

When a request comes into Cloudflare for security.dugganusa.com, the response that goes back to the browser passes through, in order:

The origin server — in this case the security-dugganusa Express app, which since the team's deploy was setting the strict A+ CSP via middleware. Verified via direct origin curl bypassing CF.

Page Rules — Cloudflare's older rules engine, mostly deprecated for new functionality but still active on a lot of zones. Doesn't typically modify response headers but worth checking.

Workers — code that runs at the edge. The dugganusa-edge-shield worker has routes on this zone. The worker code does not set CSP. Confirmed via grep.

Transform Rules — Cloudflare's modern rules engine for modifying requests and responses. Can rewrite URLs, set headers, modify cookies. Configured in the dashboard or via API. Lives in rulesets keyed by phase.

Cache — CDN cache layer. Doesn't typically modify headers but can serve stale content with old headers.

The browser — what the user actually sees.

The respondent was observing the loose CSP at the browser. The origin was sending the strict CSP. Therefore something between origin and browser was overwriting. By process of elimination, the candidates were Workers (already cleared) or Transform Rules.
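That elimination starts from one comparison: the same page fetched through Cloudflare and straight from the origin. A sketch of that comparison follows, assuming Node 18+ for the global fetch and Headers. The origin URL is a placeholder, since how you reach an origin without going through Cloudflare depends on your setup, and the function names are illustrative. Per-request nonces make any two CSP headers differ, so they are masked before comparing.

```typescript
// The six headers the Transform Rule in this story was setting.
const SECURITY_HEADERS = [
  "content-security-policy",
  "permissions-policy",
  "referrer-policy",
  "x-content-type-options",
  "x-frame-options",
  "x-xss-protection",
];

interface HeaderDiff {
  header: string;
  origin: string | null;
  edge: string | null;
}

// Nonces change on every request, so replace them with a fixed token.
function maskNonces(value: string | null): string | null {
  return value === null ? null : value.replace(/'nonce-[^']*'/g, "'nonce-*'");
}

// Report every security header whose (nonce-masked) value differs.
// Headers.get() is case-insensitive, so lowercase names are fine.
function diffSecurityHeaders(origin: Headers, edge: Headers): HeaderDiff[] {
  const diffs: HeaderDiff[] = [];
  for (const header of SECURITY_HEADERS) {
    const o = origin.get(header);
    const e = edge.get(header);
    if (maskNonces(o) !== maskNonces(e)) diffs.push({ header, origin: o, edge: e });
  }
  return diffs;
}

// Fetch the same page through the edge and directly from the origin.
// originUrl is a placeholder for whatever bypasses Cloudflare in your setup.
async function compareEdgeToOrigin(edgeUrl: string, originUrl: string): Promise<HeaderDiff[]> {
  const [edge, origin] = await Promise.all([fetch(edgeUrl), fetch(originUrl)]);
  return diffSecurityHeaders(origin.headers, edge.headers);
}
```

Any entry in the result is a header that some layer between origin and browser rewrote; the cache layer still has to be ruled out separately.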

I queried the Cloudflare API for the dugganusa.com zone's response header transform rulesets:

GET /client/v4/zones/c90e4b21b5381ce61545f90f5c680d2a/rulesets

Filtered for phase == "http_response_headers_transform". Found one ruleset: dabff8f13371418cbb292e28362d0b03, named "default", kind "zone". Pulled the full ruleset. One rule inside it. Description: "Security headers for Wix + security subdomain."

The expression: (http.host eq "www.dugganusa.com" or http.host eq "security.dugganusa.com")

The action: rewrite. The action_parameters: a headers block with six entries — Content-Security-Policy, Permissions-Policy, Referrer-Policy, X-Content-Type-Options, X-Frame-Options, X-XSS-Protection. Each one with operation: "set".

operation: "set" means: take whatever the origin sent for this header, throw it away, and replace with this value. The CSP value in the rule was the loose B+ string the browser was seeing. Mystery solved in two API calls.
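Those two calls are easy to script. A minimal sketch, assuming Node 18+ for global fetch and an API token, sent as a Bearer token, that can read the zone's rulesets. The list endpoint returns ruleset summaries (id, name, kind, phase) without their rules, which is why each matching ruleset needs a second call to see what it actually does. Function names are illustrative.

```typescript
const CF_API = "https://api.cloudflare.com/client/v4";

interface RulesetSummary {
  id: string;
  name: string;
  kind: string;
  phase: string;
}

interface Rule {
  id: string;
  description?: string;
  expression: string;
  action: string;
  action_parameters?: Record<string, unknown>;
}

interface Ruleset extends RulesetSummary {
  rules?: Rule[];
}

// Keep only the rulesets that run in one phase.
function rulesetsInPhase(all: RulesetSummary[], phase: string): RulesetSummary[] {
  return all.filter((rs) => rs.phase === phase);
}

// GET a v4 endpoint and unwrap Cloudflare's { success, errors, result } envelope.
async function cfGet<T>(path: string, token: string): Promise<T> {
  const res = await fetch(`${CF_API}${path}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const body = (await res.json()) as { success: boolean; errors?: unknown[]; result: T };
  if (!body.success) throw new Error(`Cloudflare API error: ${JSON.stringify(body.errors)}`);
  return body.result;
}

// First call lists the zone's rulesets (summaries only); then one call per
// response-header ruleset pulls it in full so its rules and actions are visible.
async function responseHeaderRules(zoneId: string, token: string): Promise<Rule[]> {
  const all = await cfGet<RulesetSummary[]>(`/zones/${zoneId}/rulesets`, token);
  const rules: Rule[] = [];
  for (const summary of rulesetsInPhase(all, "http_response_headers_transform")) {
    const full = await cfGet<Ruleset>(`/zones/${zoneId}/rulesets/${summary.id}`, token);
    rules.push(...(full.rules ?? []));
  }
  return rules;
}
```

Logging each returned rule's description and expression is enough to spot a rule like the one described here.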

The rule had been in the zone's Transform Rules longer than I could remember. It pre-dated the edge-shield worker. It pre-dated the security-dugganusa repo's Express middleware. It pre-dated the entire architectural conversation about whether headers should live at the edge or the origin. Someone (me, probably, eight months ago) had created it as a quick way to project security headers onto the Wix-hosted marketing site, which has no origin we control. The convenience move was to bundle security.dugganusa.com into the same rule because the same six headers were a reasonable baseline for both. Then time passed and the security site grew an Express app. For months the Transform Rule silently overwrote whatever that app sent, without anyone noticing. When the app grew a strict CSP and the team shipped it, the rule overwrote that too.

That is the entire story. An eight-month-old dashboard rule, scoped slightly too wide, quietly trampling a freshly shipped origin policy. No bug in the old code. No bug in the new code. A bug in the boundary between two systems that both wanted to set the same headers, neither of which knew about the other.
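A toy model makes the collision concrete. It is a deliberate simplification, not how Cloudflare evaluates rules: a plain list of hosts stands in for the rule expression, and only the set and remove header operations are modeled. What it does show is why operation: "set" means the origin's value never reaches the browser while the rule matches, and why narrowing the match is enough to fix it.

```typescript
type HeaderOp = { operation: "set"; value: string } | { operation: "remove" };

// Simplified rule: `hosts` stands in for an expression like
// (http.host eq "www.dugganusa.com" or http.host eq "security.dugganusa.com").
interface ResponseHeaderRule {
  hosts: string[];
  headers: Record<string, HeaderOp>;
}

// Apply one rule to the headers the origin sent (keys are lowercase names).
// Returns a new map; the input is left untouched.
function applyRule(
  host: string,
  originHeaders: Map<string, string>,
  rule: ResponseHeaderRule,
): Map<string, string> {
  const out = new Map(originHeaders);
  if (!rule.hosts.includes(host)) return out; // no match: origin headers pass through
  for (const [name, op] of Object.entries(rule.headers)) {
    const key = name.toLowerCase();
    if (op.operation === "set") {
      out.set(key, op.value); // whatever the origin sent is discarded
    } else {
      out.delete(key);
    }
  }
  return out;
}
```

With hosts listing both names, applyRule returns the rule's CSP for security.dugganusa.com no matter what the origin sent; with hosts narrowed to www.dugganusa.com, the origin's CSP passes through untouched while www keeps its rule-supplied headers.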

Why The Diagnosis Was Hard

The reason this took two repos and a wrong-suspect detour is that the architecture has too many edges. Cloudflare's documentation talks about "the edge," singular, but in practice "the edge" is at least three independently configurable surfaces:

Workers code. Lives in a repository, deploys via wrangler, auditable by reading source code, version controlled, CI-gated, the kind of thing engineers think about first because it lives in the same place engineers do their work.

Transform Rules. Lives in the Cloudflare dashboard. API-accessible via the rulesets endpoint. Not version controlled by default. Not in any GitHub repo. Configurable by anyone with dashboard access. Survives across worker deploys and across redeploys of any other component. Can rewrite request and response state in ways that look indistinguishable from origin behavior to anyone reading the worker source.

Page Rules and Configuration Rules. Older and newer takes on the same idea. Cache TTL overrides, redirect rules, security level overrides, host header rewrites. Less commonly used for response header modification but absolutely capable of it.

The respondent assumed the worker was the source because the worker is the only one of these surfaces that is visible as code. That assumption is a tax on every Cloudflare environment in 2026. The Transform Rules and Configuration Rules are invisible from any repository view. They are not in your GitHub. They are not in your Terraform unless you've explicitly modeled them. They show up exactly nowhere except in the Cloudflare dashboard or in the Cloudflare API. If you don't think to check the API, you will never find them. If you've never created one, you may not even know they exist.

The honest fix for this category of bug, long term, is to manage every edge surface as code. Cloudflare's Terraform provider can manage Transform Rules through its cloudflare_ruleset resource, and tools like cf-terraforming can generate Terraform configuration from existing dashboard settings for auditing or import. We do not currently use either. We did not need them often enough to bother. We will need them now, because tonight's bug is the kind of bug that compounds. As we add more edge logic, the gap between "what is in the worker repo" and "what is actually executing at the edge" will keep growing until something breaks production.

That is the meta-fix, and it's a separate ticket in our compliance backlog now. Tonight's actual fix was much narrower: scope the existing Transform Rule down so it stops trampling the origin we care about.
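Until that ticket lands, even a small script makes the dashboard less of a black box. The sketch below is a stand-in for the Terraform approach, not Terraform itself. It assumes the desired rules are kept in a repo as JSON, that the live rules come from the rulesets API, and that descriptions are unique enough to match rules on; all three are assumptions for illustration. It reports rules missing from the edge, rules that exist only in the dashboard, and rules whose expression, action, or parameters have changed.

```typescript
// A rule as committed to the repo.
interface DesiredRule {
  description: string;
  expression: string;
  action: string;
  action_parameters?: unknown;
}

// A rule as returned by the API (same shape plus its id).
interface LiveRule extends DesiredRule {
  id: string;
}

// Stable serialization: sort object keys recursively so that key order
// alone never shows up as drift.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "undefined";
}

// Compare repo state to edge state, matching rules by description.
function detectDrift(desired: DesiredRule[], live: LiveRule[]): string[] {
  const findings: string[] = [];
  const liveByDesc = new Map(live.map((r) => [r.description, r] as const));
  for (const want of desired) {
    const have = liveByDesc.get(want.description);
    if (!have) {
      findings.push(`missing at edge: ${want.description}`);
      continue;
    }
    for (const field of ["expression", "action", "action_parameters"] as const) {
      if (canonical(want[field]) !== canonical(have[field])) {
        findings.push(`changed ${field}: ${want.description}`);
      }
    }
    liveByDesc.delete(want.description);
  }
  // Whatever is left exists only in the dashboard.
  for (const extra of liveByDesc.keys()) findings.push(`not in repo: ${extra}`);
  return findings;
}
```

Run on a schedule or in CI, an empty result means the dashboard still matches the repo; anything else is the kind of gap this post is about.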

The Two-Line Fix

The Transform Rule needed to keep firing for www.dugganusa.com because the marketing site is hosted on Wix and has no origin we can use to set headers. The Transform Rule needed to stop firing for security.dugganusa.com because that origin had its own (better) headers and was being overwritten.

The fix was a single PATCH to the rule's expression field. Old expression:

(http.host eq "www.dugganusa.com" or http.host eq "security.dugganusa.com")

New expression:

(http.host eq "www.dugganusa.com")

I also updated the rule's description to read: "Security headers for Wix marketing site (security.dugganusa.com sets its own via Express middleware as of #214)" — so that the next person who finds this rule, including future me, knows immediately why it's scoped to one host instead of two and where the architectural decision is documented.

The Cloudflare API requires the full rule body on PATCH (it's not a partial update). I fetched the existing rule, mutated only the expression and description fields, and PATCH'd it back. Response: success: true. Sub-second.
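The fetch, mutate, PATCH step can be sketched as follows. This assumes the rule-level endpoint PATCH /zones/{zone_id}/rulesets/{ruleset_id}/rules/{rule_id}, Node 18+ for global fetch, and a token with edit rights on the zone's rulesets. Which fields of the fetched rule are server-managed and should be dropped before sending (id, version, last_updated here) is an assumption worth checking against the API reference; the function names are illustrative.

```typescript
interface FetchedRule {
  id: string;
  version?: string;
  last_updated?: string;
  expression: string;
  description?: string;
  action: string;
  action_parameters?: Record<string, unknown>;
  enabled?: boolean;
  [field: string]: unknown;
}

// Copy the fetched rule, change only expression and description, and drop
// metadata the server manages (assumed read-only). Everything else, including
// the action and its header list, is sent back exactly as fetched.
function narrowedRuleBody(rule: FetchedRule, expression: string, description: string): Record<string, unknown> {
  const { id: _id, version: _version, last_updated: _lastUpdated, ...rest } = rule;
  return { ...rest, expression, description };
}

async function patchRule(
  zoneId: string,
  rulesetId: string,
  rule: FetchedRule,
  expression: string,
  description: string,
  token: string,
): Promise<void> {
  const url = `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets/${rulesetId}/rules/${rule.id}`;
  const res = await fetch(url, {
    method: "PATCH",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify(narrowedRuleBody(rule, expression, description)),
  });
  const body = (await res.json()) as { success: boolean; errors?: unknown[] };
  if (!body.success) throw new Error(`PATCH failed: ${JSON.stringify(body.errors)}`);
}
```

Called with the new expression, (http.host eq "www.dugganusa.com"), and the new description, it sends the rule's action and all six header entries back untouched.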

I curled both hosts immediately to verify.


security.dugganusa.com returned a Content-Security-Policy header containing default-src 'self'; script-src 'self' 'nonce-pSabXZgJWvLzhisaDgWECQ==' 'strict-dynamic' https: 'unsafe-inline'; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com https://cdn.jsdelivr.net; ... The nonce was non-empty, which meant the Express middleware was actually generating a fresh per-request value. frame-ancestors, base-uri, form-action, report-uri, and worker-src were all present, along with a modern Permissions-Policy carrying interest-cohort=() and browsing-topics=(). Every directive the security-dugganusa team had shipped was there: the A+ they earned, finally on the wire.

www.dugganusa.com returned the same loose B+ CSP it had before. Untouched. Same rule, same headers, same value. Exactly what we wanted.

aipmsec.com (different zone, different rule, different story) returned what it had been returning all along. Confirmed not affected by the change.
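The checks in that verification pass (a non-empty nonce, 'strict-dynamic', the locked-down directives present) are easy to script so they run after every deploy. A sketch follows; the function names and the required-directive list are illustrative, not something either team in this post ships.

```typescript
// Split a CSP header into directive name -> source list.
function parseCsp(header: string): Map<string, string[]> {
  const directives = new Map<string, string[]>();
  for (const part of header.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const [name, ...values] = tokens;
    const key = name.toLowerCase();
    // Per the CSP spec, a repeated directive is ignored after its first occurrence.
    if (!directives.has(key)) directives.set(key, values);
  }
  return directives;
}

// Return a list of problems; an empty list means every check passed.
function cspProblems(header: string | null, required: string[]): string[] {
  if (!header) return ["no Content-Security-Policy header"];
  const csp = parseCsp(header);
  const problems: string[] = [];
  const script = csp.get("script-src") ?? [];
  const nonce = script.find((s) => s.startsWith("'nonce-"));
  if (!nonce || nonce === "'nonce-'") problems.push("script-src has no non-empty nonce");
  if (!script.includes("'strict-dynamic'")) problems.push("script-src lacks 'strict-dynamic'");
  for (const d of required) {
    if (!csp.has(d)) problems.push(`missing ${d}`);
  }
  return problems;
}
```

Feed it the header from a live fetch, for example cspProblems(res.headers.get("content-security-policy"), ["frame-ancestors", "base-uri", "form-action", "report-uri"]). Fetching twice and confirming the two nonces differ is a cheap way to prove they are per-request rather than static.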

What The Browser Sees Now

The whole thing took about 8 minutes from the respondent's comment landing on the issue to the verification curl returning the strict headers. Most of that was reading. The actual diagnostic was three API calls and the actual fix was one PATCH. Net change to source code: zero. Net change to Cloudflare configuration: one rule's expression field, narrowed by one host, plus its description.

The narrowed rule is now self-documenting. Anyone who lands on it in the future via API or dashboard will see "Security headers for Wix marketing site (security.dugganusa.com sets its own via Express middleware as of #214)" in the description and will immediately understand both the scope and the architectural decision behind the scope. The next time someone deploys a strict CSP to security.dugganusa.com, it will work the first time, because the rule that used to trample it no longer applies to that host.

What This Story Is Actually About

It is not about Cloudflare. Cloudflare is fine. The Transform Rules feature is great. The bug was operator error from eight months ago.

It is not about CSPs. CSPs are well-understood. The security-dugganusa team's hardening was textbook. They knew exactly what to ship.

It is not about the wrong-suspect detour. That's the cost of having multiple edge surfaces, and the respondent's assumption was reasonable given what they could see.

What this story is about is boundaries between systems that both want to set the same state and neither knows about the other. Every operational outage you've ever had, in your entire career, looked exactly like this one once you found the actual cause. Two configuration systems with overlapping authority. One was right. One was older and wrong. Neither one knew the other one existed. The collision was invisible until somebody ran a curl from the outside and noticed the gap between what they sent and what the browser received.

The fix is always the same shape: figure out which authority is supposed to win, scope the loser out of that scenario, document the decision. The hard part is never the fix. The hard part is finding the boundary. And the only way to find a boundary you can't see in your own source code is to enumerate every system that has authority and then check each one explicitly.

Tonight I checked the worker (innocent), then the Transform Rules (guilty in 30 seconds once I thought to look). Yesterday, when the security-dugganusa team was shipping their Express middleware, they checked the origin and were 100 percent correct that the origin was sending the right CSP. Both teams were doing the right diagnostic on their own surface. Neither team had visibility into the other team's surface. The bug lived in the gap.

That gap is going away. Every edge surface we run is going to get an explicit cataloging and a Terraform-or-equivalent source-of-truth this month, so the next time a bug lives in the gap between the worker and the dashboard, the dashboard isn't a black box. That work is queued. That is the meta-fix and it's bigger than this one rule.

Why I'm Writing This Down

Three reasons.

First, because the security-dugganusa team's 23-minute turnaround on issue #214 is the kind of work that disappears into commit history if nobody narrates it. They shipped a strict CSP with per-request nonces and 'strict-dynamic' and modern Permissions-Policy in 23 minutes from issue file to deploy. That is not "fast" by accident. That is what happens when a team has practiced shipping security work as a default speed instead of a quarterly campaign. They earned the A+ before I even knew about the override.

Second, because the respondent's wrong-suspect comment on #214 was not a mistake. It was a perfectly reasonable inference from the visible architecture. The fact that the inference was wrong reveals a real gap in our system, not a gap in their reasoning. I want anyone reading this who has ever sent that kind of "hey, is this the right repo to file the follow-up against?" message to know that asking the question was correct. The answer happened to be no, but the question prevented a fix from being shipped to the wrong place, which would have made everything worse.

Third, because the SRE community has plenty of war stories about distributed system failures and almost no war stories about edge configuration drift. The Cloudflare-specific failure mode I describe in this post is going to bite a lot of teams in 2026 and 2027 as edge logic proliferates. If you operate a Cloudflare zone with both Workers and Transform Rules and your team uses both, you have a non-zero chance of hitting exactly this bug, and the only protection against it is knowing to check the dashboard rules when the worker source looks innocent. Now you know.

Receipts

pduggusa/security-dugganusa#214 — original CSP hardening request, closed with verification

Their middleware deploy: same-day, 23 minutes after issue file

My diagnosis correction comment: "Diagnosis correction — the CSP override is not coming from dugganusa-edge-shield (the worker). It's a Cloudflare dashboard Transform Rule."

Cloudflare API ruleset path: zones/c90e4b21b5381ce61545f90f5c680d2a/rulesets/dabff8f13371418cbb292e28362d0b03/rules/a3f30a348ed14c42bc104d12ebb77d24

Final state on security.dugganusa.com: A+ on securityheaders.com, strict CSP with per-request nonces, frame-ancestors locked, report-uri collecting violations

Final state on www.dugganusa.com: unchanged loose CSP, scoped to one rule, follow-up Wix sweep tracked separately

Total resolution time from "we have a CSP problem" to "we have an A+ CSP" across both teams: less than an hour, including the diagnostic detour through the wrong edge component. This is what good security operations look like in 2026 when the teams are honest about what they know and what they don't.

The boring architecture is the safe architecture. The boring rule scope is the safe rule scope. Tonight one rule got narrower by one host and one production site got an A+ that had been waiting two days to land.

Customers awaitn'.

Her name was Renee Nicole Good.

His name was Alex Jeffery Pretti.
