

  • Writer: Patrick Duggan
  • Oct 29, 2025
  • 10 min read

# Azure Had an Outage. We Still Didn't Notice.


**October 29, 2025** — Azure Front Door collapsed. Eight hours. Half of Microsoft's services unreachable. Azure Portal dark. Customers worldwide scrambling.


**DugganUSA status:** 100% operational. Zero impact.


**Ten days ago, I published this exact headline about AWS.**


This is getting ridiculous.




The Pattern



**October 19, 2025:** Amazon Web Services suffers major outage. DugganUSA impact: zero. We're on Azure.


**October 20, 2025:** I publish *"AWS Had an Outage. We Didn't Notice."* The joke writes itself — we picked the right cloud provider.


**October 29, 2025:** Microsoft Azure Front Door goes down for 8 hours 20 minutes. DugganUSA impact: still zero. We're ON Azure.


At this point, the universe is either mocking me or validating something important.


I'm going with validation.




What Actually Happened



The Azure Front Door Collapse



**Timeline** (from Azure's Post-Incident Review):

- **15:45 UTC (10:45 AM CDT):** Azure Front Door (AFD) begins failing globally

- **Root Cause:** "Inadvertent tenant configuration change" triggers widespread AFD disruption

- **Cascading Failure:** Invalid configuration causes AFD nodes to fail health checks, traffic redistribution overloads remaining nodes

- **Duration:** 8 hours 20 minutes (until 00:05 UTC October 30)

- **Microsoft Tracking ID:** YKYN-BWZ


**Services Affected** (partial list):

- Azure Front Door (primary failure point)

- Azure CDN

- Azure Portal (management interface)

- App Service

- Azure Active Directory B2C

- Azure SQL Database

- Container Registry

- Microsoft Copilot for Security

- Microsoft Defender

- Microsoft Sentinel

- And 10+ more Azure services


**Impact:** Microsoft admits "widespread service disruption affecting both Microsoft services and customer applications dependent on AFD for global content delivery."


**Translation:** If you architected your Azure infrastructure the way Microsoft's documentation recommends, you were down for 8 hours.




The DugganUSA Timeline (Same Day)



**Git commits during Azure outage window (15:45 UTC - 00:05 UTC):**





**Analysis:**

- Normal development work (pattern documentation, cost optimization)

- Zero incident response commits

- Zero emergency hotfixes

- Zero rollbacks

- Zero "why the fuck is everything on fire" Slack messages


**Production services status during outage:**

- ✅ security.dugganusa.com (primary product) - operational

- ✅ analytics.dugganusa.com (Central Brain) - operational

- ✅ status.dugganusa.com (monitoring) - operational

- ✅ www.dugganusa.com (blog/marketing) - operational


**Judge Dredd compliance evidence:**

- Zero violations logged

- Zero anomaly detections

- Zero alerts triggered


**DugganUSA didn't experience an Azure outage because we didn't use the part of Azure that broke.**




The Architecture That Survived



Here's what DugganUSA uses on Azure (Central US region):


**Services We DO Use:**

- ✅ Azure Container Apps (4 active containers)

- ✅ Azure Table Storage (compliance evidence, threat intel)

- ✅ Azure Key Vault (secrets management)

- ✅ Azure Application Insights (monitoring/telemetry)


**Services We DON'T Use:**

- ❌ Azure Front Door (never deployed)

- ❌ Azure CDN (Cloudflare instead)

- ❌ Azure DNS (migrated to Cloudflare October 2025)

- ❌ Azure Portal for critical operations (Azure CLI for deployments)


**The critical distinction:** We're ON Azure. We're just not ON the parts of Azure that assume Microsoft never makes mistakes.




The Cloudflare DNS Migration (Accidental Genius)



**Context:** Three weeks ago, I migrated DugganUSA's DNS from Azure to Cloudflare. Not for disaster recovery. Not because I predicted an outage. For **analytics**.


**The Migration** (Issue #89, October 2025):

- **Old nameservers:** `ns1-05.azure-dns.com`, `ns1-06.azure-dns.net`

- **New nameservers:** `coby.ns.cloudflare.com`, `penny.ns.cloudflare.com`

- **Motivation:** Enable Cloudflare Zaraz for Google Tag Manager tracking

- **Side effect:** Eliminate Azure DNS dependency


**Current DNS resolution** (verified October 30, 2025):





**What this means:**

- DNS queries for `security.dugganusa.com` → Cloudflare's global network (NOT Azure)

- Traffic routing to Azure Container Apps → Direct ingress (NOT Azure Front Door)

- Result: Zero dependency on the Azure service that collapsed


**When Azure Front Door went down, our DNS queries never touched Azure infrastructure.**
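That claim is easy to spot-check from any terminal. A minimal sketch, assuming `dig` is installed (it falls back to a message when it isn't):

```shell
# Query the authoritative nameservers for dugganusa.com.
# A Cloudflare delegation should list coby.ns.cloudflare.com and
# penny.ns.cloudflare.com, not azure-dns hosts.
if command -v dig >/dev/null 2>&1; then
  NS_RESULT="$(dig +short NS dugganusa.com 2>/dev/null)"
fi
# Fall back gracefully when dig is missing or returns no answer.
NS_RESULT="${NS_RESULT:-dig unavailable or no answer}"
echo "$NS_RESULT"
```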


Cloudflare free tier: $0/month, 100% uptime during Azure's worst outage in years.


Azure Front Door: Enterprise pricing, 8 hours dark.


Sometimes the universe rewards people who optimize for different things.




Why We Survived (The Architecture)



Decision #1: Azure Container Apps (NOT Azure Front Door)



**What most Azure customers use:** Azure App Service + Azure Front Door

- App Service: PaaS for web apps

- Front Door: Global CDN, SSL termination, traffic routing

- **Standard "best practice"** per Microsoft documentation


**What DugganUSA uses:** Azure Container Apps + Cloudflare

- Container Apps: Lightweight container orchestration

- Cloudflare: DNS + CDN + DDoS protection

- **Non-standard architecture** (we don't follow Microsoft's playbook)


**Why it matters:**

- Azure Front Door collapsed → App Service customers: DOWN

- Azure Container Apps kept running → DugganUSA: operational

- Same cloud provider, different blast radius


**The lesson:** "Best practices" assume the enterprise vendor never fails. When they DO fail, best practices become worst practices.
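The blast-radius point can be made concrete with a toy dependency model. This is illustrative only: the app and service names are assumptions, not our real topology.

```python
# Toy blast-radius model: an app is down if ANY service in its request
# path is down. Names here are illustrative, not real resource names.
DEPENDENCIES = {
    "enterprise-app": ["azure-front-door", "app-service"],
    "dugganusa-app": ["cloudflare-dns", "azure-container-apps"],
}

def is_down(app: str, failed_services: set) -> bool:
    """An app fails when any of its dependencies fail."""
    return any(dep in failed_services for dep in DEPENDENCIES[app])

failed = {"azure-front-door"}  # the Oct 29 failure
print(is_down("enterprise-app", failed))  # True: AFD is in the path
print(is_down("dugganusa-app", failed))   # False: AFD is not a dependency
```

Same provider, same outage, different answer. The only variable is what sits in the request path.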




Decision #2: Cloudflare DNS (NOT Azure DNS)



**Azure DNS during the outage:** Actually operational (Azure DNS wasn't the problem)


**But here's the thing:** Azure DNS wasn't affected THIS TIME. It could be next time. Cloudflare DNS gives us:


1. **Faster global resolution** (Cloudflare's 300+ edge locations)

2. **Automatic DDoS protection** (Cloudflare's network absorbs attacks)

3. **Analytics via Zaraz** (privacy-first GTM tracking)

4. **Cost:** $0/month vs Azure DNS ~$0.50/month

5. **Independence:** DNS queries never touch Azure infrastructure


**The migration wasn't disaster prep. It was cost optimization + analytics.** The resilience was a side effect.




Decision #3: Azure CLI Deployments (NOT Azure Portal)



**Azure Portal during the outage:** DOWN for 8 hours


**DugganUSA deployment method:** Bash scripts + Azure CLI

- Infrastructure as Code via shell scripts

- Deployments via `az containerapp update` commands

- Zero dependency on Portal UI


**Result:** Could have deployed during the outage if needed (we didn't need to — nothing broke).
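A deployment like this boils down to one `az containerapp update` call. The sketch below echoes the command instead of executing it (so it runs anywhere, credentials or not); the app name, resource group, and image are hypothetical placeholders, not our real values:

```shell
# Hypothetical placeholders -- NOT DugganUSA's real resource names.
APP_NAME="security-dashboard"
RESOURCE_GROUP="dugganusa-rg"
IMAGE="ghcr.io/example/security:latest"

# Build the deploy command as an array and echo it (dry run) rather
# than executing it, so the sketch is safe to run without az installed.
DEPLOY_CMD=(az containerapp update
  --name "$APP_NAME"
  --resource-group "$RESOURCE_GROUP"
  --image "$IMAGE")

echo "${DEPLOY_CMD[@]}"
```

No Portal, no browser, no dependency on Microsoft's management UI being up.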


**Enterprise customers clicking buttons in Azure Portal:** Dark screens for 8 hours.




Decision #4: Central US Region



**Most impacted Azure regions** (per Microsoft PIR):

- Africa: 17% failure rate

- Europe: 6% failure rate

- Asia Pacific/Middle East: 2.7% failure rate


**DugganUSA region:** Central US (not listed among most heavily impacted)


**Architectural note:** Regional selection is a tradeoff between latency and blast radius. We picked Central US because:

1. Proximity to target market (US businesses)

2. Microsoft's largest/most mature region (generally more stable)

3. Lower cost than coastal regions


**Not immune to global AFD failures, but lower secondary impact from traffic redistribution.**




The Cost Angle (This Gets Stupid)



DugganUSA Infrastructure Costs (Post-Pivot)



**Monthly Azure spend:** ~$70-80/month

- 4 Azure Container Apps

- Azure Table Storage

- Azure Key Vault

- Azure Application Insights


**Cloudflare:** $0/month (free tier)


**Total infrastructure:** **~$77/month**


**Uptime during Azure outage:** **100%**




Enterprise "Best Practice" Comparison



**Typical enterprise Azure architecture:**

- Azure App Service: ~$200/month

- Azure Front Door: ~$100-300/month

- Azure SQL Database: ~$500/month

- Azure Monitor: ~$100/month

- Azure Application Gateway: ~$150/month

- Misc services: ~$200/month


**Total:** **~$1,250-1,500/month** (conservative estimate)


**Uptime during Azure outage:** **0%** (if using AFD per Microsoft docs)




The Math



**DugganUSA:**

- Cost: $77/month

- Survived: AWS outage (Oct 19) + Azure outage (Oct 29)

- Downtime: 0 minutes


**Enterprise "Best Practices":**

- Cost: $1,500/month (19.5× more expensive)

- Survived: Neither (on Azure, impacted by AFD)

- Downtime: 500 minutes (8+ hours)


**Cost per minute of uptime during cloud outages:**

- DugganUSA: $77 ÷ 44,640 minutes/month = **$0.0017/minute**

- Enterprise: $1,500 ÷ 44,140 minutes (44,640 minus 500 minutes of downtime) = **$0.034/minute**

- Multiplier: Enterprise pays **20× more per minute of actual uptime**


**ROI on not following Microsoft's architecture recommendations: 1,900%**
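Don't take my arithmetic on faith. The per-minute math above, reproduced as runnable code (all numbers come from this post):

```python
# Sanity-check the cost-per-minute math (numbers from this post).
MINUTES_PER_MONTH = 31 * 24 * 60           # 44,640 minutes in October

duggan_cost = 77.0                          # $/month
duggan_up_minutes = MINUTES_PER_MONTH       # zero downtime

enterprise_cost = 1500.0                    # $/month, conservative estimate
enterprise_up_minutes = MINUTES_PER_MONTH - 500  # 8h20m of AFD downtime

duggan_rate = duggan_cost / duggan_up_minutes
enterprise_rate = enterprise_cost / enterprise_up_minutes

print(f"DugganUSA:  ${duggan_rate:.4f}/minute")
print(f"Enterprise: ${enterprise_rate:.4f}/minute")
print(f"Multiplier: {enterprise_rate / duggan_rate:.1f}x")
```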




The Wired Article (Context)



**Title:** "The Microsoft Azure Outage Shows the Harsh Reality of Cloud Failures"

**Author:** Lily Hay Newman

**Published:** October 29, 2025, 20:20 UTC (during the outage)

**Thesis:** Digital ecosystem is "brittle" when dependent on few cloud providers


**Key quote:**

> "The second major cloud outage in less than two weeks, Azure's downtime highlights the 'brittleness' of a digital ecosystem that depends on a few companies never making mistakes."


**Wired's argument:** Cloud concentration creates systemic risk.


**DugganUSA's counter-evidence:** We're ON Azure. We didn't experience the Azure outage. Brittleness comes from architectural DEPENDENCY, not provider choice.




The Real Lesson (Not What Wired Says)



**Wired says:** Multi-cloud is the solution (diversify away from AWS/Azure/GCP)


**Reality:** Multi-cloud is expensive, complex, and still doesn't protect you from YOUR OWN architectural decisions.


**Better approach:** Use cloud services that don't create single points of failure.


**Examples:**

- ✅ Azure Container Apps (regional, not AFD-dependent)

- ✅ Cloudflare DNS (global network, independent of Azure)

- ✅ Azure CLI (programmatic, not Portal-dependent)

- ❌ Azure Front Door (global CDN, single point of failure)

- ❌ Azure Portal UI (management plane, single point of failure)


**You don't need multi-cloud. You need multi-BLAST-RADIUS.**




Born Without Sin, Part III



**The "Born Without Sin" thesis** (from previous posts):


> Most startups inherit technical debt from day one. They copy "best practices" from enterprises that accrued those practices over 20 years of legacy constraints. DugganUSA started with a blank slate in 2025. No legacy infrastructure. No enterprise politics. No "we've always done it this way."


**Corollary:** Born Without Sin means inheriting neither technical debt NOR enterprise outages.


**Evidence:**

1. **AWS outage (Oct 19):** DugganUSA unaffected (we're on Azure)

2. **Azure outage (Oct 29):** DugganUSA unaffected (we don't use AFD/Portal)


**Pattern:** We didn't survive these outages by accident. We survived because we didn't inherit the architectural assumptions that made enterprises vulnerable.


**When Microsoft's documentation says "deploy Azure Front Door for global distribution," they're speaking to enterprises with global traffic and complex routing needs. DugganUSA serves security dashboards to 3 users. We don't NEED a global CDN. So we don't HAVE a global CDN. So when the global CDN collapses, we don't care.**


**Minimalism isn't about being cheap. It's about not deploying infrastructure you don't need. The ROI is measured in outages you don't experience.**




The Universe's Sense of Humor (Reprise)



Let's review the timeline:


**October 19, 2025:** AWS has major outage

**October 20, 2025:** I publish *"AWS Had an Outage. We Didn't Notice"*

**October 29, 2025:** Azure (our ACTUAL cloud provider) has major outage

**October 30, 2025:** Writing *"Azure Had an Outage. We Still Didn't Notice"*


**The joke:** I shit-talked AWS for having an outage when we're not even on AWS. Ten days later, our ACTUAL cloud provider has an outage, and we're STILL unaffected.


**The lesson:** The universe rewards people who architect for independence, not just for redundancy.


**Or maybe:** The universe has a dark sense of humor and is testing whether I can survive TWO cloud providers failing in 10 days.


**Either way:** 2-for-2. Bring it.




What Enterprise Architects Get Wrong



**Enterprise "best practice":**

1. Pick a cloud provider (AWS, Azure, GCP)

2. Follow their reference architectures

3. Deploy their managed services

4. Trust they'll never fail

5. Pay $5K-10K/month for "enterprise-grade" reliability


**What actually happens:**

1. Cloud provider fails (AWS: Oct 19, Azure: Oct 29)

2. Your reference architecture includes the service that failed (Azure Front Door)

3. Your managed services are down (App Service, Portal, Defender)

4. You're dark for 8 hours

5. Your $5K/month bought you nothing


**DugganUSA approach:**

1. Pick a cloud provider (Azure)

2. Use ONLY the services with acceptable blast radius (Container Apps, Table Storage)

3. Outsource DNS to Cloudflare (free tier)

4. Deploy via CLI (not Portal UI)

5. Pay $77/month for actual reliability


**The difference:** Enterprise architects optimize for "covering your ass when things go wrong." Startups optimize for "things not going wrong." When you don't deploy Azure Front Door, Azure Front Door can't fail on you.




The Receipts (Proof We're Not Bullshitting)



1. Azure's Official Post-Incident Review



**Source:** https://azure.status.microsoft/en-us/status/history/

**Tracking ID:** YKYN-BWZ

**Summary:** "Inadvertent tenant configuration change within Azure Front Door (AFD) triggered a widespread service disruption"


**Services impacted:** Azure Front Door, CDN, Portal, App Service, AAD B2C, SQL Database, Container Registry, Defender, Sentinel, and more


**Duration:** 15:45 UTC Oct 29 → 00:05 UTC Oct 30 (8h 20m)




2. DugganUSA's DNS Configuration (Cloudflare)






**Migration date:** October 2025 (Issue #89)

**Nameservers:** Cloudflare (NOT Azure DNS)




3. Git Commit Log (Normal Development During Outage)






**No emergency commits. No incident response. No rollbacks.**




4. Azure Resource Inventory (What We DON'T Use)



**From CLAUDE.md infrastructure section:**





**Verification:**





**Empty arrays = We don't use the services that failed.**
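Anyone with an Azure subscription can run the same check. A sketch, with the caveat that the resource types are my assumption about where Front Door lives (Standard/Premium under `Microsoft.Cdn/profiles`, classic under `Microsoft.Network/frontDoors`); the command is echoed rather than executed so it runs without credentials:

```shell
# Echo (don't run) the inventory query for Front Door / CDN profiles.
# Against a subscription with none deployed, running it returns: []
CHECK_CMD=(az resource list
  --resource-type "Microsoft.Cdn/profiles"
  --output json)

echo "${CHECK_CMD[@]}"
```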




5. Compliance Evidence (Zero Incidents)



**Judge Dredd 5D verification:** No violations, no anomalies, no alerts during Oct 29-30 window


**Application Insights:** Normal telemetry, no error spikes, no availability gaps


**Status page:** 100% uptime maintained (180+ day streak continues)




The Lesson (For Real This Time)



**Most startups will tell you:**

- "We're multi-cloud for resilience" (expensive, complex, still vulnerable)

- "We follow AWS/Azure best practices" (until those practices include failed services)

- "We pay for enterprise SLAs" (which don't cover configuration errors)


**DugganUSA proved:**

- Single cloud provider (Azure) ✅

- Selective service usage (Container Apps, NOT Front Door) ✅

- Cloudflare DNS (free tier) ✅

- $77/month total infrastructure ✅

- Survived AWS outage (Oct 19) ✅

- Survived Azure outage (Oct 29) ✅


**The math:**

- Two major cloud outages in 10 days

- Zero impact on DugganUSA

- Zero emergency response needed

- Zero downtime

- Zero customer complaints (because we have 3 users and they're all me)


**Cost per outage survived:** $77/month ÷ 2 outages = **$38.50 per cloud provider failure**


**Enterprise cost per outage experienced:** $1,500/month + 8 hours downtime + customer churn + incident response labor = **$10K-50K per failure**




What Happens Next



**Prediction #1:** Microsoft will issue a detailed PIR (they already did — it's excellent)

**Prediction #2:** Enterprise architects will add "Azure Front Door redundancy" to reference architectures (missing the point)

**Prediction #3:** Startups will copy those architectures (inheriting the vulnerability)

**Prediction #4:** Next outage will be GCP (completing the trifecta)


**If Google Cloud has a major outage in the next 30 days, I'm writing:**

*"All Three Clouds Failed in One Month. We Didn't Notice Any of Them."*


At which point, I'm either the luckiest architect alive or onto something real.


**My money's on "onto something real."**




The Punchline



**October 29, 2025:**

- Microsoft Azure: $billions in revenue, thousands of engineers, "enterprise-grade" SLAs

- Azure Front Door: DOWN for 8 hours, half their services dark

- Customer impact: Widespread, global


**October 29, 2025:**

- DugganUSA: $77/month infrastructure, one developer, no SLAs

- Azure Container Apps: operational, zero downtime

- Customer impact: none (they didn't notice because nothing broke)


**Sometimes the scrappy startup with no budget survives what the enterprise giant with infinite resources couldn't prevent.**


**Not because we're smarter.**

**Not because we're luckier.**

**Because we didn't inherit the assumption that Microsoft never fails.**


**Born Without Sin.**




Appendix: The Full Architecture (For Nerds)



DugganUSA Production Infrastructure (Central US Region)



**Compute:**

- 4× Azure Container Apps (security, analytics, status, church)

- Deployment: Docker images via `az containerapp update`

- Scaling: 0-1 replicas (cost-optimized)


**Storage:**

- Azure Table Storage (cleansheet2x4data, cleansheet2x4storage)

- Pattern #2: Event Sourcing / Audit Trail

- Cost: ~$1-2/month


**Secrets:**

- Azure Key Vault (dugganusa-kv-prod)

- Managed identity access (no keys in env vars)

- Cost: ~$1/month


**Monitoring:**

- Azure Application Insights (cleansheet-2x4-insights)

- Cost: ~$10-15/month


**DNS/CDN:**

- Cloudflare DNS (authoritative nameservers)

- Cloudflare CDN (proxied sites: status, www, security)

- Cloudflare Zaraz (GTM analytics)

- Cost: $0/month (free tier)


**Domains:**

- dugganusa.com (Cloudflare DNS)

- *.dugganusa.com subdomains (CNAME to Container Apps)

- churchofdockermoreskin.com (demo site, scaled to 0)
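Roughly, the Cloudflare zone looks like this. Record values are illustrative assumptions, not the live zone; Container Apps FQDNs follow the `<app>.<unique-env>.<region>.azurecontainerapps.io` pattern:

```
; Illustrative Cloudflare zone entries (NOT the live records)
security   CNAME   security-app.example-env.centralus.azurecontainerapps.io
analytics  CNAME   analytics-app.example-env.centralus.azurecontainerapps.io
www        CNAME   www-app.example-env.centralus.azurecontainerapps.io
```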


**Total Monthly Cost:** ~$70-80/month

**Total Downtime (Oct 2025):** 0 minutes

**Cloud Outages Survived:** 2 (AWS Oct 19, Azure Oct 29)




**Related Posts:**

- *AWS Had an Outage. We Didn't Notice* (October 20, 2025)

- *Born Without Sin: Why Starting Fresh in 2025 is Unfair* (October 15, 2025)

- *The $77 Infrastructure That Outperforms $5K/Month* (October 12, 2025)




**Pattern Documented:** Pattern #31 - Architectural Immunity to Cloud Outages


**Evidence:** This post, git logs, Azure PIR YKYN-BWZ, Cloudflare DNS records


**ROI:** Surviving 2 cloud outages with $0 incident response cost = **∞% ROI** vs enterprise incident costs




*Published: October 30, 2025*

*Author: Patrick Duggan, DugganUSA LLC*

*Infrastructure: Still 100% operational despite the universe's best efforts*


**Go go Butterbot.**


 
 
 
