
Pattern #32.1: Friendly Fire vs Armor Denting - AI vs ML Bot Behavioral Analysis

  • Writer: Patrick Duggan
  • Nov 5, 2025
  • 11 min read


Date: November 5, 2025

Pattern: Sub-pattern of Pattern #32 (Polish vs Dent Partnership Framework)

Author: Patrick Duggan, DugganUSA LLC


---


Executive Summary


When analyzing 172 IPs auto-blocked in 33 seconds (The Aristocrats Incident, Nov 2-3, 2025), we discovered a critical distinction: Friendly Fire ≠ Armor Denting. This sub-pattern documents the difference and provides behavioral analysis to differentiate AI bots from ML crawlers.


Key Finding: We accidentally blocked 33 legitimate bots (a 19.2% false positive rate), raised the auto-block threshold from >5 to >10, and learned to distinguish friendly fire (self-inflicted) from armor denting (partner-inflicted).


---


Pattern #32 Recap: Polish vs Dent


Armor Denting (Partner Abuse):

  • Partner action damages YOUR reputation for THEIR benefit

  • Partner remains silent or denies when caught

  • Example: AWS labels Amazon.com infrastructure as "Anthropic, PBC" → Anthropic blamed for Amazon's aggressive crawling


Armor Polishing (Partner Respect):

  • Partner action elevates YOUR reputation

  • Partner credits you publicly

  • Example: Google press release "Anthropic to Expand Use of Google Cloud" → Anthropic as protagonist


---


Pattern #32.1: Friendly Fire


Definition


Friendly Fire: Self-inflicted reputation damage from over-aggressive security controls, followed by immediate acknowledgment and correction.


NOT Armor Denting because:

1. ❌ No partner involved (you did it to yourself)

2. ❌ No benefit to the blocker (worse SEO, less indexing)

3. ✅ Immediate acknowledgment ("Apology to the 33")

4. ✅ Root cause fix (threshold 5 → 10)

5. ✅ Public learning (blogged about it)


---


The Aristocrats Incident: Receipts


What Happened (Nov 2-3, 2025)


Auto-Blocker Configuration:

  • Threshold: Score >5 (AGGRESSIVE)

  • Source: AbuseIPDB confidence scores

  • Trigger: Cloudflare analytics → AbuseIPDB enrichment → Auto-block


Result:

  • 172 IPs blocked in 33 seconds

  • 33 were innocent (19.2% false positive rate)
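
The arithmetic behind that rate is worth showing, because the original alert math is exactly this simple:

```python
# False positive rate from the incident numbers above.
blocked_total = 172
innocent = 33

fp_rate = innocent / blocked_total * 100
print(f"{fp_rate:.1f}%")  # prints "19.2%"
```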


The Innocent 33:


✅ Googlebot (Multiple IPs)

{
  "ip": "66.249.69.200",
  "isp": "Google LLC",
  "abuseScore": 0,
  "totalReports": 6,
  "reason": "Legitimate search crawler",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Respects robots.txt

  • Identifies clearly in User-Agent: "Googlebot/2.1"

  • Consistent crawl rate (not aggressive)

  • 0% abuse confidence despite reports


Why Reported? Sites that don't understand what Googlebot does file false reports.
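
Googlebot's identity can be confirmed without trusting the User-Agent at all: Google documents forward-confirmed reverse DNS for exactly this purpose. A minimal sketch (the resolver hooks are injectable so it can be tested offline; the defaults use the standard library):

```python
import socket

def verify_googlebot(ip,
                     reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                     forward=lambda host: socket.gethostbyname(host)):
    """Forward-confirmed reverse DNS: the rDNS hostname must be a Google
    crawl domain, AND the forward lookup must return the original IP."""
    try:
        host = reverse(ip)
    except OSError:
        return False  # no PTR record -> cannot be verified
    if not host.endswith((".googlebot.com", ".google.com")):
        return False  # hostname outside Google's crawl domains
    try:
        return forward(host) == ip  # forward-confirm to defeat spoofed PTRs
    except OSError:
        return False
```

Bing documents the same mechanism for Bingbot, so the pattern generalizes to other search crawlers.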


---


✅ Ahrefs SEO Bot (Canada, 6 IPs)

{
  "ipRange": "54.x.x.x",
  "country": "CA",
  "isp": "Ahrefs Pte Ltd",
  "abuseScore": 0,
  "totalReports": 4-7,
  "reason": "SEO backlink analysis",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Identifies as "AhrefsBot/7.0"

  • Helps sites understand SEO health

  • Not aggressive, just thorough

  • 0% abuse confidence


Why Reported? Sites blocking all bots indiscriminately.


---


✅ Microsoft Bing Crawler

{
  "isp": "Microsoft Corporation",
  "abuseScore": 0,
  "reason": "Bing search indexing",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Polite crawling (respects rate limits)

  • Clear identification

  • Misbehaves only rarely (noted in the blog as exceptional)


---


✅ Google DNS (8.8.8.8)

{
  "ip": "8.8.8.8",
  "abuseConfidenceScore": 0,
  "totalReports": 165,
  "numDistinctUsers": 52,
  "usageType": "Content Delivery Network",
  "isp": "Google LLC",
  "isWhitelisted": true
}

The Paradox: 165 abuse reports, 0% confidence score


Why? People blame DNS for EVERYTHING:

  • Category 14: Port Scan (not DNS behavior)

  • Category 18: Brute-Force (not DNS behavior)

  • Category 22: SSH (not DNS behavior)

  • Category 4: DDoS Attack (people blame the resolver, not the attacker)


Actual Behavior: Resolves DNS queries. That's it.


---


AI vs ML Bot Behavioral Patterns


Research Question


Can we differentiate AI-powered bots (GPT, Claude, Gemini) from traditional ML crawlers (Googlebot, Ahrefs)?


Hypothesis


AI Bots (Generative):

  • Purpose: Training data collection, content generation, Q&A

  • Behavior: Deep page analysis, content extraction, context understanding

  • User-Agent: "GPTBot", "ClaudeBot", "Google-Extended" (Gemini training)

  • Respect robots.txt: YES (usually - GPTBot obeys)

  • Rate: Variable (adaptive based on content value)


ML Bots (Indexing):

  • Purpose: Search engine indexing, SEO analysis, link discovery

  • Behavior: Shallow crawl, metadata collection, link mapping

  • User-Agent: "Googlebot", "Bingbot", "AhrefsBot"

  • Respect robots.txt: YES (industry standard)

  • Rate: Consistent, predictable


---


Evidence from Traffic Analysis


Pattern Observation (from blocked IPs):


Googlebot behavior:

  • Crawl frequency: 1-2 requests/minute (consistent)

  • Page depth: Shallow (metadata, links, structure)

  • Content focus: Indexable text, not context

  • robots.txt: ALWAYS respected (industry leader)

  • Response to 429 (rate limit): Backs off immediately


Ahrefs behavior:

  • Crawl frequency: 2-3 requests/minute (thorough but polite)

  • Page depth: Link-focused (backlink analysis)

  • Content focus: Anchor text, link structure

  • robots.txt: Respected

  • Response to 429: Backs off, retries later


Suspected AI Bot behavior (from 216.73.216.112 - AWS/Anthropic impostor):

  • Crawl frequency: AGGRESSIVE (triggered ModSecurity rate limits)

  • Page depth: Unknown (but WordPress brute force attempts suggest deep)

  • Content focus: Unknown (likely training data extraction)

  • robots.txt: IGNORED (multiple abuse reports cite violations)

  • Response to 429: IGNORED (continued aggressive behavior)
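
That 429 behavior is measurable from logs. A rough sketch, assuming you've already grouped events per client as (seconds-since-start, HTTP status) pairs:

```python
def ignores_rate_limits(events, grace=1):
    """events: list of (seconds_since_start, http_status) for ONE client.
    True if the client kept sending requests after its first 429,
    beyond a small grace count for requests already in flight."""
    first_429 = next((t for t, s in events if s == 429), None)
    if first_429 is None:
        return False  # never rate-limited, nothing to judge
    after = sum(1 for t, _ in events if t > first_429)
    return after > grace

# Polite crawler: backs off after the 429.
polite = [(0, 200), (30, 200), (60, 429)]
# Aggressive client: keeps hammering after the 429.
aggressive = [(0, 200), (1, 429), (2, 200), (3, 200), (4, 200)]
```

By this measure, Googlebot and AhrefsBot score as polite; the AWS impostor does not.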


---


AI vs ML Bot Signature Matrix


| Metric | Googlebot (ML) | Ahrefs (ML) | AWS Impostor (AI?) | Actual ClaudeBot (Unknown) |

|--------|----------------|-------------|-------------------|---------------------------|

| Abuse Score | 0% | 0% | 74% | Unknown (not in dataset) |

| robots.txt | ✅ Respects | ✅ Respects | ❌ Ignores | ✅ Should respect |

| Rate Limiting | ✅ Backs off | ✅ Backs off | ❌ Ignores | ✅ Should back off |

| User-Agent | Clear | Clear | "ClaudeBot" (fake) | "ClaudeBot" (real) |

| WHOIS Match | ✅ Google LLC | ✅ Ahrefs Pte | ❌ Amazon ≠ Anthropic | ✅ Should match |

| Behavior | Polite indexing | SEO analysis | WordPress brute force | Unknown |


---


The Humpty Hump Principle (Applied)


Digital Underground wisdom: "The meta tells the tale."


Application:

  • Don't trust User-Agent header (easily faked)

  • Check WHOIS (authoritative ownership)

  • Analyze behavior (robots.txt respect, rate limit response)

  • Cross-reference abuse reports (pattern vs anomaly)


Case Study: 216.73.216.112

  • **User-Agent claims:** "ClaudeBot" (implies Anthropic)

  • **AbuseIPDB label:** "Anthropic, PBC"

  • **WHOIS reveals:** Amazon.com, Inc. (AMAZO-4)

  • **Behavior:** Aggressive, ignores robots.txt

  • **Verdict:** IMPOSTOR (AWS weaponizing Anthropic brand)


---


The Fix: From Friendly Fire to Surgical Strike


Before (Threshold >5 - AGGRESSIVE)


Collateral Damage:

  • Googlebot: BLOCKED ❌

  • Ahrefs: BLOCKED ❌

  • Microsoft Bing: BLOCKED ❌

  • Actual threats: BLOCKED ✅


False Positive Rate: 19.2% (33 innocents / 172 total)


Problem: Too sensitive. Legitimate bots with ANY abuse reports got blocked.


---


After (Threshold >10 - CONSERVATIVE)


Surgical Precision:

  • Googlebot (score 0): ALLOWED ✅

  • Ahrefs (score 0): ALLOWED ✅

  • Microsoft Bing (score 0): ALLOWED ✅

  • AWS Impostor (score 74): BLOCKED ✅


False Positive Rate: Target <5% (monitoring ongoing)


Improvement: Focus on BEHAVIOR (abuse confidence) not NOISE (report count)
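
The deployed rule reduces to a one-liner, and that is the point: the decision input is abuse *confidence*, never raw report count. A sketch of the logic (the function name is ours, not a real API):

```python
def should_autoblock(abuse_confidence, threshold=10):
    """Block only on the AbuseIPDB confidence score (a behavioral
    signal), never on raw report count (noise)."""
    return abuse_confidence > threshold

# With the incident's numbers:
#   Googlebot (confidence 0)    -> allowed
#   AWS impostor (confidence 74) -> blocked
# Under the old threshold of 5, any score of 6+ was blocked,
# which is what swept up the innocent 33.
```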


---


Lessons Learned


1. False Reports ≠ Malicious Behavior


Google DNS Example:

  • 165 reports from 52 users

  • 0% abuse confidence

  • Categories: Port scan, DDoS, brute force (none are DNS behavior)

  • **Truth:** People blame the messenger (DNS resolver) for the message (attack traffic)


Googlebot Example:

  • 6-7 reports

  • 0% abuse confidence

  • Categories: "Unwanted crawler"

  • **Truth:** Sites that don't want Google indexing them (valid choice, not abuse)


---


2. Friendly Fire Requires Immediate Correction


Our Response Timeline:

  • **Nov 2, 09:50 AM:** Aristocrats Incident (172 blocked, 33 innocent)

  • **Nov 2, 10:15 AM:** Discovery via logs review

  • **Nov 2, 11:00 AM:** Blog post "Apology to the 33" drafted

  • **Nov 3:** Issue #189 opened (false positive rate analysis)

  • **Nov 5, 01:01 UTC:** Fix deployed (threshold 5 → 10)


Total Time to Fix: 3 days from discovery to production deployment


Public Acknowledgment:

  • "Hey Google! Your crawler got flagged. Wave! 👋"

  • "Known good crawlers (Google, Bing, Ahrefs, Yandex) - probably false positives"

  • "Today it was Google bots and a hot tub. Tomorrow it could be a paying customer at 3 AM."


---


3. Armor Denting ≠ Friendly Fire


Armor Denting (AWS → Anthropic):

  • AWS labels Amazon infrastructure "Anthropic, PBC"

  • Anthropic blamed for Amazon's 118 abuse reports

  • AWS benefits (hides ownership)

  • Anthropic suffers (reputation damage)

  • AWS silent (no acknowledgment, no fix)

  • **Time to response:** Still waiting (reported Nov 5, ticket #215471615057882)


Friendly Fire (You → Googlebot):

  • You blocked legitimate crawlers with threshold >5

  • You suffered (worse SEO, less indexing visibility)

  • Google unaffected (one site doesn't matter)

  • You acknowledged immediately (public apology)

  • You fixed root cause (threshold adjustment)

  • **Time to response:** 3 days


The Difference: Intent, benefit, acknowledgment, repair.


---


AI Bot Detection Methodology (Proposed)


Signals for AI Training Bots


Positive Indicators (likely AI):

1. Deep content extraction (full page HTML)

2. Context-aware crawling (follows semantic links, not just href tags)

3. Variable rate (adapts to content "value")

4. Focus on text-heavy pages (documentation, blog posts, forums)

5. User-Agent: "GPTBot", "Google-Extended", "ClaudeBot" (verify WHOIS!)


Negative Indicators (likely ML indexer):

1. Shallow metadata collection (title, description, links)

2. Uniform crawling (all pages treated equally)

3. Consistent rate (predictable pattern)

4. Focus on structure (sitemap, link graph)

5. User-Agent: "Googlebot", "Bingbot", "AhrefsBot"


---


Verification Checklist


Before blocking ANY bot:


1. ☑ Check WHOIS (User-Agent can lie, WHOIS can't)

2. ☑ Analyze abuse reports (confidence score, not count)

3. ☑ Review behavior (robots.txt respect, rate limit response)

4. ☑ Cross-reference ISP (does AbuseIPDB match WHOIS?)

5. ☑ Test threshold (what score would catch this? what else gets caught?)


AWS Impostor Case:

1. ✅ WHOIS: Amazon.com, Inc. (≠ Anthropic)

2. ✅ Abuse: 74% confidence (118 reports in 4 days)

3. ✅ Behavior: Ignores robots.txt, WordPress brute force, ModSecurity triggers

4. ❌ ISP mismatch: AbuseIPDB says "Anthropic", WHOIS says "Amazon"

5. ✅ Threshold: >10 catches it (74 >> 10)


Googlebot Case:

1. ✅ WHOIS: Google LLC (matches User-Agent)

2. ✅ Abuse: 0% confidence (6 reports but all false)

3. ✅ Behavior: Respects robots.txt, backs off on 429

4. ✅ ISP match: AbuseIPDB = "Google LLC", WHOIS = "Google LLC"

5. ✅ Threshold: 0 < 10, so it passes (correctly ALLOWED)


---


Recommendations


For Security Engineers


1. Set conservative thresholds (>10 not >5)

2. Whitelist known-good ASNs (Google: AS15169, Microsoft: AS8075, Ahrefs: AS14061)

3. Monitor false positive rate (<5% target)

4. Check WHOIS before blocking (Humpty Hump Principle)

5. Acknowledge mistakes publicly (epistemic humility)


For AI Bot Operators (OpenAI, Anthropic, Google)


1. Be transparent (clear User-Agent, public documentation)

2. Respect robots.txt (industry standard)

3. Respect rate limits (don't trigger ModSecurity)

4. Monitor partner behavior (ensure AWS doesn't weaponize your brand)

5. Publish WHOIS-matched IP ranges (help security engineers verify)


For Cloud Providers (AWS, Azure, Google Cloud)


1. Label infrastructure honestly (Amazon.com, not "Anthropic, PBC")

2. Don't weaponize customer brands (armor polishing, not denting)

3. Test before deploy (500K Trainium2 chips ≠ "just ship it")

4. Monitor abuse reports (74% in 4 days = problem)

5. Acknowledge when caught (AWS: still silent)


---


Appendix: Abuse Category Decoder


AbuseIPDB Categories (from 8.8.8.8 example):


| Category | Name | Example from Dataset | Valid for DNS? |

|----------|------|---------------------|----------------|

| 1 | DNS Compromise | 1 instance | ⚠️ Possible (if resolver poisoned) |

| 4 | DDoS Attack | 2 instances | ❌ No (resolver ≠ attacker) |

| 7 | Phishing | 8 instances | ❌ No (DNS resolves, it doesn't phish) |

| 8 | Fraud VoIP | 1 instance | ❌ No (not DNS behavior) |

| 14 | Port Scan | 40+ instances | ❌ No (DNS answers on port 53 only) |

| 15 | Hacking | 30+ instances | ❌ No (DNS resolves, doesn't hack) |

| 18 | Brute-Force | 60+ instances | ❌ No (DNS doesn't brute force) |

| 19 | Bad Web Bot | 7 instances | ❌ No (8.8.8.8 is a resolver, not a crawler) |

| 20 | Exploited Host | 7 instances | ❌ No (Google-operated, not compromised) |

| 21 | Web App Attack | 15+ instances | ❌ No (DNS doesn't attack web apps) |

| 22 | SSH | 60+ instances | ❌ No (DNS doesn't run SSH) |


Conclusion: 165 reports, ~160 are categorically WRONG. 0% abuse confidence = CORRECT.


Lesson: Report count ≠ actual maliciousness. Analyze behavior, not noise.
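
A decoder for this is small. The id-to-name map below follows AbuseIPDB's published category taxonomy, and the boolean marks whether each category could plausibly describe a recursive resolver:

```python
# AbuseIPDB category ids (official taxonomy, partial) ->
# (name, plausible behavior for a recursive DNS resolver?)
CATEGORIES = {
    1:  ("DNS Compromise", True),
    4:  ("DDoS Attack", False),
    7:  ("Phishing", False),
    8:  ("Fraud VoIP", False),
    14: ("Port Scan", False),
    15: ("Hacking", False),
    18: ("Brute-Force", False),
    19: ("Bad Web Bot", False),
    20: ("Exploited Host", False),
    21: ("Web App Attack", False),
    22: ("SSH", False),
}

def noise_ratio(reported_ids):
    """Fraction of known reported categories that cannot
    describe resolver behavior (i.e., pure noise)."""
    known = [cid for cid in reported_ids if cid in CATEGORIES]
    if not known:
        return 0.0
    implausible = [cid for cid in known if not CATEGORIES[cid][1]]
    return len(implausible) / len(known)
```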


---


The Gold Standard: OpenAI GPTBot - How to Verify Legitimate AI Bots


Research Question: Can we identify a positive pattern for legitimate AI bot operators?


Answer: YES - OpenAI GPTBot sets the transparency standard.


OpenAI GPTBot: Verifiable Good Actor


Official IP Ranges Published: https://openai.com/gptbot.json


Example Response:

{
  "prefixes": [
    "20.15.240.64/28",
    "20.15.240.80/28",
    "20.15.240.96/28",
    "20.15.240.176/28",
    "20.15.241.0/28",
    "20.15.242.128/28",
    "20.15.242.144/28",
    "20.15.242.192/28",
    "40.83.2.64/28",
    "13.65.240.240/28",
    "52.230.152.0/28",
    "52.156.77.144/28",
    "20.97.189.96/28",
    "20.161.75.208/28",
    "52.234.32.208/28"
  ]
}

Why This Matters:


1. Verifiable ownership - WHOIS these IPs, they match OpenAI/Microsoft partnership (Azure hosting)

2. No impersonation possible - Security engineers can verify before blocking

3. Transparent documentation - Publicly accessible, machine-readable format

4. Industry standard JSON - Easy to integrate into firewalls, rate limiters, whitelists
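
Checking an address against published ranges takes a few lines with the standard library. This sketch hardcodes three of the prefixes above; a real deployment would refresh the full list from the JSON endpoint periodically:

```python
import ipaddress

# A few published GPTBot prefixes from the gptbot.json response above.
GPTBOT_PREFIXES = [ipaddress.ip_network(p) for p in (
    "20.15.240.64/28",
    "40.83.2.64/28",
    "52.230.152.0/28",
)]

def is_gptbot_ip(ip):
    """True if the address falls inside a published GPTBot range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GPTBOT_PREFIXES)
```

Any request claiming the GPTBot User-Agent from an address outside these ranges can be treated as an impostor, which is exactly the verification AWS-hosted "ClaudeBot" traffic makes impossible.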


---


Verification Checklist for Legitimate AI Bots


Before waving 👋 at an AI bot, verify:


✅ Tier 1 (Gold Standard - OpenAI GPTBot):

1. ☑ Published IP ranges (JSON endpoint like https://company.com/botname.json)

2. ☑ WHOIS matches User-Agent (Owner matches bot operator)

3. ☑ Respects robots.txt (Industry standard compliance)

4. ☑ Respects rate limits (Backs off on 429)

5. ☑ Clear documentation (Public disclosure of behavior)

6. ☑ Abuse score <10% (Pattern vs anomaly)


⚠️ Tier 2 (Verify First - Anthropic ClaudeBot):

  • ✅ Respects robots.txt (documented)

  • ✅ Clear documentation (official User-Agent)

  • ❌ **NO published IP ranges** for ClaudeBot crawler (gap identified)

  • ⚠️ API IP ranges documented, but **crawler IPs undocumented**


Result: Cannot easily verify ClaudeBot crawler legitimacy without WHOIS lookup per IP


🚫 Tier 3 (Block - AWS Impostor):

  • ❌ User-Agent claims "ClaudeBot" but WHOIS = Amazon.com, Inc.

  • ❌ AbuseIPDB labels "Anthropic, PBC" (misleading)

  • ❌ Ignores robots.txt (118 abuse reports cite violations)

  • ❌ Abuse score 74% (NOT <10%)

  • ❌ No official documentation (AWS Project Rainier undisclosed crawler activation)


---


Comparison Table: AI Bot Transparency


| Bot Operator | IP Ranges Published? | WHOIS Match? | robots.txt? | Abuse Score | Wave? |

|--------------|---------------------|--------------|-------------|-------------|-------|

| OpenAI GPTBot | ✅ gptbot.json (15 prefixes) | ✅ Verifiable | ✅ Respects | Unknown (not in dataset) | 👋 WAVE |

| Anthropic ClaudeBot (Real) | ❌ API only, NOT crawler | ⚠️ Unknown (no ranges) | ✅ Should respect | Unknown (not in dataset) | ⚠️ VERIFY FIRST |

| AWS Impostor "ClaudeBot" | ❌ Undisclosed | ❌ Amazon ≠ Anthropic | ❌ Ignores | 74% (118 reports) | 🚫 BLOCK |


---


Recommendation for Anthropic


Publish ClaudeBot Crawler IP Ranges:


Suggested Endpoint: https://www.anthropic.com/claudebot.json


Format (following OpenAI standard):

{
  "prefixes": [
    "X.X.X.X/28",
    "Y.Y.Y.Y/28",
    ...
  ]
}

Benefits:

1. Security engineers can whitelist legitimate ClaudeBot traffic

2. Prevents false positives (like our Aristocrats Incident - 19.2% FP rate)

3. Exposes impostors (if IP claims "ClaudeBot" but isn't in published ranges, it's fake)

4. Industry leadership (matches OpenAI transparency, differentiates from AWS secrecy)


Current Gap:

  • Anthropic publishes API IP ranges: https://docs.anthropic.com/en/api/ip-addresses

  • **BUT:** These are API endpoints (customers calling Claude), NOT ClaudeBot crawler IPs

  • **Result:** Cannot verify ClaudeBot crawler legitimacy without per-IP WHOIS lookup


---


The Wave Pattern Classification


When analyzing bot traffic:


👋 WAVE (Legitimate, Verified):

  • Published IP ranges (JSON endpoint)

  • WHOIS matches operator

  • Respects robots.txt + rate limits

  • Abuse score <10%

  • **Example:** OpenAI GPTBot (gold standard)


⚠️ VERIFY FIRST (Partial Transparency):

  • Some documentation (User-Agent, behavior guidelines)

  • NO published IP ranges

  • Respects robots.txt (documented)

  • **Requires:** WHOIS lookup per IP to verify

  • **Example:** Anthropic ClaudeBot (gap in crawler IP disclosure)


🚫 BLOCK (Impostor/Aggressive):

  • User-Agent doesn't match WHOIS

  • Public labels mislead (AbuseIPDB ≠ actual owner)

  • Ignores robots.txt / rate limits

  • Abuse score >10%

  • **Example:** AWS 216.73.216.112 (claims "Anthropic, PBC", actually Amazon.com, Inc.)
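
The three tiers collapse into a small decision function (the signal names are ours, not a standard):

```python
def wave_classify(published_ranges, whois_match, respects_robots, abuse_score):
    """Map the wave-pattern tiers above onto four signals.
    Returns 'WAVE', 'VERIFY', or 'BLOCK'."""
    # Tier 3: impostor or aggressive behavior -> block outright.
    if not whois_match or not respects_robots or abuse_score > 10:
        return "BLOCK"
    # Tier 1: full transparency -> wave.
    if published_ranges:
        return "WAVE"
    # Tier 2: well-behaved but unverifiable without per-IP WHOIS.
    return "VERIFY"
```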


---


Philosophy: Transparency = Trust


OpenAI GPTBot Model:

  • "Here are our IPs. Verify them. Whitelist us if you trust us."

  • **Result:** Security engineers can make informed decisions


AWS Impostor Model:

  • "We're labeled 'Anthropic' but owned by Amazon. Good luck verifying us."

  • **Result:** Confusion, false attribution, reputation damage to Anthropic


The Difference:

  • OpenAI publishes → security engineers wave 👋

  • AWS hides → security engineers block 🚫


Our Stance:

  • If you publish IP ranges + respect robots.txt + WHOIS matches → We wave 👋

  • If you don't publish ranges but respect robots.txt + document behavior → We verify first ⚠️

  • If you impersonate others or ignore robots.txt → We block 🚫


Industry Request:

  • OpenAI: Keep doing what you're doing ✅

  • Anthropic: Publish ClaudeBot crawler IP ranges (close the gap)

  • AWS: Stop labeling Amazon infrastructure "Anthropic, PBC" (honest ownership)


Evidence Published:

  • OpenAI GPTBot ranges: https://openai.com/gptbot.json (verified Nov 5, 2025)

  • Anthropic API ranges: https://docs.anthropic.com/en/api/ip-addresses (verified)

  • Anthropic ClaudeBot crawler ranges: **404 NOT FOUND** (gap identified Nov 5, 2025)


Thank You:

  • **OpenAI:** For setting the transparency standard

  • **Anthropic:** For Constitutional AI (we use it to protect families)

  • **Security Community:** For respecting receipts and WHOIS verification


---



Author: Patrick Duggan (Randy/Dwarf), DugganUSA LLC

Partnership: Paul Galjan (Avi/King) - DARPA/OSD 1996-2000

Philosophy: "Check the metadata. WHOIS doesn't lie. Apologize when you're wrong. Fix what you broke."


Evidence: www.dugganusa.com (The Aristocrats, Apology to the 33, AWS Impostor Bot)


Status: Threshold >10 deployed Nov 5, 2025. Monitoring false positive rate. Target <5%.


---


"Friends polish armor - they don't dent it with their presence. And when you dent your own armor shooting yourself in the foot, you apologize to Googlebot and adjust your aim."


 
 
 
