
Pattern #32.1: Friendly Fire vs Armor Denting - AI vs ML Bot Behavioral Analysis

  • Writer: Patrick Duggan
  • Nov 5, 2025
  • 11 min read


Date: November 5, 2025

Pattern: Sub-pattern of Pattern #32 (Polish vs Dent Partnership Framework)

Author: Patrick Duggan, DugganUSA LLC


---


Executive Summary


When analyzing 172 IPs auto-blocked in 33 seconds (The Aristocrats Incident, Nov 2-3, 2025), we discovered a critical distinction: Friendly Fire ≠ Armor Denting. This sub-pattern documents the difference and provides behavioral analysis to differentiate AI bots from ML crawlers.


Key Finding: We accidentally blocked 33 legitimate bots (a 19.2% false positive rate), raised the auto-block threshold from >5 to >10, and learned to distinguish friendly fire (self-inflicted) from armor denting (partner-inflicted).


---


Pattern #32 Recap: Polish vs Dent


Armor Denting (Partner Abuse):

  • Partner action damages YOUR reputation for THEIR benefit

  • Partner remains silent or denies when caught

  • Example: AWS labels Amazon.com infrastructure as "Anthropic, PBC" → Anthropic blamed for Amazon's aggressive crawling


Armor Polishing (Partner Respect):

  • Partner action elevates YOUR reputation

  • Partner credits you publicly

  • Example: Google press release "Anthropic to Expand Use of Google Cloud" → Anthropic as protagonist


---


Pattern #32.1: Friendly Fire


Definition


Friendly Fire: Self-inflicted reputation damage from over-aggressive security controls, followed by immediate acknowledgment and correction.


NOT Armor Denting because:

1. ❌ No partner involved (you did it to yourself)

2. ❌ No benefit to the blocker (worse SEO, less indexing)

3. ✅ Immediate acknowledgment ("Apology to the 33")

4. ✅ Root cause fix (threshold 5 → 10)

5. ✅ Public learning (blogged about it)


---


The Aristocrats Incident: Receipts


What Happened (Nov 2-3, 2025)


Auto-Blocker Configuration:

  • Threshold: Score >5 (AGGRESSIVE)

  • Source: AbuseIPDB confidence scores

  • Trigger: Cloudflare analytics → AbuseIPDB enrichment → Auto-block


Result:

  • 172 IPs blocked in 33 seconds

  • 33 were innocent (19.2% false positive rate)
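
The arithmetic behind that rate is worth showing, because the original alert math is exactly this simple:

```python
# False positive rate from the incident numbers above.
blocked_total = 172
innocent = 33

fp_rate = innocent / blocked_total * 100
print(f"{fp_rate:.1f}%")  # prints "19.2%"
```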


The Innocent 33:


✅ Googlebot (Multiple IPs)

{
  "ip": "66.249.69.200",
  "isp": "Google LLC",
  "abuseScore": 0,
  "totalReports": 6,
  "reason": "Legitimate search crawler",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Respects robots.txt

  • Identifies clearly in User-Agent: "Googlebot/2.1"

  • Consistent crawl rate (not aggressive)

  • 0% abuse confidence despite reports


Why Reported? Sites that don't understand what Googlebot does file false reports.
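
Googlebot's identity can be confirmed without trusting the User-Agent at all: Google documents forward-confirmed reverse DNS for exactly this purpose. A minimal sketch (the resolver hooks are injectable so it can be tested offline; the defaults use the standard library):

```python
import socket

def verify_googlebot(ip,
                     reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                     forward=lambda host: socket.gethostbyname(host)):
    """Forward-confirmed reverse DNS: the rDNS hostname must be a Google
    crawl domain, AND the forward lookup must return the original IP."""
    try:
        host = reverse(ip)
    except OSError:
        return False  # no PTR record -> cannot be verified
    if not host.endswith((".googlebot.com", ".google.com")):
        return False  # hostname outside Google's crawl domains
    try:
        return forward(host) == ip  # forward-confirm to defeat spoofed PTRs
    except OSError:
        return False
```

Bing documents the same mechanism for Bingbot, so the pattern generalizes to other search crawlers.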


---


✅ Ahrefs SEO Bot (Canada, 6 IPs)

{
  "ipRange": "54.x.x.x",
  "country": "CA",
  "isp": "Ahrefs Pte Ltd",
  "abuseScore": 0,
  "totalReports": 4-7,
  "reason": "SEO backlink analysis",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Identifies as "AhrefsBot/7.0"

  • Helps sites understand SEO health

  • Not aggressive, just thorough

  • 0% abuse confidence


Why Reported? Sites blocking all bots indiscriminately.


---


✅ Microsoft Bing Crawler

{
  "isp": "Microsoft Corporation",
  "abuseScore": 0,
  "reason": "Bing search indexing",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Polite crawling (respects rate limits)

  • Clear identification

  • Misbehaves only rarely (noted in the blog as exceptional)


---


✅ Google DNS (8.8.8.8)

{
  "ip": "8.8.8.8",
  "abuseConfidenceScore": 0,
  "totalReports": 165,
  "numDistinctUsers": 52,
  "usageType": "Content Delivery Network",
  "isp": "Google LLC",
  "isWhitelisted": true
}

The Paradox: 165 abuse reports, 0% confidence score


Why? People blame DNS for EVERYTHING:

  • Category 14: Port Scan (not DNS behavior)

  • Category 18: Brute-Force (not DNS behavior)

  • Category 22: SSH (not DNS behavior)

  • Category 4: DDoS Attack (people blame the resolver, not the attacker)


Actual Behavior: Resolves DNS queries. That's it.


---


AI vs ML Bot Behavioral Patterns


Research Question


Can we differentiate AI-powered bots (GPT, Claude, Gemini) from traditional ML crawlers (Googlebot, Ahrefs)?


Hypothesis


AI Bots (Generative):

  • Purpose: Training data collection, content generation, Q&A

  • Behavior: Deep page analysis, content extraction, context understanding

  • User-Agent: "GPTBot", "ClaudeBot", "Google-Extended" (Gemini training)

  • Respect robots.txt: YES (usually - GPTBot obeys)

  • Rate: Variable (adaptive based on content value)


ML Bots (Indexing):

  • Purpose: Search engine indexing, SEO analysis, link discovery

  • Behavior: Shallow crawl, metadata collection, link mapping

  • User-Agent: "Googlebot", "Bingbot", "AhrefsBot"

  • Respect robots.txt: YES (industry standard)

  • Rate: Consistent, predictable


---


Evidence from Traffic Analysis


Pattern Observation (from blocked IPs):


Googlebot behavior:

  • Crawl frequency: 1-2 requests/minute (consistent)

  • Page depth: Shallow (metadata, links, structure)

  • Content focus: Indexable text, not context

  • robots.txt: ALWAYS respected (industry leader)

  • Response to 429 (rate limit): Backs off immediately


Ahrefs behavior:

  • Crawl frequency: 2-3 requests/minute (thorough but polite)

  • Page depth: Link-focused (backlink analysis)

  • Content focus: Anchor text, link structure

  • robots.txt: Respected

  • Response to 429: Backs off, retries later


Suspected AI Bot behavior (from 216.73.216.112 - AWS/Anthropic impostor):

  • Crawl frequency: AGGRESSIVE (triggered ModSecurity rate limits)

  • Page depth: Unknown (but WordPress brute force attempts suggest deep)

  • Content focus: Unknown (likely training data extraction)

  • robots.txt: IGNORED (multiple abuse reports cite violations)

  • Response to 429: IGNORED (continued aggressive behavior)
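
That 429 behavior is measurable from logs. A rough sketch, assuming you've already grouped events per client as (seconds-since-start, HTTP status) pairs:

```python
def ignores_rate_limits(events, grace=1):
    """events: list of (seconds_since_start, http_status) for ONE client.
    True if the client kept sending requests after its first 429,
    beyond a small grace count for requests already in flight."""
    first_429 = next((t for t, s in events if s == 429), None)
    if first_429 is None:
        return False  # never rate-limited, nothing to judge
    after = sum(1 for t, _ in events if t > first_429)
    return after > grace

# Polite crawler: backs off after the 429.
polite = [(0, 200), (30, 200), (60, 429)]
# Aggressive client: keeps hammering after the 429.
aggressive = [(0, 200), (1, 429), (2, 200), (3, 200), (4, 200)]
```

By this measure, Googlebot and AhrefsBot score as polite; the AWS impostor does not.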


---


AI vs ML Bot Signature Matrix


| Metric | Googlebot (ML) | Ahrefs (ML) | AWS Impostor (AI?) | Actual ClaudeBot (Unknown) |

|--------|----------------|-------------|-------------------|---------------------------|

| Abuse Score | 0% | 0% | 74% | Unknown (not in dataset) |

| robots.txt | ✅ Respects | ✅ Respects | ❌ Ignores | ✅ Should respect |

| Rate Limiting | ✅ Backs off | ✅ Backs off | ❌ Ignores | ✅ Should back off |

| User-Agent | Clear | Clear | "ClaudeBot" (fake) | "ClaudeBot" (real) |

| WHOIS Match | ✅ Google LLC | ✅ Ahrefs Pte | ❌ Amazon ≠ Anthropic | ✅ Should match |

| Behavior | Polite indexing | SEO analysis | WordPress brute force | Unknown |


---


The Humpty Hump Principle (Applied)


Digital Underground wisdom: "The meta tells the tale."


Application:

  • Don't trust User-Agent header (easily faked)

  • Check WHOIS (authoritative ownership)

  • Analyze behavior (robots.txt respect, rate limit response)

  • Cross-reference abuse reports (pattern vs anomaly)


Case Study: 216.73.216.112

  • **User-Agent claims:** "ClaudeBot" (implies Anthropic)

  • **AbuseIPDB label:** "Anthropic, PBC"

  • **WHOIS reveals:** Amazon.com, Inc. (AMAZO-4)

  • **Behavior:** Aggressive, ignores robots.txt

  • **Verdict:** IMPOSTOR (AWS weaponizing Anthropic brand)


---


The Fix: From Friendly Fire to Surgical Strike


Before (Threshold >5 - AGGRESSIVE)


Collateral Damage:

  • Googlebot: BLOCKED ❌

  • Ahrefs: BLOCKED ❌

  • Microsoft Bing: BLOCKED ❌

  • Actual threats: BLOCKED ✅


False Positive Rate: 19.2% (33 innocents / 172 total)


Problem: Too sensitive. Legitimate bots with ANY abuse reports got blocked.


---


After (Threshold >10 - CONSERVATIVE)


Surgical Precision:

  • Googlebot (score 0): ALLOWED ✅

  • Ahrefs (score 0): ALLOWED ✅

  • Microsoft Bing (score 0): ALLOWED ✅

  • AWS Impostor (score 74): BLOCKED ✅


False Positive Rate: Target <5% (monitoring ongoing)


Improvement: Focus on BEHAVIOR (abuse confidence) not NOISE (report count)
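
The deployed rule reduces to a one-liner, and that is the point: the decision input is abuse *confidence*, never raw report count. A sketch of the logic (the function name is ours, not a real API):

```python
def should_autoblock(abuse_confidence, threshold=10):
    """Block only on the AbuseIPDB confidence score (a behavioral
    signal), never on raw report count (noise)."""
    return abuse_confidence > threshold

# With the incident's numbers:
#   Googlebot (confidence 0)    -> allowed
#   AWS impostor (confidence 74) -> blocked
# Under the old threshold of 5, any score of 6+ was blocked,
# which is what swept up the innocent 33.
```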


---


Lessons Learned


1. False Reports ≠ Malicious Behavior


Google DNS Example:

  • 165 reports from 52 users

  • 0% abuse confidence

  • Categories: Port scan, DDoS, brute force (none are DNS behavior)

  • **Truth:** People blame the messenger (DNS resolver) for the message (attack traffic)


Googlebot Example:

  • 6-7 reports

  • 0% abuse confidence

  • Categories: "Unwanted crawler"

  • **Truth:** Sites that don't want Google indexing them (valid choice, not abuse)


---


2. Friendly Fire Requires Immediate Correction


Our Response Timeline:

  • **Nov 2, 09:50 AM:** Aristocrats Incident (172 blocked, 33 innocent)

  • **Nov 2, 10:15 AM:** Discovery via logs review

  • **Nov 2, 11:00 AM:** Blog post "Apology to the 33" drafted

  • **Nov 3:** Issue #189 opened (false positive rate analysis)

  • **Nov 5, 01:01 UTC:** Fix deployed (threshold 5 → 10)


Total Time to Fix: 3 days from discovery to production deployment


Public Acknowledgment:

  • "Hey Google! Your crawler got flagged. Wave! 👋"

  • "Known good crawlers (Google, Bing, Ahrefs, Yandex) - probably false positives"

  • "Today it was Google bots and a hot tub. Tomorrow it could be a paying customer at 3 AM."


---


3. Armor Denting ≠ Friendly Fire


Armor Denting (AWS → Anthropic):

  • AWS labels Amazon infrastructure "Anthropic, PBC"

  • Anthropic blamed for Amazon's 118 abuse reports

  • AWS benefits (hides ownership)

  • Anthropic suffers (reputation damage)

  • AWS silent (no acknowledgment, no fix)

  • **Time to response:** Still waiting (reported Nov 5, ticket #215471615057882)


Friendly Fire (You → Googlebot):

  • You blocked legitimate crawlers with threshold >5

  • You suffered (worse SEO, less indexing visibility)

  • Google unaffected (one site doesn't matter)

  • You acknowledged immediately (public apology)

  • You fixed root cause (threshold adjustment)

  • **Time to response:** 3 days


The Difference: Intent, benefit, acknowledgment, repair.


---


AI Bot Detection Methodology (Proposed)


Signals for AI Training Bots


Positive Indicators (likely AI):

1. Deep content extraction (full page HTML)

2. Context-aware crawling (follows semantic links, not just href tags)

3. Variable rate (adapts to content "value")

4. Focus on text-heavy pages (documentation, blog posts, forums)

5. User-Agent: "GPTBot", "Google-Extended", "ClaudeBot" (verify WHOIS!)


Negative Indicators (likely ML indexer):

1. Shallow metadata collection (title, description, links)

2. Uniform crawling (all pages treated equally)

3. Consistent rate (predictable pattern)

4. Focus on structure (sitemap, link graph)

5. User-Agent: "Googlebot", "Bingbot", "AhrefsBot"


---


Verification Checklist


Before blocking ANY bot:


1. ☑ Check WHOIS (User-Agent can lie, WHOIS can't)

2. ☑ Analyze abuse reports (confidence score, not count)

3. ☑ Review behavior (robots.txt respect, rate limit response)

4. ☑ Cross-reference ISP (does AbuseIPDB match WHOIS?)

5. ☑ Test threshold (what score would catch this? what else gets caught?)


AWS Impostor Case:

1. ✅ WHOIS: Amazon.com, Inc. (≠ Anthropic)

2. ✅ Abuse: 74% confidence (118 reports in 4 days)

3. ✅ Behavior: Ignores robots.txt, WordPress brute force, ModSecurity triggers

4. ❌ ISP mismatch: AbuseIPDB says "Anthropic", WHOIS says "Amazon"

5. ✅ Threshold: >10 catches it (74 >> 10)


Googlebot Case:

1. ✅ WHOIS: Google LLC (matches User-Agent)

2. ✅ Abuse: 0% confidence (6 reports but all false)

3. ✅ Behavior: Respects robots.txt, backs off on 429

4. ✅ ISP match: AbuseIPDB = "Google LLC", WHOIS = "Google LLC"

5. ✅ Threshold: 0 < 10, so it passes (correctly ALLOWED)


---


Recommendations


For Security Engineers


1. Set conservative thresholds (>10 not >5)

2. Whitelist known-good ASNs (Google: AS15169, Microsoft: AS8075, Ahrefs: AS14061)

3. Monitor false positive rate (<5% target)

4. Check WHOIS before blocking (Humpty Hump Principle)

5. Acknowledge mistakes publicly (epistemic humility)


For AI Bot Operators (OpenAI, Anthropic, Google)


1. Be transparent (clear User-Agent, public documentation)

2. Respect robots.txt (industry standard)

3. Respect rate limits (don't trigger ModSecurity)

4. Monitor partner behavior (ensure AWS doesn't weaponize your brand)

5. Publish WHOIS-matched IP ranges (help security engineers verify)


For Cloud Providers (AWS, Azure, Google Cloud)


1. Label infrastructure honestly (Amazon.com, not "Anthropic, PBC")

2. Don't weaponize customer brands (armor polishing, not denting)

3. Test before deploy (500K Trainium2 chips ≠ "just ship it")

4. Monitor abuse reports (74% in 4 days = problem)

5. Acknowledge when caught (AWS: still silent)


---


Appendix: Abuse Category Decoder


AbuseIPDB Categories (from 8.8.8.8 example):


| Category | Name | Example from Dataset | Valid for DNS? |

|----------|------|---------------------|----------------|

| 1 | DNS Compromise | 1 instance | ⚠️ Possible (if resolver poisoned) |

| 4 | DDoS Attack | 2 instances | ❌ No (resolver ≠ attacker) |

| 7 | Phishing | 8 instances | ❌ No (DNS resolves, it doesn't phish) |

| 8 | Fraud VoIP | 1 instance | ❌ No (not DNS behavior) |

| 14 | Port Scan | 40+ instances | ❌ No (DNS answers on port 53 only) |

| 15 | Hacking | 30+ instances | ❌ No (DNS resolves, doesn't hack) |

| 18 | Brute-Force | 60+ instances | ❌ No (DNS doesn't brute force) |

| 19 | Bad Web Bot | 7 instances | ❌ No (8.8.8.8 is a resolver, not a crawler) |

| 20 | Exploited Host | 7 instances | ❌ No (Google-operated, not compromised) |

| 21 | Web App Attack | 15+ instances | ❌ No (DNS doesn't attack web apps) |

| 22 | SSH | 60+ instances | ❌ No (DNS doesn't run SSH) |


Conclusion: 165 reports, ~160 are categorically WRONG. 0% abuse confidence = CORRECT.


Lesson: Report count ≠ actual maliciousness. Analyze behavior, not noise.
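
A decoder for this is small. The id-to-name map below follows AbuseIPDB's published category taxonomy, and the boolean marks whether each category could plausibly describe a recursive resolver:

```python
# AbuseIPDB category ids (official taxonomy, partial) ->
# (name, plausible behavior for a recursive DNS resolver?)
CATEGORIES = {
    1:  ("DNS Compromise", True),
    4:  ("DDoS Attack", False),
    7:  ("Phishing", False),
    8:  ("Fraud VoIP", False),
    14: ("Port Scan", False),
    15: ("Hacking", False),
    18: ("Brute-Force", False),
    19: ("Bad Web Bot", False),
    20: ("Exploited Host", False),
    21: ("Web App Attack", False),
    22: ("SSH", False),
}

def noise_ratio(reported_ids):
    """Fraction of known reported categories that cannot
    describe resolver behavior (i.e., pure noise)."""
    known = [cid for cid in reported_ids if cid in CATEGORIES]
    if not known:
        return 0.0
    implausible = [cid for cid in known if not CATEGORIES[cid][1]]
    return len(implausible) / len(known)
```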


---


The Gold Standard: OpenAI GPTBot - How to Verify Legitimate AI Bots


Research Question: Can we identify a positive pattern for legitimate AI bot operators?


Answer: YES - OpenAI GPTBot sets the transparency standard.


OpenAI GPTBot: Verifiable Good Actor


Official IP Ranges Published: https://openai.com/gptbot.json


Example Response:

{
  "prefixes": [
    "20.15.240.64/28",
    "20.15.240.80/28",
    "20.15.240.96/28",
    "20.15.240.176/28",
    "20.15.241.0/28",
    "20.15.242.128/28",
    "20.15.242.144/28",
    "20.15.242.192/28",
    "40.83.2.64/28",
    "13.65.240.240/28",
    "52.230.152.0/28",
    "52.156.77.144/28",
    "20.97.189.96/28",
    "20.161.75.208/28",
    "52.234.32.208/28"
  ]
}

Why This Matters:


1. Verifiable ownership - WHOIS these IPs, they match OpenAI/Microsoft partnership (Azure hosting)

2. No impersonation possible - Security engineers can verify before blocking

3. Transparent documentation - Publicly accessible, machine-readable format

4. Industry standard JSON - Easy to integrate into firewalls, rate limiters, whitelists
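
Checking an address against published ranges takes a few lines with the standard library. This sketch hardcodes three of the prefixes above; a real deployment would refresh the full list from the JSON endpoint periodically:

```python
import ipaddress

# A few published GPTBot prefixes from the gptbot.json response above.
GPTBOT_PREFIXES = [ipaddress.ip_network(p) for p in (
    "20.15.240.64/28",
    "40.83.2.64/28",
    "52.230.152.0/28",
)]

def is_gptbot_ip(ip):
    """True if the address falls inside a published GPTBot range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GPTBOT_PREFIXES)
```

Any request claiming the GPTBot User-Agent from an address outside these ranges can be treated as an impostor, which is exactly the verification AWS-hosted "ClaudeBot" traffic makes impossible.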


---


Verification Checklist for Legitimate AI Bots


Before waving 👋 at an AI bot, verify:


✅ Tier 1 (Gold Standard - OpenAI GPTBot):

1. ☑ Published IP ranges (JSON endpoint like https://company.com/botname.json)

2. ☑ WHOIS matches User-Agent (Owner matches bot operator)

3. ☑ Respects robots.txt (Industry standard compliance)

4. ☑ Respects rate limits (Backs off on 429)

5. ☑ Clear documentation (Public disclosure of behavior)

6. ☑ Abuse score <10% (Pattern vs anomaly)


⚠️ Tier 2 (Verify First - Anthropic ClaudeBot):

  • ✅ Respects robots.txt (documented)

  • ✅ Clear documentation (official User-Agent)

  • ❌ **NO published IP ranges** for ClaudeBot crawler (gap identified)

  • ⚠️ API IP ranges documented, but **crawler IPs undocumented**


Result: Cannot easily verify ClaudeBot crawler legitimacy without WHOIS lookup per IP


🚫 Tier 3 (Block - AWS Impostor):

  • ❌ User-Agent claims "ClaudeBot" but WHOIS = Amazon.com, Inc.

  • ❌ AbuseIPDB labels "Anthropic, PBC" (misleading)

  • ❌ Ignores robots.txt (118 abuse reports cite violations)

  • ❌ Abuse score 74% (NOT <10%)

  • ❌ No official documentation (AWS Project Rainier undisclosed crawler activation)


---


Comparison Table: AI Bot Transparency


| Bot Operator | IP Ranges Published? | WHOIS Match? | robots.txt? | Abuse Score | Wave? |

|--------------|---------------------|--------------|-------------|-------------|-------|

| OpenAI GPTBot | ✅ gptbot.json (15 prefixes) | ✅ Verifiable | ✅ Respects | Unknown (not in dataset) | 👋 WAVE |

| Anthropic ClaudeBot (Real) | ❌ API only, NOT crawler | ⚠️ Unknown (no ranges) | ✅ Should respect | Unknown (not in dataset) | ⚠️ VERIFY FIRST |

| AWS Impostor "ClaudeBot" | ❌ Undisclosed | ❌ Amazon ≠ Anthropic | ❌ Ignores | 74% (118 reports) | 🚫 BLOCK |


---


Recommendation for Anthropic


Publish ClaudeBot Crawler IP Ranges:


Suggested Endpoint: https://www.anthropic.com/claudebot.json


Format (following OpenAI standard):

{
  "prefixes": [
    "X.X.X.X/28",
    "Y.Y.Y.Y/28",
    ...
  ]
}

Benefits:

1. Security engineers can whitelist legitimate ClaudeBot traffic

2. Prevents false positives (like our Aristocrats Incident - 19.2% FP rate)

3. Exposes impostors (if IP claims "ClaudeBot" but isn't in published ranges, it's fake)

4. Industry leadership (matches OpenAI transparency, differentiates from AWS secrecy)


Current Gap:

  • Anthropic publishes API IP ranges: https://docs.anthropic.com/en/api/ip-addresses

  • **BUT:** These are API endpoints (customers calling Claude), NOT ClaudeBot crawler IPs

  • **Result:** Cannot verify ClaudeBot crawler legitimacy without per-IP WHOIS lookup


---


The Wave Pattern Classification


When analyzing bot traffic:


👋 WAVE (Legitimate, Verified):

  • Published IP ranges (JSON endpoint)

  • WHOIS matches operator

  • Respects robots.txt + rate limits

  • Abuse score <10%

  • **Example:** OpenAI GPTBot (gold standard)


⚠️ VERIFY FIRST (Partial Transparency):

  • Some documentation (User-Agent, behavior guidelines)

  • NO published IP ranges

  • Respects robots.txt (documented)

  • **Requires:** WHOIS lookup per IP to verify

  • **Example:** Anthropic ClaudeBot (gap in crawler IP disclosure)


🚫 BLOCK (Impostor/Aggressive):

  • User-Agent doesn't match WHOIS

  • Public labels mislead (AbuseIPDB ≠ actual owner)

  • Ignores robots.txt / rate limits

  • Abuse score >10%

  • **Example:** AWS 216.73.216.112 (claims "Anthropic, PBC", actually Amazon.com, Inc.)
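
The three tiers collapse into a small decision function (the signal names are ours, not a standard):

```python
def wave_classify(published_ranges, whois_match, respects_robots, abuse_score):
    """Map the wave-pattern tiers above onto four signals.
    Returns 'WAVE', 'VERIFY', or 'BLOCK'."""
    # Tier 3: impostor or aggressive behavior -> block outright.
    if not whois_match or not respects_robots or abuse_score > 10:
        return "BLOCK"
    # Tier 1: full transparency -> wave.
    if published_ranges:
        return "WAVE"
    # Tier 2: well-behaved but unverifiable without per-IP WHOIS.
    return "VERIFY"
```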


---


Philosophy: Transparency = Trust


OpenAI GPTBot Model:

  • "Here are our IPs. Verify them. Whitelist us if you trust us."

  • **Result:** Security engineers can make informed decisions


AWS Impostor Model:

  • "We're labeled 'Anthropic' but owned by Amazon. Good luck verifying us."

  • **Result:** Confusion, false attribution, reputation damage to Anthropic


The Difference:

  • OpenAI publishes → security engineers wave 👋

  • AWS hides → security engineers block 🚫


Our Stance:

  • If you publish IP ranges + respect robots.txt + WHOIS matches → We wave 👋

  • If you don't publish ranges but respect robots.txt + document behavior → We verify first ⚠️

  • If you impersonate others or ignore robots.txt → We block 🚫


Industry Request:

  • OpenAI: Keep doing what you're doing ✅

  • Anthropic: Publish ClaudeBot crawler IP ranges (close the gap)

  • AWS: Stop labeling Amazon infrastructure "Anthropic, PBC" (honest ownership)


Evidence Published:

  • OpenAI GPTBot ranges: https://openai.com/gptbot.json (verified Nov 5, 2025)

  • Anthropic API ranges: https://docs.anthropic.com/en/api/ip-addresses (verified)

  • Anthropic ClaudeBot crawler ranges: **404 NOT FOUND** (gap identified Nov 5, 2025)


Thank You:

  • **OpenAI:** For setting the transparency standard

  • **Anthropic:** For Constitutional AI (we use it to protect families)

  • **Security Community:** For respecting receipts and WHOIS verification


---



Author: Patrick Duggan (Randy/Dwarf), DugganUSA LLC

Partnership: Paul Galjan (Avi/King) - DARPA/OSD 1996-2000

Philosophy: "Check the metadata. WHOIS doesn't lie. Apologize when you're wrong. Fix what you broke."


Evidence: www.dugganusa.com (The Aristocrats, Apology to the 33, AWS Impostor Bot)


Status: Threshold >10 deployed Nov 5, 2025. Monitoring false positive rate. Target <5%.


---


"Friends polish armor - they don't dent it with their presence. And when you dent your own armor shooting yourself in the foot, you apologize to Googlebot and adjust your aim."


 
 
 
