Pattern #32.1: Friendly Fire vs Armor Denting - AI vs ML Bot Behavioral Analysis

  • Writer: Patrick Duggan
  • Nov 5, 2025


Date: November 5, 2025

Pattern: Sub-pattern of Pattern #32 (Polish vs Dent Partnership Framework)

Author: Patrick Duggan, DugganUSA LLC


---


Executive Summary


When analyzing 172 IPs auto-blocked in 33 seconds (The Aristocrats Incident, Nov 2-3, 2025), we discovered a critical distinction: Friendly Fire ≠ Armor Denting. This sub-pattern documents the difference and provides behavioral analysis to differentiate AI bots from ML crawlers.


Key Finding: We accidentally blocked 33 legitimate bots (33 of 172, a 19.2% false positive rate), raised the threshold from >5 to >10, and learned to distinguish friendly fire (self-inflicted) from armor denting (partner-inflicted).


---


Pattern #32 Recap: Polish vs Dent


Armor Denting (Partner Abuse):

  • Partner action damages YOUR reputation for THEIR benefit

  • Partner remains silent or denies when caught

  • Example: AWS labels Amazon.com infrastructure as "Anthropic, PBC" → Anthropic blamed for Amazon's aggressive crawling


Armor Polishing (Partner Respect):

  • Partner action elevates YOUR reputation

  • Partner credits you publicly

  • Example: Google press release "Anthropic to Expand Use of Google Cloud" → Anthropic as protagonist


---


Pattern #32.1: Friendly Fire


Definition


Friendly Fire: Self-inflicted reputation damage from over-aggressive security controls, followed by immediate acknowledgment and correction.


NOT Armor Denting because:

1. ❌ No partner involved (you did it to yourself)

2. ❌ No benefit to the blocker (worse SEO, less indexing)

3. ✅ Immediate acknowledgment ("Apology to the 33")

4. ✅ Root cause fix (threshold 5 → 10)

5. ✅ Public learning (blogged about it)


---


The Aristocrats Incident: Receipts


What Happened (Nov 2-3, 2025)


Auto-Blocker Configuration:

  • Threshold: Score >5 (AGGRESSIVE)

  • Source: AbuseIPDB confidence scores

  • Trigger: Cloudflare analytics → AbuseIPDB enrichment → Auto-block


Result:

  • 172 IPs blocked in 33 seconds

  • 33 were innocent (19.2% false positive rate)


The Innocent 33:


#### ✅ Googlebot (Multiple IPs)

{
  "ip": "66.249.69.200",
  "isp": "Google LLC",
  "abuseScore": 0,
  "totalReports": 6,
  "reason": "Legitimate search crawler",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Respects robots.txt

  • Identifies clearly in User-Agent: "Googlebot/2.1"

  • Consistent crawl rate (not aggressive)

  • 0% abuse confidence despite reports


Why Reported? Sites that don't understand what Googlebot does file false reports.


---


#### ✅ Ahrefs SEO Bot (Canada, 6 IPs)

{
  "ipRange": "54.x.x.x",
  "country": "CA",
  "isp": "Ahrefs Pte Ltd",
  "abuseScore": 0,
  "totalReports": "4-7",
  "reason": "SEO backlink analysis",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Identifies as "AhrefsBot/7.0"

  • Helps sites understand SEO health

  • Not aggressive, just thorough

  • 0% abuse confidence


Why Reported? Sites blocking all bots indiscriminately.


---


#### ✅ Microsoft Bing Crawler

{
  "isp": "Microsoft Corporation",
  "abuseScore": 0,
  "reason": "Bing search indexing",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

  • Polite crawling (respects rate limits)

  • Clear identification

  • Misbehavior: RARE (noted in the blog as exceptional)


---


#### ✅ Google DNS (8.8.8.8)

{
  "ip": "8.8.8.8",
  "abuseConfidenceScore": 0,
  "totalReports": 165,
  "numDistinctUsers": 52,
  "usageType": "Content Delivery Network",
  "isp": "Google LLC",
  "isWhitelisted": true
}

The Paradox: 165 abuse reports, 0% confidence score


Why? People blame DNS for EVERYTHING:

  • Category 14: Port scan (not DNS behavior)

  • Category 18: Brute force (not DNS behavior)

  • Category 22: SSH abuse (not DNS behavior)

  • Category 4: DDoS (people blame the resolver, not the attacker)


Actual Behavior: Resolves DNS queries. That's it.
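To make the paradox concrete, here is a minimal sketch that loads the record above and applies two hypothetical rules: one keyed on report count, one keyed on confidence score. The function names and the default limits are assumptions for illustration only, not the production blocker.

```python
import json

# Record copied from the 8.8.8.8 example above.
RECORD = json.loads("""
{
  "ip": "8.8.8.8",
  "abuseConfidenceScore": 0,
  "totalReports": 165,
  "numDistinctUsers": 52,
  "usageType": "Content Delivery Network",
  "isp": "Google LLC",
  "isWhitelisted": true
}
""")


def flag_by_report_count(record: dict, max_reports: int = 5) -> bool:
    """Hypothetical rule: flag any IP with more than max_reports reports."""
    return record["totalReports"] > max_reports


def flag_by_confidence(record: dict, threshold: int = 10) -> bool:
    """Hypothetical rule: flag only when the confidence score exceeds threshold."""
    if record.get("isWhitelisted"):
        return False
    return record["abuseConfidenceScore"] > threshold


print(flag_by_report_count(RECORD))  # True: 165 reports is a lot of noise
print(flag_by_confidence(RECORD))    # False: 0% confidence, and whitelisted
```

Same IP, same data, opposite verdicts. Which field a rule reads matters more than how strict the number is.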


---


AI vs ML Bot Behavioral Patterns


Research Question


Can we differentiate AI-powered bots (GPT, Claude, Gemini) from traditional ML crawlers (Googlebot, Ahrefs)?


Hypothesis


AI Bots (Generative):

  • Purpose: Training data collection, content generation, Q&A

  • Behavior: Deep page analysis, content extraction, context understanding

  • User-Agent: "GPTBot", "ClaudeBot"; Gemini training is controlled through the "Google-Extended" robots.txt token rather than a separate User-Agent

  • Respect robots.txt: YES (usually - GPTBot obeys)

  • Rate: Variable (adaptive based on content value)


ML Bots (Indexing):

  • Purpose: Search engine indexing, SEO analysis, link discovery

  • Behavior: Shallow crawl, metadata collection, link mapping

  • User-Agent: "Googlebot", "Bingbot", "AhrefsBot"

  • Respect robots.txt: YES (industry standard)

  • Rate: Consistent, predictable


---


Evidence from Traffic Analysis


Pattern Observation (from blocked IPs):


Googlebot behavior:

  • Crawl frequency: 1-2 requests/minute (consistent)

  • Page depth: Shallow (metadata, links, structure)

  • Content focus: Indexable text, not context

  • robots.txt: ALWAYS respected (industry leader)

  • Response to 429 (rate limit): Backs off immediately


Ahrefs behavior:

  • Crawl frequency: 2-3 requests/minute (thorough but polite)

  • Page depth: Link-focused (backlink analysis)

  • Content focus: Anchor text, link structure

  • robots.txt: Respected

  • Response to 429: Backs off, retries later


Suspected AI Bot behavior (from 216.73.216.112 - AWS/Anthropic impostor):

  • Crawl frequency: AGGRESSIVE (triggered ModSecurity rate limits)

  • Page depth: Unknown (but WordPress brute force attempts suggest deep)

  • Content focus: Unknown (likely training data extraction)

  • robots.txt: IGNORED (multiple abuse reports cite violations)

  • Response to 429: IGNORED (continued aggressive behavior)
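The "Response to 429" signal can be checked mechanically from access logs. The sketch below is one possible way to do it, assuming a simplified log entry (timestamp plus status code) and an arbitrary 60-second cooldown; neither comes from our actual tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    timestamp: float  # seconds since the start of observation
    status: int       # HTTP status code we returned to the client


def ignores_rate_limit(requests: List[Request], cooldown: float = 60.0) -> bool:
    """Return True if the client sent another request within `cooldown` seconds of a 429."""
    last_429 = None
    for req in sorted(requests, key=lambda r: r.timestamp):
        if last_429 is not None and req.timestamp - last_429 < cooldown:
            return True
        if req.status == 429:
            last_429 = req.timestamp
    return False


# A polite crawler waits two minutes after the 429 before coming back.
polite = [Request(0, 200), Request(30, 200), Request(60, 429), Request(180, 200)]
# An aggressive client keeps hammering one second after each 429.
aggressive = [Request(0, 200), Request(1, 429), Request(2, 429), Request(3, 429)]

print(ignores_rate_limit(polite))      # False
print(ignores_rate_limit(aggressive))  # True
```

The cooldown is a judgment call. A stricter check could honor the Retry-After header when the server sends one.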


---


AI vs ML Bot Signature Matrix


| Metric | Googlebot (ML) | Ahrefs (ML) | AWS Impostor (AI?) | Actual ClaudeBot (Unknown) |
|--------|----------------|-------------|-------------------|---------------------------|
| Abuse Score | 0% | 0% | 74% | Unknown (not in dataset) |
| robots.txt | ✅ Respects | ✅ Respects | ❌ Ignores | ✅ Should respect |
| Rate Limiting | ✅ Backs off | ✅ Backs off | ❌ Ignores | ✅ Should back off |
| User-Agent | Clear | Clear | "ClaudeBot" (fake) | "ClaudeBot" (real) |
| WHOIS Match | ✅ Google LLC | ✅ Ahrefs Pte | ❌ Amazon ≠ Anthropic | ✅ Should match |
| Behavior | Polite indexing | SEO analysis | WordPress brute force | Unknown |


---


The Humpty Hump Principle (Applied)


Digital Underground wisdom: "The meta tells the tale."


Application:

  • Don't trust User-Agent header (easily faked)

  • Check WHOIS (authoritative ownership)

  • Analyze behavior (robots.txt respect, rate limit response)

  • Cross-reference abuse reports (pattern vs anomaly)


Case Study: 216.73.216.112

  • **User-Agent claims:** "ClaudeBot" (implies Anthropic)

  • **AbuseIPDB label:** "Anthropic, PBC"

  • **WHOIS reveals:** Amazon.com, Inc. (AMAZO-4)

  • **Behavior:** Aggressive, ignores robots.txt

  • **Verdict:** IMPOSTOR (AWS weaponizing Anthropic brand)
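The User-Agent versus WHOIS cross-check can be sketched in a few lines. The mapping of bot names to expected owners below is hypothetical and deliberately loose (substring matching on the org string). The WHOIS organisation is passed in as plain text, so the sketch makes no network calls.

```python
from typing import Optional

# Hypothetical mapping: bot name claimed in the User-Agent -> owner we expect WHOIS to show.
EXPECTED_OWNER = {
    "googlebot": "google",
    "bingbot": "microsoft",
    "ahrefsbot": "ahrefs",
    "claudebot": "anthropic",
    "gptbot": "openai",
}


def claimed_bot(user_agent: str) -> Optional[str]:
    """Return the known bot name found in the User-Agent, if any."""
    ua = user_agent.lower()
    for bot in EXPECTED_OWNER:
        if bot in ua:
            return bot
    return None


def identity_verdict(user_agent: str, whois_org: str) -> str:
    """Compare what the User-Agent claims with what WHOIS reports."""
    bot = claimed_bot(user_agent)
    if bot is None:
        return "unclaimed"
    if EXPECTED_OWNER[bot] in whois_org.lower():
        return "match"
    return "mismatch"


print(identity_verdict("ClaudeBot", "Amazon.com, Inc. (AMAZO-4)"))  # mismatch
print(identity_verdict("Googlebot/2.1", "Google LLC"))              # match
```

A "mismatch" on its own is a reason to look closer, not an automatic block. The verdict above also leaned on the behavior and the abuse reports.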


---


The Fix: From Friendly Fire to Surgical Strike


Before (Threshold >5 - AGGRESSIVE)


Collateral Damage:

  • Googlebot: BLOCKED ❌

  • Ahrefs: BLOCKED ❌

  • Microsoft Bing: BLOCKED ❌

  • Actual threats: BLOCKED ✅


False Positive Rate: 19.2% (33 innocents / 172 total)
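The rate here is simply innocent blocks divided by total blocks. A quick check of the arithmetic with the incident counts:

```python
blocked_total = 172     # IPs auto-blocked during the incident
blocked_innocent = 33   # legitimate bots among them

false_positive_rate = blocked_innocent / blocked_total
print(f"{false_positive_rate:.1%}")  # 19.2%
```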


Problem: Too sensitive. Legitimate bots with ANY abuse reports got blocked.


---


After (Threshold >10 - CONSERVATIVE)


Surgical Precision:

  • Googlebot (score 0): ALLOWED ✅

  • Ahrefs (score 0): ALLOWED ✅

  • Microsoft Bing (score 0): ALLOWED ✅

  • AWS Impostor (score 74): BLOCKED ✅


False Positive Rate: Target <5% (monitoring ongoing)


Improvement: Focus on BEHAVIOR (abuse confidence) not NOISE (report count)
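As a minimal sketch of the new rule, assuming the blocker compares the AbuseIPDB confidence score against the threshold with a strict "greater than" (the function name is ours, for illustration):

```python
THRESHOLD = 10  # block only when abuse confidence is strictly greater than this

# Scores taken from the list above.
observed = {
    "Googlebot": 0,
    "Ahrefs": 0,
    "Microsoft Bing": 0,
    "AWS Impostor": 74,
}


def should_block(abuse_confidence: int, threshold: int = THRESHOLD) -> bool:
    """Strict comparison: a score equal to the threshold is still allowed."""
    return abuse_confidence > threshold


decisions = {name: should_block(score) for name, score in observed.items()}
for name, blocked in decisions.items():
    print(f"{name}: {'BLOCKED' if blocked else 'ALLOWED'}")
```

Note the boundary: a score of exactly 10 passes and 11 does not. That is worth deciding on purpose rather than by accident.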


---


Lessons Learned


1. False Reports ≠ Malicious Behavior


Google DNS Example:

  • 165 reports from 52 users

  • 0% abuse confidence

  • Categories: Port scan, DDoS, brute force (none are DNS behavior)

  • **Truth:** People blame the messenger (DNS resolver) for the message (attack traffic)


Googlebot Example:

  • 6-7 reports

  • 0% abuse confidence

  • Categories: "Unwanted crawler"

  • **Truth:** Sites that don't want Google indexing them (valid choice, not abuse)


---


2. Friendly Fire Requires Immediate Correction


Our Response Timeline:

  • **Nov 2, 09:50 AM:** Aristocrats Incident (172 blocked, 33 innocent)

  • **Nov 2, 10:15 AM:** Discovery via logs review

  • **Nov 2, 11:00 AM:** Blog post "Apology to the 33" drafted

  • **Nov 3:** Issue #189 opened (19.2% false positive analysis)

  • **Nov 5, 01:01 UTC:** Fix deployed (threshold 5 → 10)


Total Time to Fix: 3 days from discovery to production deployment


Public Acknowledgment:

  • "Hey Google! Your crawler got flagged. Wave! 👋"

  • "Known good crawlers (Google, Bing, Ahrefs, Yandex) - probably false positives"

  • "Today it was Google bots and a hot tub. Tomorrow it could be a paying customer at 3 AM."


---


3. Armor Denting ≠ Friendly Fire


Armor Denting (AWS → Anthropic):

  • AWS labels Amazon infrastructure "Anthropic, PBC"

  • Anthropic blamed for Amazon's 118 abuse reports

  • AWS benefits (hides ownership)

  • Anthropic suffers (reputation damage)

  • AWS silent (no acknowledgment, no fix)

  • **Time to response:** Still waiting (reported Nov 5, ticket #215471615057882)


Friendly Fire (Us → Googlebot):

  • We blocked legitimate crawlers with threshold >5

  • We suffered (worse SEO, less indexing visibility)

  • Google unaffected (one site doesn't matter)

  • We acknowledged immediately (public apology)

  • We fixed the root cause (threshold adjustment)

  • **Time to response:** 3 days


The Difference: Intent, benefit, acknowledgment, repair.


---


AI Bot Detection Methodology (Proposed)


Signals for AI Training Bots


Positive Indicators (likely AI):

1. Deep content extraction (full page HTML)

2. Context-aware crawling (follows semantic links, not just href tags)

3. Variable rate (adapts to content "value")

4. Focus on text-heavy pages (documentation, blog posts, forums)

5. User-Agent: "GPTBot", "Google-Extended", "ClaudeBot" (verify WHOIS!)


Negative Indicators (likely ML indexer):

1. Shallow metadata collection (title, description, links)

2. Uniform crawling (all pages treated equally)

3. Consistent rate (predictable pattern)

4. Focus on structure (sitemap, link graph)

5. User-Agent: "Googlebot", "Bingbot", "AhrefsBot"
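Since this methodology is only proposed, the sketch below is equally tentative: an unweighted tally of four of the positive indicators above, with an arbitrary cut-off of three. The field names, the cut-off, and the labels are all assumptions. Nothing here has been validated against real traffic.

```python
from dataclasses import dataclass


@dataclass
class CrawlProfile:
    fetches_full_html: bool         # deep content extraction
    rate_varies: bool               # adapts rate to content "value"
    prefers_text_heavy_pages: bool  # docs, blog posts, forums
    ua_claims_ai_bot: bool          # e.g. "GPTBot" or "ClaudeBot" (still verify WHOIS)


def ai_signal_count(p: CrawlProfile) -> int:
    """Count how many of the AI-style indicators are present."""
    return sum([p.fetches_full_html, p.rate_varies,
                p.prefers_text_heavy_pages, p.ua_claims_ai_bot])


def label(p: CrawlProfile, min_signals: int = 3) -> str:
    """Hypothetical cut-off: three or more signals reads as an AI training bot."""
    if ai_signal_count(p) >= min_signals:
        return "likely AI training bot"
    return "likely ML indexer"


indexer = CrawlProfile(False, False, False, False)
trainer = CrawlProfile(True, True, True, False)
print(label(indexer))  # likely ML indexer
print(label(trainer))  # likely AI training bot
```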


---


Verification Checklist


Before blocking ANY bot:


1. ☑ Check WHOIS (User-Agent can lie, WHOIS can't)

2. ☑ Analyze abuse reports (confidence score, not count)

3. ☑ Review behavior (robots.txt respect, rate limit response)

4. ☑ Cross-reference ISP (does AbuseIPDB match WHOIS?)

5. ☑ Test threshold (what score would catch this? what else gets caught?)


AWS Impostor Case:

1. ✅ WHOIS: Amazon.com, Inc. (≠ Anthropic)

2. ✅ Abuse: 74% confidence (118 reports in 4 days)

3. ✅ Behavior: Ignores robots.txt, WordPress brute force, ModSecurity triggers

4. ❌ ISP mismatch: AbuseIPDB says "Anthropic", WHOIS says "Amazon"

5. ✅ Threshold: >10 would catch (74% >> 10%)


Googlebot Case:

1. ✅ WHOIS: Google LLC (matches User-Agent)

2. ✅ Abuse: 0% confidence (6 reports but all false)

3. ✅ Behavior: Respects robots.txt, backs off on 429

4. ✅ ISP match: AbuseIPDB = "Google LLC", WHOIS = "Google LLC"

5. ✅ Threshold: >10 does not trigger (0% < 10%), so Googlebot is correctly allowed
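The two worked cases can be run through the checklist as data. This is a sketch under assumptions: the `BotEvidence` fields and the three-way outcome (block, allow, review) are illustrative choices, and each boolean is assumed to have been established by hand beforehand.

```python
from dataclasses import dataclass


@dataclass
class BotEvidence:
    whois_matches_claim: bool      # step 1
    abuse_confidence: int          # step 2 (percent, 0-100)
    respects_robots_txt: bool      # step 3
    backs_off_on_429: bool         # step 3
    isp_label_matches_whois: bool  # step 4


def decide(e: BotEvidence, threshold: int = 10) -> str:
    """Return 'block', 'allow', or 'review' for one bot."""
    if e.abuse_confidence > threshold:  # step 5
        return "block"
    checks = [e.whois_matches_claim, e.respects_robots_txt,
              e.backs_off_on_429, e.isp_label_matches_whois]
    return "allow" if all(checks) else "review"


aws_impostor = BotEvidence(False, 74, False, False, False)
googlebot = BotEvidence(True, 0, True, True, True)
print(decide(aws_impostor))  # block
print(decide(googlebot))     # allow
```

The "review" outcome covers a bot with a low score but a failed identity or behavior check. That case deserves a human look, not an automatic decision either way.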


---


Recommendations


For Security Engineers


1. Set conservative thresholds (>10 not >5)

2. Whitelist known-good networks (Google: AS15169, Microsoft: AS8075; for Ahrefs, use its published crawler IP ranges rather than a hosting provider's ASN)

3. Monitor false positive rate (<5% target)

4. Check WHOIS before blocking (Humpty Hump Principle)

5. Acknowledge mistakes publicly (epistemic humility)
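An ASN allowlist check that runs before any auto-block could look like the sketch below. It uses only the Google and Microsoft ASNs named above. The function names are hypothetical, and the ASN is assumed to arrive as text from whatever enrichment source is in use.

```python
# ASNs taken from the recommendation above; anything else is treated as unknown.
ALLOWLISTED_ASNS = {15169: "Google", 8075: "Microsoft"}


def parse_asn(text: str) -> int:
    """Turn 'AS15169' or '15169' into the integer 15169."""
    cleaned = text.strip().upper()
    if cleaned.startswith("AS"):
        cleaned = cleaned[2:]
    return int(cleaned)


def skip_auto_block(asn_text: str) -> bool:
    """Return True when the ASN is on the allowlist and auto-blocking should be skipped."""
    return parse_asn(asn_text) in ALLOWLISTED_ASNS


print(skip_auto_block("AS15169"))  # True
print(skip_auto_block("AS16509"))  # False (Amazon is not on this list)
```

An ASN allowlist trusts every address that network announces. It works best alongside the confidence check, not in place of it.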


For AI Bot Operators (OpenAI, Anthropic, Google)


1. Be transparent (clear User-Agent, public documentation)

2. Respect robots.txt (industry standard)

3. Respect rate limits (don't trigger ModSecurity)

4. Monitor partner behavior (ensure AWS doesn't weaponize your brand)

5. Publish WHOIS-matched IP ranges (help security engineers verify)


For Cloud Providers (AWS, Azure, Google Cloud)


1. Label infrastructure honestly (Amazon.com, not "Anthropic, PBC")

2. Don't weaponize customer brands (armor polishing, not denting)

3. Test before deploy (500K Trainium2 chips ≠ "just ship it")

4. Monitor abuse reports (74% in 4 days = problem)

5. Acknowledge when caught (AWS: still silent)


---


Appendix: Abuse Category Decoder


AbuseIPDB Categories (from 8.8.8.8 example):


| Category | Name | Example from Dataset | Valid for DNS? |
|----------|------|---------------------|----------------|
| 1 | DNS Compromise | 1 instance | ⚠️ Possible (if resolver poisoned) |
| 4 | DDoS Attack | 2 instances | ❌ No (resolver ≠ attacker) |
| 7 | Phishing | 8 instances | ❌ No (a resolver doesn't host phishing pages) |
| 8 | Fraud VoIP | 1 instance | ❌ No (DNS doesn't place VoIP calls) |
| 14 | Port Scan | 40+ instances | ❌ No (a resolver answers queries, it doesn't scan ports) |
| 15 | Hacking | 30+ instances | ❌ No (DNS resolves, doesn't hack) |
| 18 | Brute-Force | 60+ instances | ❌ No (DNS doesn't brute force logins) |
| 19 | Bad Web Bot | 7 instances | ❌ No (8.8.8.8 is a resolver, not a crawler) |
| 20 | Exploited Host | 7 instances | ❌ No (8.8.8.8 is Google's public resolver, not a compromised host) |
| 21 | Web App Attack | 15+ instances | ❌ No (DNS doesn't attack web applications) |
| 22 | SSH | 60+ instances | ❌ No (DNS doesn't open SSH sessions) |


Conclusion: 165 reports, ~160 are categorically WRONG. 0% abuse confidence = CORRECT.


Lesson: Report count ≠ actual maliciousness. Analyze behavior, not noise.
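For decoding category IDs in bulk, a small lookup table helps. The names below follow AbuseIPDB's published category list as best we know it. Treat the mapping as an assumption and verify it against the current reference before relying on it.

```python
from typing import List

# AbuseIPDB report category IDs -> names (verify against the official list).
ABUSE_CATEGORIES = {
    1: "DNS Compromise", 2: "DNS Poisoning", 3: "Fraud Orders",
    4: "DDoS Attack", 5: "FTP Brute-Force", 6: "Ping of Death",
    7: "Phishing", 8: "Fraud VoIP", 9: "Open Proxy",
    10: "Web Spam", 11: "Email Spam", 12: "Blog Spam",
    13: "VPN IP", 14: "Port Scan", 15: "Hacking",
    16: "SQL Injection", 17: "Spoofing", 18: "Brute-Force",
    19: "Bad Web Bot", 20: "Exploited Host", 21: "Web App Attack",
    22: "SSH", 23: "IoT Targeted",
}


def decode(category_ids: List[int]) -> List[str]:
    """Map category IDs to names, keeping unknown IDs visible instead of dropping them."""
    return [ABUSE_CATEGORIES.get(cid, f"Unknown ({cid})") for cid in category_ids]


print(decode([14, 18, 99]))  # ['Port Scan', 'Brute-Force', 'Unknown (99)']
```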


---




Author: Patrick Duggan (Randy/Dwarf), DugganUSA LLC

Partnership: Paul Galjan (Avi/King) - DARPA/OSD 1996-2000

Philosophy: "Check the metadata. WHOIS doesn't lie. Apologize when you're wrong. Fix what you broke."


Evidence: www.dugganusa.com (The Aristocrats, Apology to the 33, AWS Impostor Bot)


Status: Threshold >10 deployed Nov 5, 2025. Monitoring false positive rate. Target <5%.


---


"Friends polish armor - they don't dent it with their presence. And when you dent your own armor shooting yourself in the foot, you apologize to Googlebot and adjust your aim."


 
 
 
