Pattern #32.1: Friendly Fire vs Armor Denting - AI vs ML Bot Behavioral Analysis

Patrick Duggan
Nov 5, 2025
Updated: Apr 25

Date: November 5, 2025

Pattern: Sub-pattern of Pattern #32 (Polish vs Dent Partnership Framework)

Author: Patrick Duggan, DugganUSA LLC

---

Executive Summary

When analyzing 172 IPs auto-blocked in 33 seconds (The Aristocrats Incident, Nov 2-3, 2025), we discovered a critical distinction: Friendly Fire ≠ Armor Denting. This sub-pattern documents the difference and provides behavioral analysis to differentiate AI bots from ML crawlers.

Key Finding: We accidentally blocked 33 legitimate bots (19.7% false positive rate), raised the threshold from >5 to >10, and learned to distinguish friendly fire (self-inflicted) from armor denting (partner-inflicted).

---

Pattern #32 Recap: Polish vs Dent

Armor Denting (Partner Abuse):

Partner action damages YOUR reputation for THEIR benefit
Partner remains silent or denies when caught
Example: AWS labels Amazon.com infrastructure as "Anthropic, PBC" → Anthropic blamed for Amazon's aggressive crawling

Armor Polishing (Partner Respect):

Partner action elevates YOUR reputation
Partner credits you publicly
Example: Google press release "Anthropic to Expand Use of Google Cloud" → Anthropic as protagonist

---

Pattern #32.1: Friendly Fire

Definition

Friendly Fire: Self-inflicted reputation damage from over-aggressive security controls, followed by immediate acknowledgment and correction.

NOT Armor Denting because:

1. ❌ No partner involved (you did it to yourself)

2. ❌ No benefit to the blocker (worse SEO, less indexing)

3. ✅ Immediate acknowledgment ("Apology to the 33")

4. ✅ Root cause fix (threshold 5 → 10)

5. ✅ Public learning (blogged about it)

---

The Aristocrats Incident: Receipts

What Happened (Nov 2-3, 2025)

Auto-Blocker Configuration:

Threshold: Score >5 (AGGRESSIVE)
Source: AbuseIPDB confidence scores
Trigger: Cloudflare analytics → AbuseIPDB enrichment → Auto-block

Result:

172 IPs blocked in 33 seconds
33 were innocent (19.7% false positive rate)
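The rate can be recomputed from the two counts above. This is a minimal sketch; the function name is illustrative and not taken from the auto-blocker. Note that 33 out of 172 works out to roughly 19.2%, so the quoted 19.7% may reflect rounding or a slightly different count.

```python
def false_positive_rate(false_positives: int, total_blocked: int) -> float:
    """Share of blocked IPs that turned out to be legitimate, as a percentage."""
    if total_blocked <= 0:
        raise ValueError("total_blocked must be positive")
    return 100.0 * false_positives / total_blocked


rate = false_positive_rate(33, 172)
print(f"{rate:.1f}% of blocked IPs were legitimate")
```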

The Innocent 33:

✅ Googlebot (Multiple IPs)

{
  "ip": "66.249.69.200",
  "isp": "Google LLC",
  "abuseScore": 0,
  "totalReports": 6,
  "reason": "Legitimate search crawler",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

Respects robots.txt
Identifies clearly in User-Agent: "Googlebot/2.1"
Consistent crawl rate (not aggressive)
0% abuse confidence despite reports

Why Reported? Sites that don't understand what Googlebot does file false reports.

---

✅ Ahrefs SEO Bot (Canada, 6 IPs)

{
  "ipRange": "54.x.x.x",
  "country": "CA",
  "isp": "Ahrefs Pte Ltd",
  "abuseScore": 0,
  "totalReports": "4-7",
  "reason": "SEO backlink analysis",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

Identifies as "AhrefsBot/7.0"
Helps sites understand SEO health
Not aggressive, just thorough
0% abuse confidence

Why Reported? Sites blocking all bots indiscriminately.

---

✅ Microsoft Bing Crawler

{
  "isp": "Microsoft Corporation",
  "abuseScore": 0,
  "reason": "Bing search indexing",
  "status": "BLOCKED (oops)"
}

Behavior Pattern:

Polite crawling (respects rate limits)
Clear identification
Misbehavior: Rare (noted in the blog as exceptional)

---

✅ Google DNS (8.8.8.8)

{
  "ip": "8.8.8.8",
  "abuseConfidenceScore": 0,
  "totalReports": 165,
  "numDistinctUsers": 52,
  "usageType": "Content Delivery Network",
  "isp": "Google LLC",
  "isWhitelisted": true
}

The Paradox: 165 abuse reports, 0% confidence score

Why? People blame DNS for EVERYTHING:

Category 14: Port scan (not DNS behavior)
Category 18: Brute force (not DNS behavior)
Category 10: Web spam (not DNS behavior)
Category 4: DDoS (people blame the resolver, not the attacker)

Actual Behavior: Resolves DNS queries. That's it.

---

AI vs ML Bot Behavioral Patterns

Research Question

Can we differentiate AI-powered bots (GPT, Claude, Gemini) from traditional ML crawlers (Googlebot, Ahrefs)?

Hypothesis

AI Bots (Generative):

Purpose: Training data collection, content generation, Q&A
Behavior: Deep page analysis, content extraction, context understanding
User-Agent: "GPTBot", "ClaudeBot"; Gemini training is controlled through the "Google-Extended" robots.txt token rather than a separate User-Agent
Respect robots.txt: YES (usually - GPTBot obeys)
Rate: Variable (adaptive based on content value)

ML Bots (Indexing):

Purpose: Search engine indexing, SEO analysis, link discovery
Behavior: Shallow crawl, metadata collection, link mapping
User-Agent: "Googlebot", "Bingbot", "AhrefsBot"
Respect robots.txt: YES (industry standard)
Rate: Consistent, predictable
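Both hypotheses lean on "respects robots.txt", which can be checked mechanically against request logs. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the helper names, and the sample paths are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of one AI crawler, keep /admin/ private for everyone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def disallowed_hits(parser: RobotFileParser, user_agent: str, requested_paths: list) -> list:
    """Return the requested paths that robots.txt forbids for this agent."""
    return [p for p in requested_paths if not parser.can_fetch(user_agent, p)]


parser = build_parser(ROBOTS_TXT)
print(disallowed_hits(parser, "GPTBot", ["/blog/post-1"]))
print(disallowed_hits(parser, "Googlebot", ["/blog/post-1", "/admin/login"]))
```

A crawler whose logged requests keep landing in the disallowed list is ignoring robots.txt, whatever its User-Agent claims.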

---

Evidence from Traffic Analysis

Pattern Observation (from blocked IPs):

Googlebot behavior:

Crawl frequency: 1-2 requests/minute (consistent)
Page depth: Shallow (metadata, links, structure)
Content focus: Indexable text, not context
robots.txt: ALWAYS respected (industry leader)
Response to 429 (rate limit): Backs off immediately

Ahrefs behavior:

Crawl frequency: 2-3 requests/minute (thorough but polite)
Page depth: Link-focused (backlink analysis)
Content focus: Anchor text, link structure
robots.txt: Respected
Response to 429: Backs off, retries later

Suspected AI Bot behavior (from 216.73.216.112 - AWS/Anthropic impostor):

Crawl frequency: AGGRESSIVE (triggered ModSecurity rate limits)
Page depth: Unknown (but WordPress brute force attempts suggest deep)
Content focus: Unknown (likely training data extraction)
robots.txt: IGNORED (multiple abuse reports cite violations)
Response to 429: IGNORED (continued aggressive behavior)
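The "Response to 429" rows can be turned into a number: how many rate-limit responses were followed by another request before a cooldown elapsed. This is a rough sketch. The `Hit` record, the 60-second cooldown, and the sample traffic are assumptions, not values from the actual logs.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    timestamp: float  # seconds since some reference point
    status: int       # HTTP status returned to the client


def ignored_429s(hits: list, cooldown: float = 60.0) -> int:
    """Count 429 responses that were followed by another request inside the cooldown."""
    ordered = sorted(hits, key=lambda h: h.timestamp)
    ignored = 0
    for current, following in zip(ordered, ordered[1:]):
        if current.status == 429 and following.timestamp - current.timestamp < cooldown:
            ignored += 1
    return ignored


# Made-up traffic: one client waits after being rate limited, the other keeps hammering.
polite = [Hit(0, 200), Hit(30, 429), Hit(300, 200)]
pushy = [Hit(0, 200), Hit(1, 429), Hit(2, 429), Hit(3, 429)]
print(ignored_429s(polite), ignored_429s(pushy))
```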

---

AI vs ML Bot Signature Matrix

| Signal | Googlebot (ML) | Ahrefs (ML) | AWS Impostor (AI) | Legitimate AI bot |
|--------|----------------|-------------|-------------------|---------------------------|
| Abuse Score | 0% | 0% | 74% | Unknown (not in dataset) |

---

The Humpty Hump Principle (Applied)

Digital Underground wisdom: "The meta tells the tale."

Application:

Don't trust User-Agent header (easily faked)
Check WHOIS (authoritative ownership)
Analyze behavior (robots.txt respect, rate limit response)
Cross-reference abuse reports (pattern vs anomaly)

Case Study: 216.73.216.112

**User-Agent claims:** "ClaudeBot" (implies Anthropic)
**AbuseIPDB label:** "Anthropic, PBC"
**WHOIS reveals:** Amazon.com, Inc. (AMAZO-4)
**Behavior:** Aggressive, ignores robots.txt
**Verdict:** IMPOSTOR (AWS weaponizing Anthropic brand)
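The case study boils down to comparing who the User-Agent claims to be with who WHOIS says owns the IP. Here is a minimal sketch of that comparison. The token-to-operator table is illustrative and incomplete, and a mismatch is a prompt to investigate rather than proof, since some operators run crawlers on rented cloud ranges.

```python
# Crawler tokens and the operator each one implies. Illustrative, not exhaustive.
EXPECTED_OPERATOR = {
    "claudebot": "anthropic",
    "gptbot": "openai",
    "googlebot": "google",
    "bingbot": "microsoft",
    "ahrefsbot": "ahrefs",
}


def check_claim(user_agent: str, whois_org: str) -> str:
    """Compare the operator implied by the User-Agent with the WHOIS organization."""
    ua = user_agent.lower()
    for token, operator in EXPECTED_OPERATOR.items():
        if token in ua:
            return "consistent" if operator in whois_org.lower() else "mismatch"
    return "unknown"


print(check_claim("ClaudeBot", "Amazon.com, Inc."))
print(check_claim("Mozilla/5.0 (compatible; Googlebot/2.1)", "Google LLC"))
```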

---

The Fix: From Friendly Fire to Surgical Strike

Before (Threshold >5 - AGGRESSIVE)

Collateral Damage:

Googlebot: BLOCKED ❌
Ahrefs: BLOCKED ❌
Microsoft Bing: BLOCKED ❌
Actual threats: BLOCKED ✅

False Positive Rate: 19.7% (33 innocents / 172 total)

Problem: Too sensitive. Legitimate bots with ANY abuse reports got blocked.

---

After (Threshold >10 - CONSERVATIVE)

Surgical Precision:

Googlebot (score 0): ALLOWED ✅
Ahrefs (score 0): ALLOWED ✅
Microsoft Bing (score 0): ALLOWED ✅
AWS Impostor (score 74): BLOCKED ✅

False Positive Rate: Target <5% (monitoring ongoing)

Improvement: Focus on BEHAVIOR (abuse confidence) not NOISE (report count)
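To make "behavior, not noise" concrete, the sketch below runs two rules over figures quoted in this post. The post does not show the auto-blocker's code, so both functions are hypothetical. The report-count rule is a deliberately naive contrast and not a claim about what the original blocker did.

```python
# Figures quoted in this post (abuse confidence score and report count).
SAMPLES = [
    {"name": "Googlebot", "abuseScore": 0, "totalReports": 6},
    {"name": "Google DNS 8.8.8.8", "abuseScore": 0, "totalReports": 165},
    {"name": "AWS impostor", "abuseScore": 74, "totalReports": 118},
]


def block_by_confidence(record: dict, threshold: int = 10) -> bool:
    """Block only when the abuse confidence score exceeds the threshold."""
    return record["abuseScore"] > threshold


def block_by_report_count(record: dict, threshold: int = 5) -> bool:
    """Naive rule: block anything with more reports than the threshold."""
    return record["totalReports"] > threshold


for r in SAMPLES:
    print(r["name"], block_by_confidence(r), block_by_report_count(r))
```

Keyed on confidence, only the impostor is blocked. Keyed on report count, all three are, including a public DNS resolver.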

---

Lessons Learned

1. False Reports ≠ Malicious Behavior

Google DNS Example:

165 reports from 52 users
0% abuse confidence
Categories: Port scan, DDoS, brute force (none are DNS behavior)
**Truth:** People blame the messenger (DNS resolver) for the message (attack traffic)

Googlebot Example:

6-7 reports
0% abuse confidence
Categories: "Unwanted crawler"
**Truth:** Sites that don't want Google indexing them (valid choice, not abuse)

---

2. Friendly Fire Requires Immediate Correction

Our Response Timeline:

**Nov 2, 09:50 AM:** Aristocrats Incident (172 blocked, 33 innocent)
**Nov 2, 10:15 AM:** Discovery via logs review
**Nov 2, 11:00 AM:** Blog post "Apology to the 33" drafted
**Nov 3:** Issue #189 opened (19.7% false positive analysis)
**Nov 5, 01:01 UTC:** Fix deployed (threshold 5 → 10)

Total Time to Fix: 3 days from discovery to production deployment

Public Acknowledgment:

"Hey Google! Your crawler got flagged. Wave! 👋"
"Known good crawlers (Google, Bing, Ahrefs, Yandex) - probably false positives"
"Today it was Google bots and a hot tub. Tomorrow it could be a paying customer at 3 AM."

---

3. Armor Denting ≠ Friendly Fire

Armor Denting (AWS → Anthropic):

AWS labels Amazon infrastructure "Anthropic, PBC"
Anthropic blamed for Amazon's 118 abuse reports
AWS benefits (hides ownership)
Anthropic suffers (reputation damage)
AWS silent (no acknowledgment, no fix)
**Time to response:** Still waiting (reported Nov 5, ticket #215471615057882)

Friendly Fire (You → Googlebot):

You blocked legitimate crawlers with threshold >5
You suffered (worse SEO, less indexing visibility)
Google unaffected (one site doesn't matter)
You acknowledged immediately (public apology)
You fixed root cause (threshold adjustment)
**Time to response:** 3 days

The Difference: Intent, benefit, acknowledgment, repair.

---

AI Bot Detection Methodology (Proposed)

Signals for AI Training Bots

Positive Indicators (likely AI):

1. Deep content extraction (full page HTML)

2. Context-aware crawling (follows semantic links, not just href tags)

3. Variable rate (adapts to content "value")

4. Focus on text-heavy pages (documentation, blog posts, forums)

5. User-Agent: "GPTBot", "ClaudeBot" (verify WHOIS!); "Google-Extended" appears only as a robots.txt token

Negative Indicators (likely ML indexer):

1. Shallow metadata collection (title, description, links)

2. Uniform crawling (all pages treated equally)

3. Consistent rate (predictable pattern)

4. Focus on structure (sitemap, link graph)

5. User-Agent: "Googlebot", "Bingbot", "AhrefsBot"
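The two indicator lists could be combined into a simple tally. The sketch below is hypothetical: the field names and the 0.5 and 0.7 cut-offs are assumptions chosen for illustration, and "context-aware crawling" is left out because it is hard to measure from logs alone.

```python
from dataclasses import dataclass


@dataclass
class CrawlProfile:
    fetches_full_html: bool   # deep content extraction vs metadata only
    rate_variance: float      # 0.0 means a perfectly steady request rate
    text_heavy_share: float   # fraction of hits on docs, blog posts, forums
    ua_claims_ai: bool        # User-Agent contains an AI crawler token


def ai_signal_count(p: CrawlProfile) -> int:
    """Count how many positive AI indicators the profile shows (0 to 4)."""
    return sum([
        p.fetches_full_html,
        p.rate_variance > 0.5,      # assumed cut-off for "variable rate"
        p.text_heavy_share > 0.7,   # assumed cut-off for "text-heavy focus"
        p.ua_claims_ai,
    ])


# Made-up profiles for the two ends of the spectrum.
indexer = CrawlProfile(False, 0.1, 0.3, False)
suspected_ai = CrawlProfile(True, 0.9, 0.8, True)
print(ai_signal_count(indexer), ai_signal_count(suspected_ai))
```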

---

Verification Checklist

Before blocking ANY bot:

1. ☑ Check WHOIS (User-Agent is trivial to fake; WHOIS ownership records are much harder to fake)

2. ☑ Analyze abuse reports (confidence score, not count)

3. ☑ Review behavior (robots.txt respect, rate limit response)

4. ☑ Cross-reference ISP (does AbuseIPDB match WHOIS?)

5. ☑ Test threshold (what score would catch this? what else gets caught?)

AWS Impostor Case:

1. ✅ WHOIS: Amazon.com, Inc. (≠ Anthropic)

2. ✅ Abuse: 74% confidence (118 reports in 4 days)

3. ✅ Behavior: Ignores robots.txt, WordPress brute force, ModSecurity triggers

4. ❌ ISP mismatch: AbuseIPDB says "Anthropic", WHOIS says "Amazon"

5. ✅ Threshold: >10 would catch (74% >> 10%)

Googlebot Case:

1. ✅ WHOIS: Google LLC (matches User-Agent)

2. ✅ Abuse: 0% confidence (6 reports but all false)

3. ✅ Behavior: Respects robots.txt, backs off on 429

4. ✅ ISP match: AbuseIPDB = "Google LLC", WHOIS = "Google LLC"

5. ✅ Threshold: >10 does not trigger (0% < 10%), so Googlebot is allowed, which is the correct outcome
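The two cases above can be run through the same checks in code. This is a sketch under assumptions: the `BotEvidence` fields and the string-matching rules are invented for illustration, and the values are the ones quoted in this post.

```python
from dataclasses import dataclass


@dataclass
class BotEvidence:
    claimed_operator: str   # who the User-Agent implies
    whois_org: str          # registered owner of the IP
    abuseipdb_isp: str      # ISP label shown by AbuseIPDB
    abuse_confidence: int   # AbuseIPDB confidence score, 0-100
    respects_robots: bool
    backs_off_on_429: bool


def run_checklist(e: BotEvidence, threshold: int = 10) -> dict:
    """Return one pass/fail flag per check; True means nothing suspicious."""
    whois = e.whois_org.lower()
    return {
        "whois_matches_claim": e.claimed_operator.lower() in whois,
        "confidence_within_threshold": e.abuse_confidence <= threshold,
        "behaves_politely": e.respects_robots and e.backs_off_on_429,
        "isp_label_matches_whois": e.abuseipdb_isp.split(",")[0].lower() in whois,
    }


googlebot = BotEvidence("Google", "Google LLC", "Google LLC", 0, True, True)
impostor = BotEvidence("Anthropic", "Amazon.com, Inc.", "Anthropic, PBC", 74, False, False)

for name, evidence in (("Googlebot", googlebot), ("AWS impostor", impostor)):
    failed = [check for check, ok in run_checklist(evidence).items() if not ok]
    print(name, "failed checks:", failed)
```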

---

Recommendations

For Security Engineers

1. Set conservative thresholds (>10 not >5)

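2. Whitelist known-good ASNs (Google: AS15169, Microsoft: AS8075); for Ahrefs, use its published IP ranges, since AS14061 is DigitalOcean's shared ASN and would whitelist every tenant on it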

3. Monitor false positive rate (<5% target)

4. Check WHOIS before blocking (Humpty Hump Principle)

5. Acknowledge mistakes publicly (epistemic humility)
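An allowlist check can run before any score-based rule fires. The sketch below uses Python's standard `ipaddress` module. The single CIDR range is a placeholder for illustration; in practice the ranges should be loaded from each operator's published list, and the function name is an assumption.

```python
import ipaddress
from typing import Optional

# Placeholder range for illustration only; load operators' published lists in practice.
ALLOWLIST = {
    "Googlebot": ["66.249.64.0/19"],
}


def allowlisted_operator(ip: str) -> Optional[str]:
    """Return the operator whose range contains this IP, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for operator, cidrs in ALLOWLIST.items():
        if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
            return operator
    return None


print(allowlisted_operator("66.249.69.200"))
print(allowlisted_operator("216.73.216.112"))
```

Checking the allowlist first means a noisy report count can never block a verified crawler, while anything outside the ranges still goes through the threshold.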

For AI Bot Operators (OpenAI, Anthropic, Google)

1. Be transparent (clear User-Agent, public documentation)

2. Respect robots.txt (industry standard)

3. Respect rate limits (don't trigger ModSecurity)

4. Monitor partner behavior (ensure AWS doesn't weaponize your brand)

5. Publish WHOIS-matched IP ranges (help security engineers verify)

For Cloud Providers (AWS, Azure, Google Cloud)

1. Label infrastructure honestly (Amazon.com, not "Anthropic, PBC")

2. Don't weaponize customer brands (armor polishing, not denting)

3. Test before deploy (500K Trainium2 chips ≠ "just ship it")

4. Monitor abuse reports (74% in 4 days = problem)

5. Acknowledge when caught (AWS: still silent)

---

Appendix: Abuse Category Decoder

AbuseIPDB Categories (from 8.8.8.8 example):

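| Reported category | DNS resolver behavior? |
|-------------------|------------------------|
| Port scan | No |
| Brute force | No |
| Web spam | No |
| DDoS | No (the resolver gets blamed, not the attacker) |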

Conclusion: 165 reports, ~160 are categorically WRONG. 0% abuse confidence = CORRECT.

Lesson: Report count ≠ actual maliciousness. Analyze behavior, not noise.

---
