Pattern #32.1: Friendly Fire vs Armor Denting - AI vs ML Bot Behavioral Analysis
- Patrick Duggan
- Nov 5, 2025
Date: November 5, 2025
Pattern: Sub-pattern of Pattern #32 (Polish vs Dent Partnership Framework)
Author: Patrick Duggan, DugganUSA LLC
---
Executive Summary
When analyzing 172 IPs auto-blocked in 33 seconds (The Aristocrats Incident, Nov 2-3, 2025), we discovered a critical distinction: Friendly Fire ≠ Armor Denting. This sub-pattern documents the difference and provides behavioral analysis to differentiate AI bots from ML crawlers.
Key Finding: We accidentally blocked 33 legitimate bots (19.7% false positive rate), raised the threshold from >5 to >10, and learned to distinguish friendly fire (self-inflicted) from armor denting (partner-inflicted).
---
Pattern #32 Recap: Polish vs Dent
Armor Denting (Partner Abuse):
Partner action damages YOUR reputation for THEIR benefit
Partner remains silent or denies when caught
Example: AWS labels Amazon.com infrastructure as "Anthropic, PBC" → Anthropic blamed for Amazon's aggressive crawling
Armor Polishing (Partner Respect):
Partner action elevates YOUR reputation
Partner credits you publicly
Example: Google press release "Anthropic to Expand Use of Google Cloud" → Anthropic as protagonist
---
Pattern #32.1: Friendly Fire
Definition
Friendly Fire: Self-inflicted reputation damage from over-aggressive security controls, followed by immediate acknowledgment and correction.
NOT Armor Denting because:
1. ❌ No partner involved (you did it to yourself)
2. ❌ No benefit to the blocker (worse SEO, less indexing)
3. ✅ Immediate acknowledgment ("Apology to the 33")
4. ✅ Root cause fix (threshold 5 → 10)
5. ✅ Public learning (blogged about it)
---
The Aristocrats Incident: Receipts
What Happened (Nov 2-3, 2025)
Auto-Blocker Configuration:
Threshold: Score >5 (AGGRESSIVE)
Source: AbuseIPDB confidence scores
Trigger: Cloudflare analytics → AbuseIPDB enrichment → Auto-block
Result:
172 IPs blocked in 33 seconds
33 were innocent (19.7% false positive rate)
The Innocent 33:
#### ✅ Googlebot (Multiple IPs)
{
"ip": "66.249.69.200",
"isp": "Google LLC",
"abuseScore": 0,
"totalReports": 6,
"reason": "Legitimate search crawler",
"status": "BLOCKED (oops)"
}
Behavior Pattern:
Respects robots.txt
Identifies clearly in User-Agent: "Googlebot/2.1"
Consistent crawl rate (not aggressive)
0% abuse confidence despite reports
Why Reported? Sites that don't understand what Googlebot does file false reports.
---
#### ✅ Ahrefs SEO Bot (Canada, 6 IPs)
{
"ipRange": "54.x.x.x",
"country": "CA",
"isp": "Ahrefs Pte Ltd",
"abuseScore": 0,
"totalReports": "4-7",
"reason": "SEO backlink analysis",
"status": "BLOCKED (oops)"
}
Behavior Pattern:
Identifies as "AhrefsBot/7.0"
Helps sites understand SEO health
Not aggressive, just thorough
0% abuse confidence
Why Reported? Sites blocking all bots indiscriminately.
---
#### ✅ Microsoft Bing Crawler
{
"isp": "Microsoft Corporation",
"abuseScore": 0,
"reason": "Bing search indexing",
"status": "BLOCKED (oops)"
}
Behavior Pattern:
Polite crawling (respects rate limits)
Clear identification
Misbehavior: RARE (noted in the blog as exceptional)
---
#### ✅ Google DNS (8.8.8.8)
{
"ip": "8.8.8.8",
"abuseConfidenceScore": 0,
"totalReports": 165,
"numDistinctUsers": 52,
"usageType": "Content Delivery Network",
"isp": "Google LLC",
"isWhitelisted": true
}
The Paradox: 165 abuse reports, 0% confidence score
Why? People blame DNS for EVERYTHING:
Category 14: Port scan (not DNS behavior)
Category 18: Brute force (not DNS behavior)
Category 22: SSH (not DNS behavior)
Category 4: DDoS (people blame the resolver, not the attacker)
Actual Behavior: Resolves DNS queries. That's it.
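The paradox above (many reports, zero confidence) can be spotted mechanically. The following is a minimal sketch, assuming a record shaped like the 8.8.8.8 example; the helper name `is_noisy_but_clean` and the `min_reports` cutoff of 50 are illustrative assumptions, not part of the auto-blocker described in this post.

```python
import json

# Record copied from the 8.8.8.8 example above.
RAW = """
{
  "ip": "8.8.8.8",
  "abuseConfidenceScore": 0,
  "totalReports": 165,
  "numDistinctUsers": 52,
  "usageType": "Content Delivery Network",
  "isp": "Google LLC",
  "isWhitelisted": true
}
"""


def is_noisy_but_clean(record: dict, min_reports: int = 50) -> bool:
    """True when many reports exist but the confidence score is still zero.

    min_reports is a hypothetical cutoff chosen only for this illustration.
    """
    return (
        record["totalReports"] >= min_reports
        and record["abuseConfidenceScore"] == 0
    )


record = json.loads(RAW)
print(is_noisy_but_clean(record))  # True
```

An IP that trips this check is a candidate for manual review rather than an automatic block.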
---
AI vs ML Bot Behavioral Patterns
Research Question
Can we differentiate AI-powered bots (GPT, Claude, Gemini) from traditional ML crawlers (Googlebot, Ahrefs)?
Hypothesis
AI Bots (Generative):
Purpose: Training data collection, content generation, Q&A
Behavior: Deep page analysis, content extraction, context understanding
User-Agent: "GPTBot", "ClaudeBot", "Google-Extended" (Gemini training)
Respect robots.txt: YES (usually - GPTBot obeys)
Rate: Variable (adaptive based on content value)
ML Bots (Indexing):
Purpose: Search engine indexing, SEO analysis, link discovery
Behavior: Shallow crawl, metadata collection, link mapping
User-Agent: "Googlebot", "Bingbot", "AhrefsBot"
Respect robots.txt: YES (industry standard)
Rate: Consistent, predictable
---
Evidence from Traffic Analysis
Pattern Observation (from blocked IPs):
Googlebot behavior:
Crawl frequency: 1-2 requests/minute (consistent)
Page depth: Shallow (metadata, links, structure)
Content focus: Indexable text, not context
robots.txt: ALWAYS respected (industry leader)
Response to 429 (rate limit): Backs off immediately
Ahrefs behavior:
Crawl frequency: 2-3 requests/minute (thorough but polite)
Page depth: Link-focused (backlink analysis)
Content focus: Anchor text, link structure
robots.txt: Respected
Response to 429: Backs off, retries later
Suspected AI Bot behavior (from 216.73.216.112 - AWS/Anthropic impostor):
Crawl frequency: AGGRESSIVE (triggered ModSecurity rate limits)
Page depth: Unknown (but WordPress brute force attempts suggest deep)
Content focus: Unknown (likely training data extraction)
robots.txt: IGNORED (multiple abuse reports cite violations)
Response to 429: IGNORED (continued aggressive behavior)
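The "Response to 429" rows above can be turned into a simple check over an access log. This is a sketch under stated assumptions: the event format (seconds since start, HTTP status), the 60-second `min_pause`, and both sample traces are made up for illustration and do not come from the incident data.

```python
from typing import List, Tuple

# Each event is (seconds_since_start, http_status) for a single client IP.
Event = Tuple[float, int]


def backs_off_after_429(events: List[Event], min_pause: float = 60.0) -> bool:
    """Return True if every 429 is followed by a pause of at least min_pause seconds.

    A client that never received a 429 passes trivially.
    Events are assumed to be sorted by time.
    """
    for (t, status), (t_next, _) in zip(events, events[1:]):
        if status == 429 and (t_next - t) < min_pause:
            return False
    return True


# Hypothetical traces: one client waits five minutes after a 429, one keeps hammering.
polite = [(0, 200), (30, 200), (60, 429), (360, 200)]
pushy = [(0, 200), (1, 429), (2, 429), (3, 200)]

print(backs_off_after_429(polite))  # True
print(backs_off_after_429(pushy))   # False
```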
---
AI vs ML Bot Signature Matrix
| Metric | Googlebot (ML) | Ahrefs (ML) | AWS Impostor (AI?) | Actual ClaudeBot (Unknown) |
|--------|----------------|-------------|-------------------|---------------------------|
| Abuse Score | 0% | 0% | 74% | Unknown (not in dataset) |
| robots.txt | ✅ Respects | ✅ Respects | ❌ Ignores | ✅ Should respect |
| Rate Limiting | ✅ Backs off | ✅ Backs off | ❌ Ignores | ✅ Should back off |
| User-Agent | Clear | Clear | "ClaudeBot" (fake) | "ClaudeBot" (real) |
| WHOIS Match | ✅ Google LLC | ✅ Ahrefs Pte | ❌ Amazon ≠ Anthropic | ✅ Should match |
| Behavior | Polite indexing | SEO analysis | WordPress brute force | Unknown |
---
The Humpty Hump Principle (Applied)
Digital Underground wisdom: "The meta tells the tale."
Application:
Don't trust User-Agent header (easily faked)
Check WHOIS (authoritative ownership)
Analyze behavior (robots.txt respect, rate limit response)
Cross-reference abuse reports (pattern vs anomaly)
Case Study: 216.73.216.112
**User-Agent claims:** "ClaudeBot" (implies Anthropic)
**AbuseIPDB label:** "Anthropic, PBC"
**WHOIS reveals:** Amazon.com, Inc. (AMAZO-4)
**Behavior:** Aggressive, ignores robots.txt
**Verdict:** IMPOSTOR (AWS weaponizing Anthropic brand)
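The case study boils down to comparing what the User-Agent claims with what WHOIS reports. Below is a minimal sketch of that comparison, assuming the WHOIS organisation string has already been looked up by other means. The `EXPECTED_ORG` mapping and the function name are hypothetical, and a mismatch is treated as one signal to weigh alongside the behavior checks, not as proof on its own.

```python
from typing import Optional

# Hypothetical mapping from a User-Agent token to a substring expected in the WHOIS org.
EXPECTED_ORG = {
    "googlebot": "google",
    "bingbot": "microsoft",
    "ahrefsbot": "ahrefs",
    "claudebot": "anthropic",
    "gptbot": "openai",
}


def whois_matches_user_agent(user_agent: str, whois_org: str) -> Optional[bool]:
    """Compare the operator implied by the User-Agent with the WHOIS organisation.

    Returns True or False for known bot tokens, and None when the User-Agent
    does not contain any token from EXPECTED_ORG.
    """
    ua = user_agent.lower()
    org = whois_org.lower()
    for token, expected in EXPECTED_ORG.items():
        if token in ua:
            return expected in org
    return None


print(whois_matches_user_agent("ClaudeBot", "Amazon.com, Inc. (AMAZO-4)"))  # False
print(whois_matches_user_agent("Googlebot/2.1", "Google LLC"))              # True
```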
---
The Fix: From Friendly Fire to Surgical Strike
Before (Threshold >5 - AGGRESSIVE)
Collateral Damage:
Googlebot: BLOCKED ❌
Ahrefs: BLOCKED ❌
Microsoft Bing: BLOCKED ❌
Actual threats: BLOCKED ✅
False Positive Rate: 19.7% (33 innocents / 172 total)
Problem: Too sensitive. Legitimate bots with ANY abuse reports got blocked.
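For reference, the false positive rate as this post defines it (innocent IPs divided by all blocked IPs) can be reproduced in a few lines. The function name is an illustrative choice. With the counts quoted here, the ratio comes out near 19.2%, slightly below the 19.7% headline figure, so one of the inputs may have been counted or rounded differently.

```python
def false_positive_rate(false_positives: int, total_blocked: int) -> float:
    """Share of blocked IPs that turned out to be legitimate, as a percentage."""
    if total_blocked <= 0:
        raise ValueError("total_blocked must be positive")
    return 100.0 * false_positives / total_blocked


rate = false_positive_rate(33, 172)
print(f"{rate:.1f}%")  # 19.2%
```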
---
After (Threshold >10 - CONSERVATIVE)
Surgical Precision:
Googlebot (score 0): ALLOWED ✅
Ahrefs (score 0): ALLOWED ✅
Microsoft Bing (score 0): ALLOWED ✅
AWS Impostor (score 74): BLOCKED ✅
False Positive Rate: Target <5% (monitoring ongoing)
Improvement: Focus on BEHAVIOR (abuse confidence) not NOISE (report count)
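To make the behavior-versus-noise distinction concrete, here is a small sketch that applies two rules to figures quoted in this post. The count-based rule is a hypothetical contrast case; the post does not say the original blocker worked this way. The confidence rule mirrors the >10 threshold.

```python
# (label, abuse confidence score, total reports), taken from figures quoted in this post.
OBSERVED = [
    ("Googlebot 66.249.69.200", 0, 6),
    ("Google DNS 8.8.8.8", 0, 165),
    ("AWS impostor 216.73.216.112", 74, 118),
]


def blocked_by_confidence(score: int, threshold: int = 10) -> bool:
    """Block only when the abuse confidence score exceeds the threshold."""
    return score > threshold


def blocked_by_report_count(reports: int, threshold: int = 5) -> bool:
    """Hypothetical noisy rule: block whenever the raw report count exceeds the threshold."""
    return reports > threshold


for label, score, reports in OBSERVED:
    print(label, blocked_by_confidence(score), blocked_by_report_count(reports))
```

Under the count-based rule all three entries are blocked, including Googlebot and 8.8.8.8. Under the confidence rule only the impostor is.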
---
Lessons Learned
1. False Reports ≠ Malicious Behavior
Google DNS Example:
165 reports from 52 users
0% abuse confidence
Categories: Port scan, DDoS, brute force (none are DNS behavior)
**Truth:** People blame the messenger (DNS resolver) for the message (attack traffic)
Googlebot Example:
6-7 reports
0% abuse confidence
Categories: "Unwanted crawler"
**Truth:** Sites that don't want Google indexing them (valid choice, not abuse)
---
2. Friendly Fire Requires Immediate Correction
Our Response Timeline:
**Nov 2, 09:50 AM:** Aristocrats Incident (172 blocked, 33 innocent)
**Nov 2, 10:15 AM:** Discovery via logs review
**Nov 2, 11:00 AM:** Blog post "Apology to the 33" drafted
**Nov 3:** Issue #189 opened (19.7% false positive analysis)
**Nov 5, 01:01 UTC:** Fix deployed (threshold 5 → 10)
Total Time to Fix: 3 days from discovery to production deployment
Public Acknowledgment:
"Hey Google! Your crawler got flagged. Wave! 👋"
"Known good crawlers (Google, Bing, Ahrefs, Yandex) - probably false positives"
"Today it was Google bots and a hot tub. Tomorrow it could be a paying customer at 3 AM."
---
3. Armor Denting ≠ Friendly Fire
Armor Denting (AWS → Anthropic):
AWS labels Amazon infrastructure "Anthropic, PBC"
Anthropic blamed for Amazon's 118 abuse reports
AWS benefits (hides ownership)
Anthropic suffers (reputation damage)
AWS silent (no acknowledgment, no fix)
**Time to response:** Still waiting (reported Nov 5, ticket #215471615057882)
Friendly Fire (We → Googlebot):
We blocked legitimate crawlers with threshold >5
We suffered (worse SEO, less indexing visibility)
Google unaffected (one site doesn't matter)
We acknowledged immediately (public apology)
We fixed root cause (threshold adjustment)
**Time to response:** 3 days
The Difference: Intent, benefit, acknowledgment, repair.
---
AI Bot Detection Methodology (Proposed)
Signals for AI Training Bots
Positive Indicators (likely AI):
1. Deep content extraction (full page HTML)
2. Context-aware crawling (follows semantic links, not just href tags)
3. Variable rate (adapts to content "value")
4. Focus on text-heavy pages (documentation, blog posts, forums)
5. User-Agent: "GPTBot", "Google-Extended", "ClaudeBot" (verify WHOIS!)
Negative Indicators (likely ML indexer):
1. Shallow metadata collection (title, description, links)
2. Uniform crawling (all pages treated equally)
3. Consistent rate (predictable pattern)
4. Focus on structure (sitemap, link graph)
5. User-Agent: "Googlebot", "Bingbot", "AhrefsBot"
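The "variable rate" versus "consistent rate" indicators above can be quantified. One possible measure, offered here as an assumption and not as the method used in this analysis, is the coefficient of variation of the gaps between requests. Both sample timestamp lists are invented for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def interval_variation(timestamps: Sequence[float]) -> float:
    """Coefficient of variation of the gaps between consecutive requests.

    Values near 0 mean an even, predictable rate; larger values mean a bursty one.
    Timestamps are assumed to be sorted, in seconds.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    avg = mean(gaps)
    if avg == 0:
        raise ValueError("all requests share the same timestamp")
    return pstdev(gaps) / avg


steady = [0, 30, 60, 90, 120]    # hypothetical: one request every 30 seconds
bursty = [0, 1, 2, 60, 61, 300]  # hypothetical: bursts separated by long pauses

print(interval_variation(steady))        # 0.0
print(interval_variation(bursty) > 1.0)  # True
```

A single number like this cannot separate AI bots from indexers by itself; it only supplies the rate signal for the lists above.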
---
Verification Checklist
Before blocking ANY bot:
1. ☑ Check WHOIS (User-Agent can lie, WHOIS can't)
2. ☑ Analyze abuse reports (confidence score, not count)
3. ☑ Review behavior (robots.txt respect, rate limit response)
4. ☑ Cross-reference ISP (does AbuseIPDB match WHOIS?)
5. ☑ Test threshold (what score would catch this? what else gets caught?)
AWS Impostor Case:
1. ✅ WHOIS: Amazon.com, Inc. (≠ Anthropic)
2. ✅ Abuse: 74% confidence (118 reports in 4 days)
3. ✅ Behavior: Ignores robots.txt, WordPress brute force, ModSecurity triggers
4. ❌ ISP mismatch: AbuseIPDB says "Anthropic", WHOIS says "Amazon"
5. ✅ Threshold: >10 would catch (74% >> 10%)
Googlebot Case:
1. ✅ WHOIS: Google LLC (matches User-Agent)
2. ✅ Abuse: 0% confidence (6 reports but all false)
3. ✅ Behavior: Respects robots.txt, backs off on 429
4. ✅ ISP match: AbuseIPDB = "Google LLC", WHOIS = "Google LLC"
5. ✅ Threshold: >10 does not catch it (0% < 10%), which is the correct outcome
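Checklist item 5 ("what score would catch this? what else gets caught?") lends itself to a quick sweep. The sketch below uses only the confidence scores quoted in this post; the labels are shortened and the function name is illustrative. With so few data points, thresholds 5 and 10 catch the same entry, so any difference between them would only show up on a fuller dataset that includes scores between 6 and 10.

```python
from typing import Dict, List

# Confidence scores quoted in this post; labels are shortened for readability.
SCORES: Dict[str, int] = {
    "Googlebot": 0,
    "AhrefsBot": 0,
    "Bing crawler": 0,
    "Google DNS 8.8.8.8": 0,
    "AWS impostor": 74,
}


def caught_at(threshold: int, scores: Dict[str, int]) -> List[str]:
    """Names whose score is strictly greater than the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


for threshold in (5, 10, 50, 80):
    print(threshold, caught_at(threshold, SCORES))
```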
---
Recommendations
For Security Engineers
1. Set conservative thresholds (>10 not >5)
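2. Whitelist known-good ASNs (Google: AS15169, Microsoft: AS8075) and, for Ahrefs, the crawler IP ranges it publishes (AS14061 is DigitalOcean, so allowlisting it would admit far more than AhrefsBot)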
3. Monitor false positive rate (<5% target)
4. Check WHOIS before blocking (Humpty Hump Principle)
5. Acknowledge mistakes publicly (epistemic humility)
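Recommendations 1 and 2 can be combined into a single decision step. This is a rough sketch, assuming the ASN for each IP has already been resolved elsewhere. It uses two ASNs from the list above, the function name and return strings are hypothetical, and AS64500 is a documentation-reserved placeholder.

```python
from typing import Optional

# Two ASNs taken from the recommendation above; extend with care.
ALLOWLISTED_ASNS = {15169: "Google", 8075: "Microsoft"}


def decide(asn: Optional[int], abuse_score: int, threshold: int = 10) -> str:
    """Allow traffic from allowlisted ASNs, otherwise apply the score threshold."""
    if asn in ALLOWLISTED_ASNS:
        return "allow"
    return "block" if abuse_score > threshold else "allow"


print(decide(15169, 0))   # allow
print(decide(64500, 74))  # block (64500 is a placeholder ASN)
print(decide(None, 7))    # allow (score is not above 10)
```

Checking the allowlist first means a burst of false reports against a known crawler cannot trigger a block, which is the failure mode behind the Aristocrats Incident.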
For AI Bot Operators (OpenAI, Anthropic, Google)
1. Be transparent (clear User-Agent, public documentation)
2. Respect robots.txt (industry standard)
3. Respect rate limits (don't trigger ModSecurity)
4. Monitor partner behavior (ensure AWS doesn't weaponize your brand)
5. Publish WHOIS-matched IP ranges (help security engineers verify)
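For bot operators, respecting robots.txt is easy to verify before a request is sent. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is excluded entirely, everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # False
```

A crawler that runs this check before every fetch and skips disallowed URLs would not produce the robots.txt violations cited in the abuse reports above.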
For Cloud Providers (AWS, Azure, Google Cloud)
1. Label infrastructure honestly (Amazon.com, not "Anthropic, PBC")
2. Don't weaponize customer brands (armor polishing, not denting)
3. Test before deploy (500K Trainium2 chips ≠ "just ship it")
4. Monitor abuse reports (74% in 4 days = problem)
5. Acknowledge when caught (AWS: still silent)
---
Appendix: Abuse Category Decoder
AbuseIPDB Categories (from 8.8.8.8 example):
| Category | Name | Example from Dataset | Valid for DNS? |
|----------|------|---------------------|----------------|
| 1 | DNS Compromise | 1 instance | ⚠️ Possible (if resolver poisoned) |
| 4 | DDoS Attack | 2 instances | ❌ No (resolver ≠ attacker) |
| 7 | Phishing | 8 instances | ❌ No (a resolver doesn't host phishing pages) |
| 8 | Fraud VoIP | 1 instance | ❌ No (DNS doesn't place VoIP calls) |
| 14 | Port Scan | 40+ instances | ❌ No (a resolver answers queries, it doesn't probe ports) |
| 15 | Hacking | 30+ instances | ❌ No (DNS resolves, doesn't hack) |
| 18 | Brute-Force | 60+ instances | ❌ No (DNS doesn't brute force) |
| 19 | Bad Web Bot | 7 instances | ❌ No (8.8.8.8 is a resolver, not a crawler) |
| 20 | Exploited Host | 7 instances | ❌ No (8.8.8.8 is Google-run infrastructure, not an infected host) |
| 21 | Web App Attack | 15+ instances | ❌ No (DNS doesn't send HTTP requests) |
| 22 | SSH | 60+ instances | ❌ No (DNS doesn't open SSH sessions) |
Conclusion: 165 reports, ~160 are categorically WRONG. 0% abuse confidence = CORRECT.
Lesson: Report count ≠ actual maliciousness. Analyze behavior, not noise.
---
Tags
#FriendlyFire #ArmorDenting #Pattern32 #AIBots #MLCrawlers #Googlebot #AhrefsBot #ClaudeBot #GPTBot #ThresholdTuning #FalsePositives #EpistemicHumility #HumptyHumpPrinciple #GoogleDNS #AbuseIPDB #WHOISVerification
---
Author: Patrick Duggan (Randy/Dwarf), DugganUSA LLC
Partnership: Paul Galjan (Avi/King) - DARPA/OSD 1996-2000
Philosophy: "Check the metadata. WHOIS doesn't lie. Apologize when you're wrong. Fix what you broke."
Evidence: www.dugganusa.com (The Aristocrats, Apology to the 33, AWS Impostor Bot)
Status: Threshold >10 deployed Nov 5, 2025. Monitoring false positive rate. Target <5%.
---
"Friends polish armor - they don't dent it with their presence. And when you dent your own armor shooting yourself in the foot, you apologize to Googlebot and adjust your aim."



