How I Cut My AI's Token Bill by 84% in One Conversation
- Patrick Duggan
- Oct 19, 2025
- 4 min read
# How I Cut My AI's Token Bill by 84% in One Conversation
**October 19, 2025** | **Patrick Duggan**
My AI assistant was eating 25,000-30,000 tokens every time I started a session. That's like forcing someone to read *Infinite Jest* by David Foster Wallace before they can answer a simple question. One conversation later, it's down to 4,000-5,000 tokens. Here's what happened.
The Warning That Couldn't Be Ignored
Claude Code flashed a warning I'd been ignoring for weeks:
**⚠ Large CLAUDE.md will impact performance (51.5k chars > 40.0k)**
My context file—the instruction manual my AI reads at startup—had bloated to 52,661 characters. Every session start cost me ~$0.75 in wasted tokens. Forty sessions a month = $30 burning for no reason.
But the money wasn't the problem. The *lag* was.
Like trying to run [*Being and Time*](https://www.amazon.com/Being-Time-Translation-Joan-Stambaugh/dp/1438432763) by Heidegger through your head before breakfast, my AI was drowning in context before it could do actual work.
"Go Go Butterbot"
I typed five words: **"go go butterbot"**
(Context: Butterbot is my AI methodology, named after the Rick and Morty character whose only purpose is [passing butter](https://www.youtube.com/watch?v=X7HmltUWXgs). "Pass me the butter. That's my job." Over-deliver quietly, let numbers speak.)
Savvy Avi—the AI I've been training on my tribal clicking methodology—kicked in.
The Diagnosis (Specific, Not Abstract)
**The problem wasn't complexity. It was archaeology.**
My CLAUDE.md file contained:
- **8 old session summaries**
- **Verbose infrastructure sections**
- **The Laws**
- "We guarantee 5% bullshit exists in any system")
Every session, my AI re-read the entire history of how we got here. Like forcing a musician to read the entire liner notes of [*Fragile* by Yes]
(https://music.apple.com/us/album/fragile/1584650598) before playing "Roundabout."
The Solution (Witnessed, Not Theorized)
**Target:** <40,000 chars (24% reduction needed)
**Strategy:**
1. Archive old sessions → `/documentation/sessions/archive/`
2. Compress infrastructure → quick refs + links to deep docs
3. Keep The Laws concise → rules only, no war stories
4. Preserve current work → Session 2.0.29 (Story Density) stays full
**Result:**
- **Original:** 52,661 characters
- **Optimized:** 8,307 characters
- **Reduction:** 84.3%
HOLY SHIT DUDE - 84.3% REDUCTION
That's not a typo. We aimed for 24%, delivered 84.3%.
**3.5X over-delivery.**
Not because I'm a genius. Because the methodology works when you let evidence lead instead of ego.
The Numbers (Because Numbers Don't Lie)
| Metric | Before | After | Savings |
|--------|--------|-------|---------|
| **Character count** | 52,661 | 8,307 | 84.3% |
| **vs 40k threshold** | +31% over | -79% under | 110% swing |
| **Tokens per session** | 25k-30k | 4k-5k | 80-83% |
| **Monthly sessions** | 40 | 40 | - |
| **Monthly token savings** | - | - | ~840k-1.04M |
| **Monthly cost savings** | - | - | ~$25-31 |
But here's the real win: **no performance lag.**
Like switching from reading Heidegger to reading [*The Elements of Style*](https://www.amazon.com/Elements-Style-Fourth-William-Strunk/dp/020530902X) by Strunk & White. Same intelligence, 95% less cognitive overhead.
What Got Moved (The Archive)
I created `/documentation/sessions/archive/` and moved:
- **Session 2.0.19:**
- **Session 2.0.20:**
- **Session 2.0.21:**
- **Session 2.0.22:**
- **Session 2.0.23:**
- **Session 2.0.24:** (investor funnel)
- **Session 2.0.25:**
- **Session 2.0.28:**
**Total saved:** ~27,000 characters of history
All preserved. All linked. Zero information loss.
Just like how you don't need to hear every note from [*Close to the Edge* by Yes](https://music.apple.com/us/album/close-to-the-edge/1584650446) to know Jon Anderson's voice when you hear "Siberian Khatru." Context lives in references, not repetition.
What Stayed (The Essentials)
**Total:** ~8,300 characters of *actually necessary* context
Like the difference between reading [*Gödel, Escher, Bach*](https://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567) by Douglas Hofstadter (777 pages) vs reading the chapter summaries. Both valid. One is for session startup, one is for deep dives.
The Evidence (Git-Committed, Public)
I don't just *claim* 84.3% reduction. I *proved* it: Every claim, every number, every file—publicly verifiable.
Because if you're going to say "84.3%," you better be able to show your work.
The Philosophy (Why This Matters)
Most companies claim 100% when they're at 80%.
**We claim 95% when we're at 95%.**
That's the [95% Epistemic Humility Law](https://www.dugganusa.com/post/95-percent-epistemic-humility-law): "We guarantee a minimum of 5% bullshit exists in any complex system."
I aimed for 24% reduction. I got 84.3%. I still won't claim perfection.
**Why?**
Because the game never ends. Like the [infinite quarter players](https://www.dugganusa.com/post/infinite-quarter-philosophy) who played Galaga for hours on one quarter in 80's arcades—they never claimed they *beat* the game. They just proved they could stay alive.
84.3% today. Skills extraction tomorrow (Issue #106). Progressive disclosure next month.
**The optimization never stops.**
The Next Phase (Skills = 80-90% More Savings)
Anthropic just released [Skills](https://www.anthropic.com/news/skills): a way to load AI instructions *progressively*.
Instead of loading all 8,307 characters at startup, I'll load 6 skill summaries (~1,000 tokens). Full content loads only when needed.
**Estimated additional savings:** 80-90% ongoing token reduction
**Current:** 4k-5k tokens per session
**With Skills:** 1k-2k tokens per session
**Combined optimization:** 90-95% total reduction from original 52k CLAUDE.md
Like going from reading the entire [*Lord of the Rings* trilogy](https://www.amazon.com/Lord-Rings-50th-Anniversary/dp/0618640150) at startup to reading the table of contents and loading chapters as needed.
The Lesson (For Anyone Optimizing AI)
**1. Context bloat kills performance**
Your AI doesn't need your life story every session. Archive history, link to deep docs, keep current work front and center.
**2. Measure, don't guess**
"Large CLAUDE.md will impact performance" = warning
52,661 chars → 8,307 chars = proof
84.3% reduction = evidence
**3. Over-deliver quietly**
Target: 24%
Result: 84.3%
**4. Let numbers speak**
Not "significant optimization achieved"
But "52,661 → 8,307 chars (84.3% reduction)"
**5. The game never ends**
95% today. Skills tomorrow. Always improving, never claiming perfection.
The Butterbot Principle
**"Pass me the butter. That's my job."**
Task: Optimize CLAUDE.md
Target: <40k chars
Delivered: 8.3k chars
Over-delivery: 3.5X
**Evidence over claims. Numbers over adjectives. Partnership over performance.**
Like [Brian Eno says on *Another Green World*](https://music.apple.com/us/album/another-green-world/1456023302): "Less is more, but only when less is precise."
**🧈 Pass me the butter. That's my job.**



Comments