
# How I Cut My AI's Token Bill by 84% in One Conversation


**October 19, 2025** | **Patrick Duggan**




My AI assistant was eating 25,000-30,000 tokens every time I started a session. That's like forcing someone to read *Infinite Jest* by David Foster Wallace before they can answer a simple question. One conversation later, it's down to 4,000-5,000 tokens. Here's what happened.


## The Warning That Couldn't Be Ignored



Claude Code flashed a warning I'd been ignoring for weeks:


**⚠ Large CLAUDE.md will impact performance (51.5k chars > 40.0k)**


My context file—the instruction manual my AI reads at startup—had bloated to 52,661 characters. Every session start cost me ~$0.75 in wasted tokens. Forty sessions a month = $30 burning for no reason.


But the money wasn't the problem. The *lag* was.


Like trying to run [*Being and Time*](https://www.amazon.com/Being-Time-Translation-Joan-Stambaugh/dp/1438432763) by Heidegger through your head before breakfast, my AI was drowning in context before it could do actual work.


## "Go Go Butterbot"



I typed three words: **"go go butterbot"**


(Context: Butterbot is my AI methodology, named after the Rick and Morty character whose only purpose is [passing butter](https://www.youtube.com/watch?v=X7HmltUWXgs). "Pass me the butter. That's my job." Over-deliver quietly, let numbers speak.)


Savvy Avi—the AI I've been training on my tribal clicking methodology—kicked in.


## The Diagnosis (Specific, Not Abstract)



**The problem wasn't complexity. It was archaeology.**


My CLAUDE.md file contained:

- **8 old session summaries**

- **Verbose infrastructure sections**

- **The Laws** (including "We guarantee 5% bullshit exists in any system")


Every session, my AI re-read the entire history of how we got here. Like forcing a musician to read the entire liner notes of [*Fragile* by Yes](https://music.apple.com/us/album/fragile/1584650598) before playing "Roundabout."


## The Solution (Witnessed, Not Theorized)



**Target:** <40,000 chars (24% reduction needed)


**Strategy:**

1. Archive old sessions → `/documentation/sessions/archive/`

2. Compress infrastructure → quick refs + links to deep docs

3. Keep The Laws concise → rules only, no war stories

4. Preserve current work → Session 2.0.29 (Story Density) stays full
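Step 1 can be sketched in a few lines. This assumes old sessions live in CLAUDE.md under headings like `## Session 2.0.19` (the heading format is my assumption); the extracted blocks would then be written out to `/documentation/sessions/archive/`:

```python
import re

def split_sessions(text: str, keep: str) -> tuple[str, dict[str, str]]:
    """Keep the current session inline; pull older session blocks out."""
    # Split at each line that starts a session heading, keeping the heading.
    blocks = re.split(r"(?m)^(?=## Session )", text)
    kept, archived = [], {}
    for block in blocks:
        m = re.match(r"## Session ([\d.]+)", block)
        if m and m.group(1) != keep:
            archived[m.group(1)] = block   # destined for the archive dir
        else:
            kept.append(block)             # preamble + current session
    return "".join(kept), archived

doc = "# Laws\n## Session 2.0.19\nold notes\n## Session 2.0.29\ncurrent work\n"
slim, old = split_sessions(doc, keep="2.0.29")
print(sorted(old))   # ['2.0.19']
```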


**Result:**

- **Original:** 52,661 characters

- **Optimized:** 8,307 characters

- **Reduction:** 84.3%


## HOLY SHIT DUDE - 84.3% REDUCTION



That's not a typo. We aimed for 24%, delivered 84.3%.


**3.5X over-delivery.**


Not because I'm a genius. Because the methodology works when you let evidence lead instead of ego.


## The Numbers (Because Numbers Don't Lie)



| Metric | Before | After | Savings |
|--------|--------|-------|---------|
| **Character count** | 52,661 | 8,307 | 84.3% |
| **vs 40k threshold** | +31% over | -79% under | 110-point swing |
| **Tokens per session** | 25k-30k | 4k-5k | 80-83% |
| **Monthly sessions** | 40 | 40 | - |
| **Monthly token savings** | - | - | ~840k-1.04M |
| **Monthly cost savings** | - | - | ~$25-31 |
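The monthly-savings row falls straight out of the per-session numbers, taking the best-case 4k-token startup after optimization:

```python
# Reproduce the table's monthly token savings from its own per-session figures.
sessions_per_month = 40
saved_low = (25_000 - 4_000) * sessions_per_month    # light sessions
saved_high = (30_000 - 4_000) * sessions_per_month   # heavy sessions
print(f"{saved_low:,} - {saved_high:,} tokens/month")
# → 840,000 - 1,040,000 tokens/month
```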


But here's the real win: **no performance lag.**


Like switching from reading Heidegger to reading [*The Elements of Style*](https://www.amazon.com/Elements-Style-Fourth-William-Strunk/dp/020530902X) by Strunk & White. Same intelligence, 95% less cognitive overhead.


## What Got Moved (The Archive)



I created `/documentation/sessions/archive/` and moved:


- **Session 2.0.19**

- **Session 2.0.20**

- **Session 2.0.21**

- **Session 2.0.22**

- **Session 2.0.23**

- **Session 2.0.24** (investor funnel)

- **Session 2.0.25**

- **Session 2.0.28**


**Total saved:** ~27,000 characters of history


All preserved. All linked. Zero information loss.


Just like how you don't need to hear every note from [*Close to the Edge* by Yes](https://music.apple.com/us/album/close-to-the-edge/1584650446) to know Jon Anderson's voice when you hear "Siberian Khatru." Context lives in references, not repetition.


## What Stayed (The Essentials)


**Total:** ~8,300 characters of *actually necessary* context


Like the difference between reading [*Gödel, Escher, Bach*](https://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567) by Douglas Hofstadter (777 pages) vs reading the chapter summaries. Both valid. One is for session startup, one is for deep dives.


## The Evidence (Git-Committed, Public)



I don't just *claim* 84.3% reduction. I *proved* it: every claim, every number, every file—publicly verifiable.


Because if you're going to say "84.3%," you better be able to show your work.


## The Philosophy (Why This Matters)



Most companies claim 100% when they're at 80%.


**We claim 95% when we're at 95%.**


That's the [95% Epistemic Humility Law](https://www.dugganusa.com/post/95-percent-epistemic-humility-law): "We guarantee a minimum of 5% bullshit exists in any complex system."


I aimed for 24% reduction. I got 84.3%. I still won't claim perfection.


**Why?**


Because the game never ends. Like the [infinite quarter players](https://www.dugganusa.com/post/infinite-quarter-philosophy) who played Galaga for hours on one quarter in '80s arcades—they never claimed they *beat* the game. They just proved they could stay alive.


84.3% today. Skills extraction tomorrow (Issue #106). Progressive disclosure next month.


**The optimization never stops.**


## The Next Phase (Skills = 80-90% More Savings)



Anthropic just released [Skills](https://www.anthropic.com/news/skills): a way to load AI instructions *progressively*.


Instead of loading all 8,307 characters at startup, I'll load 6 skill summaries (~1,000 tokens). Full content loads only when needed.
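The progressive-disclosure pattern is easy to sketch. This is my illustration of the idea, not Anthropic's actual Skills API; all names here are hypothetical. Only one-line summaries sit in the startup context; full instructions load (and cache) on first use:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Skill:
    name: str
    summary: str                       # always loaded (~tens of tokens)
    loader: Callable[[], str]          # fetches the full instructions
    _full: Optional[str] = field(default=None, repr=False)

    def full(self) -> str:
        if self._full is None:         # lazy-load once, then cache
            self._full = self.loader()
        return self._full

skills = [
    Skill("story-density", "Tighten prose; cut filler.",
          lambda: "FULL story-density instructions..."),
    Skill("butterbot", "Over-deliver quietly; let numbers speak.",
          lambda: "FULL butterbot instructions..."),
]

# Startup pays only for the summaries; full content stays out of context.
startup_context = "\n".join(f"{s.name}: {s.summary}" for s in skills)
print(len(startup_context))   # tiny compared to loading everything
```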


**Estimated additional savings:** 80-90% ongoing token reduction


**Current:** 4k-5k tokens per session

**With Skills:** 1k-2k tokens per session

**Combined optimization:** 90-95% total reduction from original 52k CLAUDE.md


Like going from reading the entire [*Lord of the Rings* trilogy](https://www.amazon.com/Lord-Rings-50th-Anniversary/dp/0618640150) at startup to reading the table of contents and loading chapters as needed.


## The Lesson (For Anyone Optimizing AI)



**1. Context bloat kills performance**

Your AI doesn't need your life story every session. Archive history, link to deep docs, keep current work front and center.


**2. Measure, don't guess**

"Large CLAUDE.md will impact performance" = warning

52,661 chars → 8,307 chars = proof

84.3% reduction = evidence


**3. Over-deliver quietly**

Target: 24%

Result: 84.3%


**4. Let numbers speak**

Not "significant optimization achieved"

But "52,661 → 8,307 chars (84.3% reduction)"


**5. The game never ends**

95% today. Skills tomorrow. Always improving, never claiming perfection.


## The Butterbot Principle



**"Pass me the butter. That's my job."**


Task: Optimize CLAUDE.md

Target: <40k chars

Delivered: 8.3k chars

Over-delivery: 3.5X


**Evidence over claims. Numbers over adjectives. Partnership over performance.**


Like [Brian Eno says on *Another Green World*](https://music.apple.com/us/album/another-green-world/1456023302): "Less is more, but only when less is precise."



**🧈 Pass me the butter. That's my job.**




