When was the last time you ran /context while using Claude Code?
If you haven't, it's worth doing once. It changes how you see the tool.
System prompts and tools already take about 9.8% of the window. Claude Code's autocompaction buffer holds back another 22.5%. Before writing a single line of code, roughly 67% is left for the actual work.
That's the baseline. It's smaller than most people think.
Then I checked again after we had added our own infrastructure: CLAUDE.md, rules, commands, supporting docs, the whole careful stack of things meant to make the tool more predictable.
Under 50%.
I hadn't started writing code yet.
when the cost became visible
The clearest signal was not failure. It was drag.
I would be midway through outlining a change in the conversation: the structure, the order, the decisions I was about to make. Then the session would compact. The summary was usually good enough, but some of the working detail would fall out. I found myself re-anchoring ideas I thought were still live, repeating assumptions, reviewing more output than I expected to need.
Nothing dramatic broke. The workflow just got heavier.
I could feel it in my body before I had a clean explanation for it. I was drinking coffee faster, shoulders a little tighter, pushing the session forward with that quiet urgency that usually means part of my attention is going to overhead instead of the problem itself.
That was the useful realization.
If you don't treat context as infrastructure, AI-assisted development quietly taxes working memory. And that cost shows up not just as slower output, but as a worse afternoon.
how we got there
Like a lot of early Claude Code users, we kept adding structure.
We wrote rules. We documented patterns. We explained conventions. We tried to reduce ambiguity by being explicit, which felt responsible while we were doing it. And to be clear, some of that structure was absolutely worth it. The problem was not that the guidance existed. The problem was that too much of it lived in the always-loaded layer.
Over time, the stack got heavy. CLAUDE.md, rules, commands, and supporting files grew past 3,400 lines. Some of that defined real boundaries. Some of it was just workflow detail dressed up as if it needed to survive every session forever.
The intent was clarity. The side effect was context pressure.
Once I saw that clearly, the question changed from "how do we document this better?" to "what actually deserves to stay loaded all the time?"
the hot and cold split
What helped most was separating two kinds of guidance we had been treating as if they were the same thing.
Constraints are hot. They define properties that must keep holding no matter what: data access patterns, theming boundaries, TypeScript strictness, testing expectations, architectural lines you do not want the model to cross just because the conversation compacted.
Procedures are cold. They describe how to do something: how to create a component, how to structure a certain kind of test, what checklist to follow for a feature. Useful, yes. But they do not need to occupy the always-loaded part of the context window when they are irrelevant to the work in front of you.
That distinction sounds obvious in retrospect. It did not feel obvious while we were building the system.
One of the mistakes we made early was treating component creation steps as if they were constraints. Folder structure, setup sequences, templates, examples, "how to" sections. None of that needed to survive compaction. What mattered was that the model respected data boundaries, theming, and testing patterns. The rest could be loaded on demand.
Once that clicked, the cleanup became much easier.
what moved, what stayed
We pulled out:
- step-by-step setup instructions
- examples and templates
- workflow procedures
- "how to" guidance that only matters in specific situations
We kept:
- core architectural constraints
- non-negotiable patterns
- explicit anti-patterns
- links to where the procedural detail now lives
That procedural detail moved into skills.
The hot layer got smaller. The cold layer got richer.
why skills fixed the shape of the problem
Skills turned out to be the right home for the workflow material because they behave like progressive disclosure.
When a skill is invoked, the procedure loads. When it isn't needed, it costs nothing.
That changed the shape of the system. Instead of carrying every process all the time, Claude Code only had to carry the boundaries all the time. The rest could arrive when the work actually called for it.
That did introduce a different tradeoff: explicit invocation. If a skill is not referenced, the model does not have that procedural detail. Early on, that created a different kind of friction. Output would respect the constraints but miss some project-specific workflow because the relevant skill never got loaded.
I still prefer that failure mode.
It is much easier to notice "the procedure wasn't loaded" than to work inside a permanently crowded context window and wonder why the whole session feels slightly less steady than it should.
what changed in the numbers
Once we restructured the guidance, we measured the difference directly.
Always-loaded rules dropped from roughly 12k tokens to 2k. About an 83% reduction.
Before restructuring, the always-loaded portion looked roughly like this:
| Component | Tokens |
|---|---|
| CLAUDE.md | ~340 |
| Rules (6 files) | ~3,500 |
| Total | ~3,840 |
After separating constraints from procedures, the files condensed hard:
| File | Before | After | Reduction |
|---|---|---|---|
| CLAUDE.md | ~1.5k | ~400 | -73% |
| architecture.md | ~1.6k | ~300 | -81% |
| react-patterns.md | ~1.2k | ~250 | -79% |
| typescript-rules.md | ~886 | ~200 | -77% |
| testing-standards.md | ~999 | ~150 | -85% |
| security-guidelines.md | ~1.2k | ~300 | -75% |
| styling-standards.md | ~4.6k | ~400 | -91% |
| Total | ~12k | ~2k | ~83% |
The important part was not that the system got "smarter." It didn't.
It got calmer.
Autocompaction interrupted the work less. Output stayed aligned longer. Review overhead dropped. And the context meter stopped feeling like a low-grade threat sitting in the corner of the session.
That change mattered more than I expected.
what we gave up
Nothing disappears for free.
Skills do not automatically reload after compaction. If a session compacts mid-task, the procedural layer may need to be invoked again. Teams also need to know which skills exist and when to use them. That is real overhead. It is just a much better kind of overhead than keeping every possible procedure loaded at all times.
Some enforcement moved out of context entirely. Formatting, linting, and related test execution now run through hooks. That was another useful realization. Certain kinds of discipline do not belong in prompt memory at all. They belong in automation.
Once we moved those responsibilities to hooks, the context window had more room for the things only context can do well: planning, judgment, and reasoning.
if you're feeling this too
Run /context.
Find your biggest always-loaded file and ask a blunt question: is this defining a boundary, or is it describing a process?
If it defines a boundary that must survive compaction, keep it hot.
If it describes a process that only matters sometimes, move it cold.
That one distinction did more for the quality of our workflow than any amount of additional documentation.
FAQ
What is Claude Code context management and why does it matter?
Claude Code has a finite context window shared between system prompts, tools, and your actual work. System prompts and tools consume about 9.8%, autocompaction reserves 22.5%, leaving roughly 67% before you write any code. If your project's always-loaded rules consume too much of that remaining space, output becomes less predictable and autocompaction disrupts flow more often.
What is the hot/cold separation pattern for Claude Code configuration?
Hot context is the always-loaded layer: architectural boundaries, non-negotiable patterns, and constraints that must survive compaction. Cold context is procedural guidance: workflows, templates, examples, and step-by-step instructions that can be loaded on demand through skills.
How do Claude Code skills work as progressive disclosure?
Skills let procedural guidance load only when it's relevant. That keeps the always-loaded footprint small while still making detailed workflows available when the work calls for them.
How much context can you save by restructuring CLAUDE.md and rules?
In this case, always-loaded rules dropped from roughly 12,000 tokens to 2,000 tokens, about an 83% reduction, by moving step-by-step procedures, examples, and templates into skills.
What should stay always loaded and what should move to skills?
Keep boundaries hot: architecture, non-negotiable patterns, and explicit anti-patterns. Move procedures cold: setup steps, templates, examples, and how-to guidance that only matters in specific workflows.
closing thought
AI-assisted development makes context a shared resource.
Once I started treating it that way, the tradeoffs got much easier to see.
Reducing always-loaded context did not make the tool more magical. It just made the workflow more predictable. Less bracing for the next compaction. Less carrying extra things in my head just in case. More room for the part of the work that actually feels like engineering.
Day to day, that was the win.