Context Compression
Context Compression: Keeping a Long Agent Inside the Window is the third course, a hands-on,
measured follow-on to the Foundations course.
Do the Foundations course first. This course assumes it: it leans on §4 (tokens and the
context window), §11 (prompt caching), §13 (conversation state and history), and §23 (agents),
and does not re-teach them. If those aren’t familiar, start with Foundations.
It is a sibling of the Agent Memory course, not a sequel; read them in either
order. Memory is about what an agent knows across sessions. This course is about keeping one
session inside the token budget without losing what it still needs: measuring the window,
dropping and windowing, structured summarization, head/middle/tail preservation, a deterministic
pre-pass, and a default earned by the instrumentation rather than asserted by the author. The
course is being written unit by unit; work through the units that have landed, in order. Each one
builds a piece, decides a tradeoff, and cites the state-of-the-art (SOTA) work it draws on.
00
The Context Problem
Goal: understand the problem this course solves, and why it is harder than “the window is full.” A long-running agent keeps adding to the message list …
01
Measuring the Window
Goal: before you can manage a budget, you have to read it. In this unit you build a context meter: a small tool that counts the tokens in your prompt …
02
The Cheapest Compression Is None
Goal: learn when not to compress. Now that the meter (Unit 1) tells you how full the window is, the first question is not “how do I compress?” but …
03
Drop & Window: The Safe Baseline
Goal: build the cheapest compaction that actually frees tokens. Unit 2 said do nothing while you are under budget; this unit is what to do the moment …
04
Summarizing Evicted Turns
Goal: stop throwing evicted turns away. Unit 3 dropped the oldest middle turns outright; that frees tokens but deletes whatever those turns held. This …
05
Head, Middle, Tail
Goal: stop compressing the turns the model is still using. Unit 3 anchored the head and dropped from the front; Unit 4 summarized what it evicted. …
06
Cheap Before Smart: The Deterministic Pre-Pass
Goal: shrink the middle for free before you pay to summarize it. Unit 5 isolated the middle as the only region you compress; Unit 4 handed it to an …
07
When to Fire: Triggers & Async Compression
Goal: decide when compaction runs, and get it off the critical path. Units 3–6 built the what — drop, summarize, head/middle/tail, the deterministic …
08
Offloading & Paging: Gist Memory
Goal: keep a giant artifact without keeping it in the window. Some things are too big to leave in context and too important to summarize — the …
09
Cache-Aware Compaction
Goal: make the prompt cache the subject, not a side note. Every unit so far has treated the cache as a warning — Unit 2 named it, Unit 4 showed …
10
Prompt-Level Compression
Goal: compress inside the text, not just at the level of whole messages. Every mechanism so far has worked on messages — keep one, drop one, summarize …
11
Measuring Compression Quality
Goal: turn the through-line into a tool. Every unit since Unit 1 has emitted a joinable record — a meter reading, a compaction with its before/after …
12
The Measured Default
Goal: assemble the whole course into one defensible default. You have built every branch of the decision tree from Unit 0 — measuring, doing nothing, …