Context Compression

Context Compression: Keeping a Long Agent Inside the Window is the third course, a hands-on, measured follow-on to the Foundations course.

Do the Foundations course first. This course assumes it: it leans on §4 (tokens and the context window), §11 (prompt caching), §13 (conversation state and history), and §23 (agents), and does not re-teach them. If those aren’t familiar, start with Foundations.

This course is a sibling of the Agent Memory course, not a sequel; read them in either order. Memory is about what an agent knows across sessions. This course is about keeping one session inside the token budget without losing what it still needs: measuring the window, dropping and windowing, structured summarization, head/middle/tail preservation, a deterministic pre-pass, and a default that the instrumentation earns rather than one the author asserts. The course is being written unit by unit; work through what has landed, in order. Each unit builds a piece, decides a tradeoff, and cites the state-of-the-art work it draws on.

00 The Context Problem
Goal: understand the problem this course solves, and why it is harder than “the window is full.” A long-running agent keeps adding to the message list …

01 Measuring the Window
Goal: before you can manage a budget, you have to read it. In this unit you build a context meter: a small tool that counts the tokens in your prompt …

02 The Cheapest Compression Is None
Goal: learn when not to compress. Now that the meter (Unit 1) tells you how full the window is, the first question is not “how do I compress?” but …

03 Drop & Window: The Safe Baseline
Goal: build the cheapest compaction that actually frees tokens. Unit 2 said do nothing while you are under budget; this unit is what to do the moment …

04 Summarizing Evicted Turns
Goal: stop throwing evicted turns away. Unit 3 dropped the oldest middle turns outright; that frees tokens but deletes whatever those turns held. This …

05 Head, Middle, Tail
Goal: stop compressing the turns the model is still using. Unit 3 anchored the head and dropped from the front; Unit 4 summarized what it evicted. …

06 Cheap Before Smart: The Deterministic Pre-Pass
Goal: shrink the middle for free before you pay to summarize it. Unit 5 isolated the middle as the only region you compress; Unit 4 handed it to an …

07 When to Fire: Triggers & Async Compression
Goal: decide when compaction runs, and get it off the critical path. Units 3–6 built the what: drop, summarize, head/middle/tail, the deterministic …

08 Offloading & Paging: Gist Memory
Goal: keep a giant artifact without keeping it in the window. Some things are too big to leave in context and too important to summarize: the …

09 Cache-Aware Compaction
Goal: make the prompt cache the subject, not a side note. Every unit so far has treated the cache as a warning: Unit 2 named it, Unit 4 showed …

10 Prompt-Level Compression
Goal: compress inside the text, not just at the level of whole messages. Every mechanism so far has worked on messages: keep one, drop one, summarize …

11 Measuring Compression Quality
Goal: turn the through-line into a tool. Every unit since Unit 1 has emitted a joinable record: a meter reading, a compaction with its before/after …

12 The Measured Default
Goal: assemble the whole course into one defensible default. You have built every branch of the decision tree from Unit 0: measuring, doing nothing, …
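To give a feel for how the early units fit together, here is a minimal sketch of a context meter feeding a drop-and-window compaction that anchors the head and tail and evicts the oldest middle turns. It is not the course's own code. The names `estimate_tokens`, `meter`, and `drop_and_window`, the `head`/`tail` parameters, and the roughly-four-characters-per-token estimate are all assumptions made for illustration; a real meter would use the model's actual tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": ..., "content": ...}


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer (assumption: ~4 characters per token)."""
    return max(1, (len(text) + 3) // 4)


def meter(messages: List[Message]) -> int:
    """Estimated token total for the whole message list."""
    return sum(estimate_tokens(m["content"]) for m in messages)


def drop_and_window(
    messages: List[Message], budget: int, head: int = 1, tail: int = 2
) -> List[Message]:
    """Keep the first `head` and last `tail` messages; evict the oldest
    middle turns one at a time until the estimate fits the budget."""
    # Under budget, or nothing in the middle to evict: do nothing.
    if meter(messages) <= budget or len(messages) <= head + tail:
        return list(messages)

    split = len(messages) - tail
    head_part = messages[:head]
    middle = messages[head:split]
    tail_part = messages[split:]

    # Evict from the front of the middle (the oldest turns) first.
    while middle and meter(head_part + middle + tail_part) > budget:
        middle.pop(0)
    return head_part + middle + tail_part
```

Called as `drop_and_window(history, budget=8000)`, this returns the history unchanged while the estimate is under budget and otherwise drops the oldest middle turns first. If the head and tail alone exceed the budget, the sketch returns them anyway; handling that case is one of the tradeoffs the later units address.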

Last modified June 20, 2026: Mount Context Compression as a docs section (fix Mermaid render on site) (#38) (7cfedc1)