<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Context Compression on FrenchForet</title><link>https://learn.frenchforet.com/context-compression/</link><description>Recent content in Context Compression on FrenchForet</description><generator>Hugo</generator><language>en</language><atom:link href="https://learn.frenchforet.com/context-compression/index.xml" rel="self" type="application/rss+xml"/><item><title>The Context Problem</title><link>https://learn.frenchforet.com/context-compression/00-the-context-problem/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/00-the-context-problem/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; understand the problem this course solves, and why it is harder than &amp;ldquo;the window
is full.&amp;rdquo; A long-running agent keeps adding to the message list it resends every turn
(§13), and eventually that list will not fit in the model&amp;rsquo;s context window. That is the
&lt;em&gt;obvious&lt;/em&gt; ceiling. The harder truth is a second, softer ceiling: a model makes &lt;em&gt;worse&lt;/em&gt;
use of a full window than of a short one. So you compress not only to fit, but to keep the model
accurate, and every choice about what to drop, summarize, or keep has a cost you must measure. This
course is about managing that budget without losing the things the agent still needs.&lt;/p&gt;</description></item><item><title>Measuring the Window</title><link>https://learn.frenchforet.com/context-compression/01-measuring-the-window/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/01-measuring-the-window/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; before you can manage a budget, you have to read it. In this unit you build a
&lt;strong&gt;context meter&lt;/strong&gt;: a small tool that counts the tokens in your prompt &lt;em&gt;and&lt;/em&gt; attributes them to
where they came from — system prompt, tool definitions, history, and tool outputs — so you can
see &lt;em&gt;what&lt;/em&gt; is filling the window, not just that it is full. You will estimate the token count without a tokenizer,
check that count against the server, and emit the first joinable telemetry line of this course.&lt;/p&gt;</description></item><item><title>The Cheapest Compression Is None</title><link>https://learn.frenchforet.com/context-compression/02-the-cheapest-compression-is-none/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/02-the-cheapest-compression-is-none/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; learn when &lt;em&gt;not&lt;/em&gt; to compress. Now that the meter (Unit 1) tells you how full the
window is, the first question is not &amp;ldquo;how do I compress?&amp;rdquo; but &amp;ldquo;should I compress at all?&amp;rdquo; The
answer, most of the time, is no. Compressing early costs you twice: it spends answer quality you
did not need to spend, and it invalidates your prompt cache, all to solve a problem
you do not have yet. This unit sets the opening rule of the whole course: &lt;strong&gt;under budget, do
nothing&lt;/strong&gt;, and it puts numbers on what you lose when you ignore that rule.&lt;/p&gt;</description></item><item><title>Drop &amp; Window: The Safe Baseline</title><link>https://learn.frenchforet.com/context-compression/03-drop-and-window/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/03-drop-and-window/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; build the cheapest compaction that actually frees tokens. Unit 2 said do nothing while
you are under budget; this unit covers what to do the moment you cross the line. The answer is the
oldest and simplest method, and still the right first move: &lt;strong&gt;drop the oldest turns&lt;/strong&gt;. You will
build a sliding window that anchors the parts you must never lose, drops the stale middle until
the prompt fits again, and records every drop — the safe baseline that every smarter mechanism
later in the course has to beat.&lt;/p&gt;</description></item><item><title>Summarizing Evicted Turns</title><link>https://learn.frenchforet.com/context-compression/04-summarizing-evicted-turns/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/04-summarizing-evicted-turns/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; stop throwing evicted turns away. Unit 3 dropped the oldest middle turns outright;
that frees tokens but deletes whatever those turns held. This unit keeps a &lt;strong&gt;lossy structured
trace&lt;/strong&gt; of them instead — a short, schema&amp;rsquo;d recap that costs a fraction of the tokens but keeps
the identifiers a later turn may still need. You will build a cheap compressor with a graceful
fallback, learn where the recap is allowed to live in the transcript, and see why
production does &lt;em&gt;not&lt;/em&gt; re-insert it every turn.&lt;/p&gt;</description></item><item><title>Head, Middle, Tail</title><link>https://learn.frenchforet.com/context-compression/05-head-middle-tail/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/05-head-middle-tail/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; stop compressing the turns the model is still using. Unit 3 anchored the &lt;em&gt;head&lt;/em&gt; and
dropped from the front; Unit 4 summarized what it evicted. Both worked from one end. But the
&lt;em&gt;recent&lt;/em&gt; tail — the last few turns, the file open right now — is as load-bearing as the task at
the head, and a front-only window will eventually reach it. This unit makes the rule explicit and
symmetric: &lt;strong&gt;keep the head and the tail verbatim, and only ever compress the middle.&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Cheap Before Smart: The Deterministic Pre-Pass</title><link>https://learn.frenchforet.com/context-compression/06-deterministic-pre-pass/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/06-deterministic-pre-pass/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; shrink the middle for free before you pay to summarize it. Unit 5 isolated the middle
as the only region you compress; Unit 4 handed it to an LLM summarizer. But most of the time the
middle is not a subtle conversation that needs an intelligent summary — it is one or two enormous
&lt;strong&gt;tool outputs&lt;/strong&gt; (a file read, a search dump) surrounded by a few short messages. Those you can
collapse deterministically, with no model call at all. This unit builds that pre-pass and states
the rule it teaches: &lt;strong&gt;cheap before smart&lt;/strong&gt; — do the free, mechanical compression first, and only
use the paid, intelligent one if you still need it.&lt;/p&gt;</description></item><item><title>When to Fire: Triggers &amp; Async Compression</title><link>https://learn.frenchforet.com/context-compression/07-triggers-and-async/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/07-triggers-and-async/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; decide &lt;em&gt;when&lt;/em&gt; compaction runs, and get it off the critical path. Units 3–6 built the
&lt;em&gt;what&lt;/em&gt; — drop, summarize, head/middle/tail, the deterministic pre-pass — but left the timing
open. This unit builds the &lt;em&gt;when&lt;/em&gt;: a &lt;strong&gt;soft&lt;/strong&gt; threshold that fires compaction in the
&lt;strong&gt;background&lt;/strong&gt; while the turn keeps going, a &lt;strong&gt;hard&lt;/strong&gt; threshold that &lt;strong&gt;blocks&lt;/strong&gt; because the
window is genuinely tight, and a &lt;strong&gt;re-fire cursor&lt;/strong&gt; that stops the soft trigger from firing
again every single turn. The theme is latency: a user should not wait on a summarizer they did
not ask for.&lt;/p&gt;</description></item><item><title>Offloading &amp; Paging: Gist Memory</title><link>https://learn.frenchforet.com/context-compression/08-offloading-and-paging/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/08-offloading-and-paging/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; keep a giant artifact without keeping it &lt;em&gt;in the window&lt;/em&gt;. Some things are too big to
leave in context and too important to summarize — the 2,000-line file the agent is about to edit,
the full API response it will need three turns from now. Dropping it (Unit 3) deletes it;
summarizing it (Unit 4) loses the exact bytes. This unit builds the third option: &lt;strong&gt;offload&lt;/strong&gt; the
bytes to storage, leave a compact &lt;strong&gt;reference&lt;/strong&gt; in the window, and &lt;strong&gt;page&lt;/strong&gt; the exact bytes back
on demand. Unlike every mechanism so far, this one is &lt;strong&gt;lossless&lt;/strong&gt; — and that is the whole point.&lt;/p&gt;</description></item><item><title>Cache-Aware Compaction</title><link>https://learn.frenchforet.com/context-compression/09-cache-aware-compaction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/09-cache-aware-compaction/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; make the prompt cache the subject, not a side note. Every unit so far has treated the
cache as a warning — Unit 2 named it, Unit 4 showed re-inserting a recap breaks it, Units 5 and 7
kept deferring the rewritten layout to &amp;ldquo;a scheduled reset.&amp;rdquo; This is that unit. You will see &lt;em&gt;why&lt;/em&gt;
compaction breaks the cache, and learn the &lt;strong&gt;byte-identity&lt;/strong&gt; invariant the whole cache rests on, the
&lt;strong&gt;frozen append-only layout&lt;/strong&gt; that keeps it alive, and the &lt;strong&gt;cost-optimal schedule&lt;/strong&gt;
(&lt;code&gt;L* = √(2R/c)&lt;/code&gt;) that decides how often to pay for a rebuild. This is the best-measured win in this
course&amp;rsquo;s production reference — and, tellingly, it is not &amp;ldquo;compress harder,&amp;rdquo; it is &amp;ldquo;stop touching
the prefix.&amp;rdquo;&lt;/p&gt;</description></item><item><title>Prompt-Level Compression</title><link>https://learn.frenchforet.com/context-compression/10-prompt-level-compression/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/10-prompt-level-compression/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; compress &lt;em&gt;inside&lt;/em&gt; the text, not just at the level of whole messages. Every mechanism so
far has worked on messages — keep one, drop one, summarize a slice, offload a blob. This unit goes
a level down: shrink the tokens &lt;em&gt;within&lt;/em&gt; a prompt by removing the ones that carry the least
information. That is what perplexity-based methods like &lt;strong&gt;LLMLingua&lt;/strong&gt; do, and what trimming a
bloated system prompt does by hand. The savings are real, but this is also the unit that most needs the course&amp;rsquo;s
honesty rule, because aggressive token-dropping can quietly cost you the answer.&lt;/p&gt;</description></item><item><title>Measuring Compression Quality</title><link>https://learn.frenchforet.com/context-compression/11-measuring-compression-quality/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/11-measuring-compression-quality/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; turn the through-line into a tool. Every unit since Unit 1 has emitted a joinable record
— a meter reading, a compaction with its before/after tokens, a decision, a page-in, a ratio. On
their own they are a pile of log lines. This unit reads them back as a &lt;strong&gt;timeline&lt;/strong&gt; and answers the
question every record was quietly collected to answer: &lt;em&gt;did the compression cost us anything we needed?&lt;/em&gt; You will
build a quality harness that checks the feedback loop (did a compaction drop something a later
turn referenced?), draws the before/after token curve, and exposes a &lt;strong&gt;no-regression gate&lt;/strong&gt; you can
run in CI.&lt;/p&gt;</description></item><item><title>The Measured Default</title><link>https://learn.frenchforet.com/context-compression/12-the-measured-default/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn.frenchforet.com/context-compression/12-the-measured-default/</guid><description>&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; assemble the whole course into one defensible default. You have built every branch of
the decision tree from Unit 0 — measuring, doing nothing, dropping, summarizing, head/tail,
pre-pass, triggers, offloading, cache-aware scheduling, and a quality gate. This unit wires them
into a single policy that does &lt;strong&gt;the least that works&lt;/strong&gt; each turn, surfaces what it did to the
user with a &lt;strong&gt;session meter&lt;/strong&gt;, and ends on the honest move the whole course has been circling: the
cheapest tokens are the ones you never generate, so when a turn is too big to compress, &lt;strong&gt;decompose
the task&lt;/strong&gt; instead.&lt;/p&gt;</description></item></channel></rss>