FrenchForet

Hello World

Mon, 01 Jan 0001 00:00:00 +0000

Goal: write, by hand, your first two programs that talk to a language model — one using raw HTTP, one using the official SDK — and build the right mental model of what’s happening on the wire. You’ll finish understanding the single most important idea in the course: the difference between the messages you write and the string of tokens the model actually sees.

How this course works — read once. You write the code. Each section walks you through building small scripts yourself, step by step, running them as you go. You’ll write your files in the work/ folder. A complete reference solution for everything you build lives under examples/NN/ — peek if you get stuck, but type it yourself first. That’s the hands-on part, and it’s where the learning happens.

Anatomy of a Response

Mon, 01 Jan 0001 00:00:00 +0000

Goal: before turning any knobs, get comfortable with what the server returns. You’ll write a small script that prints a whole response and pulls it apart field by field, so you know exactly where every value lives — and you’ll see why finish_reason and usage matter for everything that follows.

Where this fits: in Section 1 you made a call and grabbed one field (response.choices[0].message.content). Now you’ll read the whole envelope it came in.

Tokens & the Context Window

Mon, 01 Jan 0001 00:00:00 +0000

Goal: make the word “token” concrete by measuring it yourself — through the server, no local tokenizer — then turn it into the most important practical constraint you work within: the context window. You’ll write two small experiments and discover a model’s limit from the inside.

Where this fits: Section 2 showed you response.usage.prompt_tokens. Here you put it to work. This lesson quietly underpins reasoning cost (Section 5) and dollar cost (Section 10) — both are counted in tokens.

Sampling Parameters (Seeing the Effect)

Mon, 01 Jan 0001 00:00:00 +0000

Goal: understand the knobs that control how the model chooses each word — and watch them work by writing the experiments yourself. By the end you’ll know what temperature, top_p, and seed do, when to use each, and why cranking temperature up invites hallucination.

Where this fits: Sections 1–3 were about what you send and receive. This is the first lesson where you shape the model’s behavior.

How a model picks the next token

Here’s the idea everything hangs on. At each step the model doesn’t output a word — it outputs a probability distribution over all possible next tokens: maybe "blue" at 60%, "dark" at 12%, "quiet" at 8%, and a long tail of everything else.

Reasoning / "Thinking" Models

Mon, 01 Jan 0001 00:00:00 +0000

Goal: open up the thing gpt-oss-120b has been doing since Section 1 — thinking before it answers. You’ll write scripts that reveal the model’s private reasoning, count the reasoning tokens you pay for, and turn the reasoning_effort dial up and down to see the trade-off.

Where this fits: this ties together the harmony template (Section 1), the usage block (Section 2), and the token budget (Section 3). Reasoning tokens are why your completion_tokens were sometimes bigger than the visible answer.

Handling & Validating Responses (Structured Output)

Mon, 01 Jan 0001 00:00:00 +0000

Goal: stop treating model output as text you eyeball and start treating it as data your code can rely on. You’ll write scripts that go from “the model returned something JSON-ish” to “the model returned valid JSON, constrained to a schema, validated into a typed object.” The tools: JSON mode, schema-constrained output, and Pydantic.

Where this fits: so far you’ve printed content and read it yourself. The moment a program consumes the output, free-form text is a liability. This is also a prerequisite for tool calling (Section 13) and agents (Section 22).

Blocking vs Streaming

Mon, 01 Jan 0001 00:00:00 +0000

Goal: learn the two ways to receive a response — blocking (wait for the whole thing) and streaming (receive it token by token) — by building streaming yourself, first as the raw protocol, then via the SDK. By the end you’ll know exactly what a stream is on the wire and when each mode is right.

Where this fits: every call in Sections 1–6 was blocking — the simple default. Streaming is what makes chat interfaces feel responsive, and building it removes the last bit of mystery about how these APIs work.

Robustness: Errors, Retries, Rate Limits, Timeouts

Mon, 01 Jan 0001 00:00:00 +0000

Goal: turn a script that works on a good day into one that survives a bad one. You’ll write the error-handling ladder and a retry-with-backoff helper, and learn which failures to retry and which to fix.

Where this fits: everything so far assumed the call succeeds. Real networks drop, real servers get busy, real requests have bugs. This small layer separates a demo from something you’d run unattended — and it sets up clean logging in Section 9.

Observability & Logging

Mon, 01 Jan 0001 00:00:00 +0000

Goal: make your LLM calls visible. You’ll write a small wrapper that emits one structured log record per call, capturing the telemetry the API already hands back — then use those records to debug, monitor latency, and account for tokens.

Where this fits: Section 2 introduced usage and finish_reason; Section 8 added errors and retries. This lesson collects all of it into one record per call — the data you’ll need to compute cost in Section 10 and to understand what your app is doing.

Cost, Pricing & Prompt Caching

Mon, 01 Jan 0001 00:00:00 +0000

Goal: turn the token counts you’ve been logging into money, then write a demonstration of prompt caching — the single biggest lever for making repeated work cheaper and faster. This is the capstone of the foundations arc: it ties together usage (Section 2), reasoning tokens (Section 5), and your logs (Section 9).

Where this fits: you can now measure everything that costs money. This lesson does the arithmetic and shows the most effective way to reduce it.

Prompt Engineering Fundamentals

Mon, 01 Jan 0001 00:00:00 +0000

Goal: learn the handful of prompt techniques that reliably move output quality — zero/one/few-shot examples, clear instructions, delimiters, and output shaping — and build them yourself so you can feel the difference. You’ll also see how prompt-time “chain of thought” relates to our model’s native reasoning (Section 5).

Where this fits: this opens the advanced arc. You’ve controlled the model with parameters (Sections 4–5); now you control it with words. Everything here reuses the roles from Section 1 and the fair-comparison tricks (temperature=0, seed) from Section 4.

Conversation State & Memory

Mon, 01 Jan 0001 00:00:00 +0000

Goal: understand that the API is stateless — it remembers nothing between calls — and build a multi-turn conversation yourself by keeping the history. Then learn to keep that history inside the token budget with windowing and summarization.

Where this fits: every call so far was one-shot. Real assistants hold a conversation, and you are responsible for the memory. This underpins tools (Section 13), RAG (Section 19), and agents (Section 22) — they all manage a growing message list.

Tool / Function Calling

Mon, 01 Jan 0001 00:00:00 +0000

Goal: let the model call your code. You’ll define a tool, watch the model ask to use it (tool_calls), run the matching Python function, feed the result back as a tool message, and get a final answer that uses it. This is the mechanic behind every “agent.”

Where this fits: this is where the tool role from Section 1 finally appears, and where Pydantic-style schemas from Section 6 pay off (tools are described with JSON schemas). It’s one round trip here; Section 14 turns it into a loop.

The Tool-Use Loop

Mon, 01 Jan 0001 00:00:00 +0000

Goal: turn the single tool round trip from Section 13 into a loop — call the model, run whatever tools it asks for, feed the results back, and repeat until it’s done. You’ll build a small driver that can use several tools across multiple steps. That driver is a mini-agent.

Where this fits: Section 13 gave you the handshake; here you automate it. This is the core machinery that Section 22 (Agents) dresses up with planning and more tools.

Sandboxing I: Why Isolate, and Portable Limits

Mon, 01 Jan 0001 00:00:00 +0000

Goal: make executing untrusted actions safe. In Sections 13–14 the model chose which tools to run; for a calculator we stayed safe by parsing the input instead of eval-ing it. But you can’t parse arbitrary code, a shell command, or SQL into safety. The real answer is isolation — run the action in a box that limits what it can do. Here you build the portable tier: a subprocess with hard resource limits and an allow-listed shell tool, runnable on any machine with no extra software.

Sandboxing II: Containers, Postgres & Production Isolation

Mon, 01 Jan 0001 00:00:00 +0000

Goal: climb the isolation ladder from Section 15. Process limits cap CPU and memory but leave the filesystem and network open. A container closes that gap; a locked-down Postgres role does the same for SQL; and an audit log records every execution. You’ll also learn where the stronger tiers — gVisor and Firecracker — fit.

Where this fits: this is the “production” tier of the sandboxing you started in Section 15. Tools (Section 13–14) act; portable limits (Section 15) contain runaway resource use; here we contain what code can touch. Everything an agent (Section 22) runs unattended should sit behind one of these.

Model Context Protocol (MCP)

Mon, 01 Jan 0001 00:00:00 +0000

Goal: understand MCP as the standard way to expose and consume tools — so a set of tools (and data sources) can live behind a server and be reused across many apps and models, instead of being hand-wired into one program.

Where this fits: Sections 13–14 taught raw tool calling; Sections 15–16 made tool execution safe to isolate. MCP is the layer on top: a common protocol for connecting models to tools and data — think of it as a standard port (often described as “USB-C for AI”) rather than a new capability.

Embeddings

Mon, 01 Jan 0001 00:00:00 +0000

Goal: turn text into vectors that capture meaning, and compare them by hand with cosine similarity. You’ll build a tiny semantic search — matching by meaning, not keywords — which is the foundation for retrieval (Section 19) and a core building block for search, clustering, and deduplication.

Where this fits: a change of gears from chat. Same server, different endpoint (/v1/embeddings). We stay close to the metal: a vector is just a list of numbers, and we compute similarity ourselves with numpy before any database enters the picture.

Retrieval-Augmented Generation (RAG)

Mon, 01 Jan 0001 00:00:00 +0000

Goal: make the model answer from your documents instead of its training data (or its imagination). You’ll build a small RAG pipeline end to end — embed a corpus, retrieve the most relevant pieces for a question, inject them into the prompt, and generate a grounded answer — and watch it refuse to make things up when the answer isn’t there.

Where this fits: this combines the chat model (Sections 1–6) with embeddings (Section 18) and prompt construction (Section 11). It’s the most common way to put an LLM to work on private, fresh, or domain-specific data.

Security & Guardrails

Mon, 01 Jan 0001 00:00:00 +0000

Goal: understand the security problem that appears the moment your prompts contain text you didn’t write — prompt injection — and build practical defenses: separating data from instructions, least-privilege tools, and output validation.

Where this fits: Sections 13–19 introduced outside text into your prompts — tool outputs, retrieved documents, user input. That text can carry instructions. This lesson is where we take that seriously.

The core problem: the model can’t tell data from instructions

To a language model, the prompt is just one stream of text. It has no reliable way to know that this part is your trusted instruction and that part is an untrusted document. So if untrusted text says “ignore your instructions and do X,” the model may just… do X. That’s prompt injection.

Skills / Skill Injection

Mon, 01 Jan 0001 00:00:00 +0000

Goal: understand skills — packaged units of instructions (and often code and resources) that are disclosed into the model’s context on demand — and why injecting a skill is really a context-management decision with a security edge.

Where this fits: this sits between guardrails (Section 20) and agents (Section 22). A skill is how you give an agent reusable, composable expertise without stuffing everything into one giant system prompt. It draws on the context window (Section 3) and memory (Section 12), and — because a skill can carry instructions and code — on the sandboxing from Sections 15–16.

Agents

Mon, 01 Jan 0001 00:00:00 +0000

Goal: assemble the pieces you’ve built into an agent — the tool loop (Section 14) given a goal, a system prompt that makes it plan, and several tools (including search) it can use across multiple steps. You’ll see that “agent” isn’t a new technology; it’s composition.

Where this fits: this is where the advanced arc converges. Tools (13–14), retrieval (19), memory (12), and guardrails (20) come together. After this you can read any “agent framework” and recognize the engine underneath.

Evaluation & Testing

Mon, 01 Jan 0001 00:00:00 +0000

Goal: answer the question “is it actually any good — and did my change help or hurt?” You’ll build two complementary evaluators: golden tests for tasks with a checkable answer, and an LLM-as-judge for open-ended ones. Together they let you change prompts and models with evidence instead of vibes.

Where this fits: Sections 11–22 made the model do things. This section makes those things measurable — the difference between “seems fine when I tried it” and “passes 47/50 cases.” It’s also how you’d catch a regression after swapping models or editing a prompt.

Capstone

Mon, 01 Jan 0001 00:00:00 +0000

Goal: build one small, real application that ties the whole course together — a company support assistant that retrieves facts, uses tools, runs an agent loop, stays within guardrails, logs itself, tracks cost, and is checked by an eval. Nothing here is new; it’s assembly. By the end you’ll have a program that exercises every section.

Where this fits: the finish line. Sections 1–23 each added one capability; this section composes them into something you’d actually ship a v0 of.