Reflection: Self-Critique from Traces
Goal: climb from the reflex tier to the reflective tier. The gates in Units 4–5 act in the moment on simple rules. Now the agent does something slower and harder: after a turn finishes, it reads its own trace and critiques it — producing a written, structured judgment about what went well and what to change. You will build both halves: a deterministic pass that mines the failure path from a trace, and a model pass that turns it into a structured proposed change.
Where this fits: the first reflective unit. It is the first loop that consumes the lens you built in Units 1–3 — reflection reads the trace events those units emit. Unlike a gate, it does not act in-turn; it runs after the turn and produces a proposal, not an instant block. Unit 7 then closes the loop by feeding that proposal back.
A different kind of loop
The reflex gates were narrow, fast, and rule-based. A reflective loop is the opposite: it runs after the turn, it uses the model’s judgment rather than a fixed rule, and its output is not an action but a proposal — “here is what I think should change, and why.” This is the pattern the research community calls Reflexion (Shinn et al., 2023 — an agent critiques its own trajectory in words and uses that as feedback) and Self-Refine (Madaan et al., 2023 — iterative self-feedback); Anthropic’s Building Effective Agents names the same shape the evaluator-optimizer loop. The harness implements it as the Captain’s Log: one structured reflection per turn.
It has two halves, and the first one needs no model at all.
Half one: mine the failure path (deterministic)
Before you pay a model to think, extract the facts a critique should focus on. A trace already
records what failed and what the agent did next, so a deterministic pass can pull out the failure
path: which tool calls failed, and — by reading the next event — whether the agent retried or
gave up. This mirrors personal_agent’s _extract_failure_excerpt
(Reference: examples/06/reflection.py):
    failed = []  # one record per failed tool call
    for i, ev in enumerate(events):
        if ev.get("operation") == "tool_call_failed":
            # Look one event ahead to classify how the agent recovered.
            nxt = events[i + 1] if i + 1 < len(events) else {}
            if nxt.get("operation") == "tool_call_started" and nxt.get("tool") == ev.get("tool"):
                recovery = "retry"
            elif nxt.get("operation") == "reply_ready":
                recovery = "gave up"
            else:
                recovery = "other"
            failed.append({"tool": ev["tool"], "error": ev["error"], "recovery": recovery})
Run against a turn where Elasticsearch timed out twice, it produces a clean excerpt — two failures,
recovery retry then gave up — with no model and no cost. Cheap before smart: the deterministic
pass focuses the expensive one.
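To make the Elasticsearch case concrete, here is a minimal, self-contained sketch of the same pass wrapped in a function and run on a hand-written trace. The function name extract_failure_path, the event fields beyond operation / tool / error, and the sample trace are assumptions for illustration, not the harness's actual code.

```python
from typing import Any, Optional

Event = dict[str, Any]  # assumed event shape: a flat dict per trace event


def extract_failure_path(events: list[Event]) -> Optional[list[dict[str, str]]]:
    """Return one record per failed tool call, or None if nothing failed."""
    failed = []
    for i, ev in enumerate(events):
        if ev.get("operation") != "tool_call_failed":
            continue
        # Classify the recovery by looking at the very next event.
        nxt = events[i + 1] if i + 1 < len(events) else {}
        if nxt.get("operation") == "tool_call_started" and nxt.get("tool") == ev.get("tool"):
            recovery = "retry"
        elif nxt.get("operation") == "reply_ready":
            recovery = "gave up"
        else:
            recovery = "other"
        failed.append({
            "tool": ev.get("tool", "unknown"),
            "error": ev.get("error", ""),
            "recovery": recovery,
        })
    return failed or None  # a clean turn yields None, not an empty list


# Hypothetical trace: Elasticsearch times out twice, then the agent replies.
trace = [
    {"operation": "tool_call_started", "tool": "elasticsearch"},
    {"operation": "tool_call_failed", "tool": "elasticsearch", "error": "timeout"},
    {"operation": "tool_call_started", "tool": "elasticsearch"},
    {"operation": "tool_call_failed", "tool": "elasticsearch", "error": "timeout"},
    {"operation": "reply_ready"},
]
excerpt = extract_failure_path(trace)
print([f["recovery"] for f in excerpt])  # ['retry', 'gave up']
```

Returning None for a clean turn (rather than an empty list) is a design choice in this sketch: it lets the caller skip or lighten the model pass with a single truthiness check.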
Half two: the structured self-critique
Now the model reads the trace (and the failure excerpt) and returns a structured judgment, not
free prose. The harness asks for exactly this shape — a rationale, a proposed_change with
what / why / how, and an impact_assessment — and persists it as a CaptainLogEntry:
'Respond with ONLY JSON: {"rationale": str, '
'"proposed_change": {"what": str, "why": str, "how": str}, "impact_assessment": str}'
Structure is the point. A free-text “the agent should be better at Elasticsearch” cannot be
deduplicated, counted, or promoted; a typed proposed_change can be (Unit 8). The course example
keeps the shape minimal (what / why / how); the harness’s ProposedChange adds a category
and a scope, which become the namespace Unit 8’s dedup keys on. personal_agent generates this
with DSPy ChainOfThought (E-008: 0% parse failures) and falls back to manual JSON parsing if
that is unavailable, because a reflection you cannot parse is a reflection you cannot use.
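The manual fallback can be sketched as a strict parse-and-validate step that returns None on anything unusable instead of raising. This is an illustrative sketch using only the standard library; the function name parse_reflection is hypothetical, and the harness's real CaptainLogEntry / ProposedChange types carry more fields than are checked here.

```python
import json
from typing import Optional

REQUIRED_CHANGE_KEYS = ("what", "why", "how")


def parse_reflection(raw: str) -> Optional[dict]:
    """Return the validated reflection dict, or None if the output is unusable."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # not JSON at all: fall back cleanly, never crash
    if not isinstance(data, dict):
        return None
    change = data.get("proposed_change")
    if not isinstance(change, dict):
        return None
    # Every field of the requested shape must be present and be a string.
    if not all(isinstance(change.get(k), str) for k in REQUIRED_CHANGE_KEYS):
        return None
    if not isinstance(data.get("rationale"), str):
        return None
    if not isinstance(data.get("impact_assessment"), str):
        return None
    return data


good = json.dumps({
    "rationale": "Two Elasticsearch timeouts, one retry, then gave up.",
    "proposed_change": {
        "what": "Add backoff before retrying",
        "why": "Immediate retry hit the same timeout",
        "how": "Wait briefly between attempts",
    },
    "impact_assessment": "Fewer abandoned searches",
})
print(parse_reflection(good) is not None)         # True
print(parse_reflection("the agent should be better"))  # None
```

A caller can treat None as "no proposal this turn" and still record that the reflection ran, which keeps bad model output from breaking the loop.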
flowchart LR
TURN["turn runs<br/>(emits a trace)"] --> READ["read own trace<br/>(Units 1–3)"]
READ --> MINE["mine failures<br/>(deterministic, free)"]
READ --> CRIT["model self-critique<br/>(structured JSON)"]
MINE --> CRIT
CRIT --> PROP["proposed_change<br/>(what / why / how)"]
PROP -.->|not applied yet| NEXT["dedup + feed back<br/>(Units 8, 7)"]
Keep it joinable, and keep it honest
The reflection must link back to the turn it critiques. The harness stamps a TelemetryRef
carrying the originating trace_id, so a proposal can always be traced to the evidence that
produced it — the same joinability discipline from Unit 1, now applied to the loop’s output. The
example emits a reflection_created event carrying that back-reference.
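A rough sketch of that back-reference and the join it enables follows. The field names telemetry_ref, failure_count, and has_proposal, and both helper functions, are assumptions made for illustration; the source only states that the event carries a TelemetryRef with the originating trace_id.

```python
from typing import Any, Optional

Event = dict[str, Any]


def make_reflection_event(trace_id: str,
                          failures: Optional[list],
                          proposal: Optional[dict]) -> Event:
    """Build a reflection_created event that points back at the critiqued turn."""
    return {
        "operation": "reflection_created",
        "telemetry_ref": {"trace_id": trace_id},  # assumed layout of the back-reference
        "failure_count": len(failures or []),
        "has_proposal": proposal is not None,
    }


def events_for_reflection(reflection: Event, all_events: list[Event]) -> list[Event]:
    """Join a reflection to its evidence: every event sharing the referenced trace_id."""
    target = reflection["telemetry_ref"]["trace_id"]
    return [ev for ev in all_events if ev.get("trace_id") == target]


# Hypothetical event store holding two turns.
store = [
    {"trace_id": "t-1", "operation": "tool_call_failed", "tool": "elasticsearch"},
    {"trace_id": "t-1", "operation": "reply_ready"},
    {"trace_id": "t-2", "operation": "reply_ready"},
]
reflection = make_reflection_event("t-1", failures=[{"tool": "elasticsearch"}], proposal=None)
evidence = events_for_reflection(reflection, store)
print(len(evidence))  # 2
```

In a real store the list comprehension would be a query filtered on trace_id; the point of the sketch is only that the reflection carries enough to run that query.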
And a warning the rest of the course builds on: a single reflection is a hypothesis, not a fact. The model might misread a one-off network blip as a systemic flaw. That is exactly why this loop produces a proposal and stops — it does not act. Units 7 and 8 add the safeguards (relevance, recurrence) that decide which proposals are worth believing, and Unit 9 keeps a human in the loop for the risky ones.
Security: reflection reads the full trace (tool arguments, errors, sometimes snippets of output) and sends it to a model, so redact secrets and personal data from the trace before it becomes a prompt (Observability Standard R5). A subtler risk: the proposed_change is model-generated, untrusted text. A poisoned tool result earlier in the turn could steer the critique toward a harmful “improvement,” so a proposal is never auto-executed. It is data for the next loop, reviewed before it ever changes the system.
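As one possible shape for the redaction step, the sketch below scrubs trace events before they are serialized into a prompt. The two patterns and the placeholder strings are illustrative assumptions only; they are not what Observability Standard R5 specifies, and a real deployment would need a vetted pattern set.

```python
import re
from typing import Any

# Assumed, deliberately small pattern list: key=value style secrets and email addresses.
REDACTION_PATTERNS = [
    (re.compile(r"\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)\S+", re.IGNORECASE),
     r"\1\2[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
]


def redact(text: str) -> str:
    """Apply every redaction pattern to a single string."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def redact_value(value: Any) -> Any:
    """Recursively redact strings inside nested dicts and lists; leave other types alone."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {k: redact_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_value(v) for v in value]
    return value


event = {
    "operation": "tool_call_failed",
    "tool": "elasticsearch",
    "error": "auth failed: password=hunter2 for bob@example.com",
    "args": {"headers": ["token: abc123"]},
}
safe_event = redact_value(event)
print(safe_event["error"])  # auth failed: password=[REDACTED] for [EMAIL]
```

Redacting the event structure before building the prompt, rather than the finished prompt string, keeps the original trace untouched and makes the scrubbed copy easy to inspect.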
Observe: this unit emits a reflection_created event with a TelemetryRef back to the critiqued trace_id, plus whether a proposal was produced and how many failures were found. The loop it opens is “what should change?”, but note it is not closed here: reflection produces a proposal and stops. Closing it (feeding it back) is Unit 7; trusting it (dedup) is Unit 8.
Challenges
- Mine a clean turn. Run the extractor on a trace with no failures. Success: it returns None, and you can explain why a successful turn should produce a lighter reflection (or none).
- Make the critique structured. With an endpoint set, get the model to return the rationale / proposed_change / impact_assessment JSON, and reject a response that is not valid JSON. Success: a parsed proposal, or a clean fallback, never a crash on bad output.
- Trace the proposal to its evidence. Given a reflection_created event, write the query that pulls the original turn’s trace. Success: the proposal and the trace it critiques are joined by trace_id, so the loop’s output is as joinable as its input.
Recap
- The reflective tier runs after a turn, uses the model’s judgment, and produces a proposal, not an in-turn action: the Reflexion / Self-Refine / evaluator-optimizer pattern.
- It has two halves: a deterministic failure-path mine (cheap, no model) that focuses an expensive structured self-critique (rationale + proposed_change of what/why/how).
- Structure makes it usable: a typed proposal can be deduplicated, counted, and promoted; the harness uses DSPy for near-zero parse failures and links each reflection to its trace_id.
- A single reflection is a hypothesis, so the loop produces a proposal and stops; the safeguards come next.
Next
Unit 7 — Closing the Reflective Loop: a reflection nobody reads changes nothing. Next you close the loop — feeding a small, relevant slice of past reflections back into the next turn’s context, so the agent actually re-reads its own observations. It is the cleanest example in the course of an agent’s output becoming its future behavior.