Tool / Function Calling
Goal: let the model call your code. You’ll define a tool, watch the model ask to
use it (tool_calls), run the matching Python function, feed the result back as a tool
message, and get a final answer that uses it. This is the mechanic behind every “agent.”
Where this fits: this is where the tool role from Section 1 finally appears, and
where Pydantic-style schemas from Section 6 pay off (tools are described with JSON
schemas). It’s one round trip here; Section 14 turns it into a loop.
Your endpoint must have tool calling enabled (vLLM’s automatic tool choice with a
gpt-oss tool parser). If tool_calls comes back empty in the examples, that’s the likely cause; note it and read on, since the mechanics are the same everywhere.
The idea
A model can’t run code or look things up — but it can tell you it wants to. Tool calling is a structured handshake:
- You send the question plus a list of tools (each a name + JSON-schema parameters).
- The model either answers normally, or replies with a tool_calls request (finish_reason == "tool_calls") naming a tool and arguments.
- You run the real function and send the result back as a tool message.
- The model answers using your result.
The model never runs anything. It only ever asks; you stay in control of what actually executes.
Build one round trip
We’ll give the model a calculator. First, the real function — note we use a safe
arithmetic evaluator, not eval (Section 20 explains why that distinction matters once
the model is choosing the input). Create work/tool_call.py:
import ast
import json
import operator

from common import get_client, MODEL

client = get_client()

# Allowlist: the only AST operator nodes we accept, mapped to their functions.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def calculate(expression: str) -> str:
    """Safely evaluate arithmetic: parse to AST, allow only numbers + operators."""
    def ev(n):
        if isinstance(n, ast.Constant) and isinstance(n.value, (int, float)):
            return n.value
        if isinstance(n, ast.BinOp) and type(n.op) in _OPS:
            return _OPS[type(n.op)](ev(n.left), ev(n.right))
        if isinstance(n, ast.UnaryOp) and type(n.op) in _OPS:
            return _OPS[type(n.op)](ev(n.operand))
        # Names, calls, attributes and everything else are rejected.
        raise ValueError("unsupported")
    return str(ev(ast.parse(expression, mode="eval").body))
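To see that the evaluator really is restrictive, it helps to feed it both arithmetic and something that is not arithmetic. The sketch below repeats a compact copy of calculate so it runs on its own, without the client; the sample inputs are illustrative assumptions, not part of the tutorial's reference file.

```python
import ast
import operator

# Compact copy of the allowlist evaluator so this snippet is self-contained.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def calculate(expression: str) -> str:
    def ev(n):
        if isinstance(n, ast.Constant) and isinstance(n.value, (int, float)):
            return n.value
        if isinstance(n, ast.BinOp) and type(n.op) in _OPS:
            return _OPS[type(n.op)](ev(n.left), ev(n.right))
        if isinstance(n, ast.UnaryOp) and type(n.op) in _OPS:
            return _OPS[type(n.op)](ev(n.operand))
        raise ValueError("unsupported")
    return str(ev(ast.parse(expression, mode="eval").body))

print(calculate("23 * 17 + 5"))   # 396
print(calculate("2 * (3 + 4)"))   # 14

# A function call parses to an ast.Call node, which is not on the allowlist.
try:
    calculate("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)       # rejected: unsupported
```

Note that the return value is always a string, which is what a tool message's content needs to be later on.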
Now describe the tool to the model — this is the JSON schema, just like Section 6:
tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression and return the number.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string",
                                          "description": "e.g. '2 * (3 + 4)'"}},
            "required": ["expression"],
        },
    },
}]
Send the question with the tools and see what comes back:
messages = [{"role": "user", "content": "What is 23 * 17 + 5? Use the calculator."}]
first = client.chat.completions.create(
    model=MODEL, messages=messages, tools=tools, tool_choice="auto",
)
msg = first.choices[0].message
print("finish_reason:", first.choices[0].finish_reason)  # expect 'tool_calls'
print("tool_calls:", msg.tool_calls)
Run it:
python work/tool_call.py
The reply isn’t an answer — it’s a request. Each item in
msg.tool_calls has an id, the function.name, and function.arguments (a JSON
string, not a dict). Now complete the handshake — append the assistant’s request, run
each tool, append a tool result for each, and ask again:
if not msg.tool_calls:
    # Nothing to run: the model answered directly, or tool calling is disabled.
    raise SystemExit("no tool_calls returned; check the endpoint's tool-calling setup")

messages.append({
    "role": "assistant",
    "content": msg.content,
    "tool_calls": [tc.model_dump() for tc in msg.tool_calls],
})
for tc in msg.tool_calls:
    args = json.loads(tc.function.arguments)  # arguments are a JSON string
    result = calculate(**args)
    messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

second = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
print("final answer:", second.choices[0].message.content)
Run again: now you get a sentence with the correct number. (Reference: examples/13/tool_call.py.)
The message sequence is the whole point
The conversation you build must contain, in order:
user "What is 23 * 17 + 5?"
assistant (content maybe empty) + tool_calls=[{id: "call_1", name: "calculate", ...}]
tool tool_call_id="call_1" content="396"
assistant "23 * 17 + 5 = 396."
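The ordering above can be checked mechanically before a request is sent. The sketch below is only a local sanity check: check_tool_sequence is a hypothetical helper, not part of any client library, and it assumes messages are plain dicts shaped like the ones built in this section. A server may enforce more or fewer rules than this.

```python
def check_tool_sequence(messages: list[dict]) -> list[str]:
    """Return a list of ordering problems; an empty list means none were found."""
    problems = []
    pending = set()  # tool call ids the assistant requested that have no result yet
    for i, m in enumerate(messages):
        role = m.get("role")
        if role == "tool":
            call_id = m.get("tool_call_id")
            if call_id in pending:
                pending.remove(call_id)
            else:
                problems.append(f"message {i}: tool result for unknown id {call_id!r}")
            continue
        # Any non-tool message must not arrive while results are still owed.
        if pending:
            problems.append(f"message {i}: missing tool results for {sorted(pending)}")
            pending.clear()
        if role == "assistant":
            for tc in m.get("tool_calls") or []:
                pending.add(tc["id"])
    if pending:
        problems.append(f"end: missing tool results for {sorted(pending)}")
    return problems

call = {"id": "call_1", "type": "function",
        "function": {"name": "calculate", "arguments": '{"expression": "23 * 17 + 5"}'}}
good = [
    {"role": "user", "content": "What is 23 * 17 + 5?"},
    {"role": "assistant", "content": None, "tool_calls": [call]},
    {"role": "tool", "tool_call_id": "call_1", "content": "396"},
    {"role": "assistant", "content": "23 * 17 + 5 = 396."},
]
# Same conversation with the assistant's request dropped: the tool result is orphaned.
bad = [good[0], good[2]]

print(check_tool_sequence(good))  # []
print(check_tool_sequence(bad))   # ["message 1: tool result for unknown id 'call_1'"]
```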
Two rules people trip on:
- You must append the assistant message that carried the tool_calls before the tool results. The tool message is a reply to a specific request and is matched by tool_call_id.
- function.arguments is a string of JSON: always json.loads it before use, and validate it (Section 6) since the model chose it.
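The second rule can be folded into one small dispatcher that never crashes on bad input. This is a minimal sketch under stated assumptions: run_tool_call, TOOLS, and the add tool are hypothetical names introduced for illustration, and returning an error string as the tool message content is one possible design, not something the API requires.

```python
import json

def add(a: float, b: float) -> str:
    """Hypothetical stand-in tool; results go back to the model as strings."""
    return str(a + b)

# Registry of name -> callable. Only functions listed here can ever run.
TOOLS = {"add": add}

def run_tool_call(name: str, arguments: str) -> str:
    """Turn one tool call into the content string for its tool message."""
    func = TOOLS.get(name)
    if func is None:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(arguments)  # arguments arrive as a JSON string
    except json.JSONDecodeError as err:
        return f"error: arguments are not valid JSON ({err.msg})"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    try:
        return func(**args)
    except (TypeError, ValueError, ArithmeticError) as err:
        # Wrong or missing parameters, or a failure inside the tool itself.
        return f"error: {err}"

print(run_tool_call("add", '{"a": 2, "b": 3}'))  # 5
print(run_tool_call("add", "{not json"))          # an "error: ..." string
print(run_tool_call("delete_files", "{}"))        # error: unknown tool 'delete_files'
```

Sending the error text back as the tool result gives the model a chance to correct its own call on the next turn, while your code still decides what runs.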
Generate schemas from Pydantic. Instead of hand-writing the parameters schema, define a BaseModel and use Model.model_json_schema() (Section 6) as the tool’s parameters. One definition gives you the schema and a validator for the arguments.
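As a rough illustration of that tip, the sketch below defines the calculator's arguments once and uses the class both ways. It assumes Pydantic v2 (where model_json_schema and model_validate_json exist); CalculateArgs is a hypothetical class name chosen for this example.

```python
from pydantic import BaseModel, Field, ValidationError

class CalculateArgs(BaseModel):
    """Arguments for the calculate tool (hypothetical model name)."""
    expression: str = Field(description="e.g. '2 * (3 + 4)'")

# Use 1: the generated JSON schema becomes the tool's parameters.
schema = CalculateArgs.model_json_schema()
tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression and return the number.",
        "parameters": schema,
    },
}]
print(schema["required"])  # ['expression']

# Use 2: the same class validates the JSON string the model sends back.
args = CalculateArgs.model_validate_json('{"expression": "23 * 17 + 5"}')
print(args.expression)     # 23 * 17 + 5

# A call with the wrong field name is rejected before any tool code runs.
try:
    CalculateArgs.model_validate_json('{"expr": "1 + 1"}')
except ValidationError as err:
    print("rejected:", err.error_count(), "error(s)")
```

The generated schema also carries extra keys such as title; whether an endpoint ignores or objects to them may vary, so inspect the schema if a request is rejected.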
Security: Tool arguments are chosen by the model, so treat them as untrusted: validate every one, and never eval/exec them. Running model-picked input is exactly what the sandboxing in Sections 15–16 makes safe.
Challenges
- A second tool. Add a get_length(text: str) tool that returns len(text). Ask a question that needs it (“How many characters in ‘hello world’?”). Success: the model calls the right tool.
- Validate the arguments. Wrap the parsed args in a Pydantic model before calling calculate. Success: a malformed tool call is rejected by your code, not crashed on.
- Make it refuse. Ask a question that needs no tool (“Who wrote Hamlet?”). Success: tool_calls is empty and the model answers directly.
Recap
- Tool calling is a handshake: you send tools; the model replies with tool_calls; you run the function and return a tool message; the model answers using it.
- The model only asks; your code decides what runs.
- Build the message sequence correctly: append the assistant’s tool_calls message, then one tool message per call (matched by tool_call_id).
- function.arguments is a JSON string: json.loads and validate it (Section 6).
Next
Section 14 — The Tool-Use Loop: one round trip becomes a loop — call, run tools, feed results, repeat until the model is done. That loop is a mini-agent.