news

LLM Tool Calling

Also called function calling. The pattern where a LLM) is given a list of available functions/tools (with names, descriptions, and parameter schemas), and the model decides when to invoke them and what arguments to pass. A host (an AI Agent Harness, an SDK, or the model provider's own platform) exec

Sebastien Dubois

18 May 2026 — 3 min read

This is a note from my public notes. View the canonical version: LLM Tool Calling.

Also called function calling. The pattern where a LLM is given a list of available functions/tools (with names, descriptions, and parameter schemas), and the model decides when to invoke them and what arguments to pass. A host (an AI Agent Harness, an SDK, or the model provider's own platform) executes the tool and feeds the result back, letting the LLM continue reasoning. The model itself never executes anything; it only emits structured intent.

How It Works

Host registers tools with the LLM session: name, description, JSON Schema for parameters
User sends a prompt
LLM either responds with text OR with a structured tool call: { "tool": "search_web", "args": { "query": "..." } }
Host executes the tool, captures the result
Result is sent back to the LLM as a new message
LLM uses the result to produce its final response

This loop can repeat (multi-step reasoning across multiple tool calls).

Who Executes the Tool

The model emits the tool call. Something else runs it. That something varies, and the difference matters for security, observability, and portability.

1. The AI agent harness (most common today)

A harness like Claude Code, OpenCode, Cursor.com, or Aider sits between the user and the model. It receives the structured tool call, executes it locally (file system, shell, browser, Model Context Protocol (MCP) servers), and sends the result back. The harness owns auth, the execution sandbox, the audit log, and any human-in-the-loop confirmation. See AI Agent Harness for the full pattern.

2. The model provider's API or platform

Increasingly, the API/platform around the LLM executes selected tools server-side without the host ever seeing the raw call. Examples:

OpenAI's hosted tools (web search, code interpreter, file search) run inside OpenAI's infrastructure; you enable them with a flag.
Anthropic's server-side tools (web search, code execution) run in Anthropic's environment.
Anthropic's Claude Managed Agents execute the entire harness, including arbitrary tool calls, in Anthropic-managed containers.
"Configurable integrations" on hosted assistant platforms (ChatGPT connectors, Claude integrations, Gemini extensions) let users wire in third-party services that the platform invokes on the model's behalf.

This shifts execution responsibility from the host application to the model provider; the host only configures, the platform runs.

Why the distinction matters

Security boundary: harness-executed tools touch your machine; platform-executed tools touch the provider's infrastructure (and whatever they connect out to).
Latency and cost: platform-executed tools avoid round-trip to your host but charge per invocation in the API bill.
Portability: code that depends on platform-executed tools breaks when you switch providers; harness-executed tools (especially over Model Context Protocol (MCP)) port more cleanly.
Observability: you see harness-executed calls in your own logs; you only see platform-executed calls in whatever telemetry the provider exposes.
Trust: you own the harness execution; you must trust the provider for platform execution.

A real production system usually mixes both: the harness handles local actions and proprietary integrations, the platform handles general-purpose tools (search, code interpreter) where its native implementation outperforms what you'd build.

Why It Matters

Tool calling turns an LLM from a text generator into a controller. It's the foundation of:

AI agents that take actions in the world
Retrieval-augmented chatbots
LLMs that integrate with databases, APIs, file systems, browsers
The Model Context Protocol (MCP) and Claude Code's tool use

Where It Shows Up

API	Tool Calling Support
OpenAI API	Native (`tools` parameter)
Anthropic Claude API	Native (`tools` parameter)
Google Gemini API	Native
W3C Prompt API	Supported via `tools` config
Gemini Nano	Supported on-device
Local LLMs (llama.cpp, Ollama)	Varies by model fine-tuning

Design Considerations

Schema clarity: tool descriptions must be unambiguous; the LLM uses them to choose
Error handling: tools can fail; the LLM needs the error message to recover or retry
Cost: each tool call round-trip consumes tokens; chains can be expensive
Safety: tools that mutate state need authorization and logging

Relationship to Structured Outputs

Tool calling is a constrained form of LLM Structured Outputs — the model output must conform to one of the registered tool schemas, plus the choice of which tool. Many runtimes implement both via the same constrained-decoding mechanism.

References

About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.

If you want to follow my work, then become a member and join our community.

Ready to get to the next level?

If you're tired of information overwhelm and ready to build a reliable knowledge system:

📚 KM for Beginners — 10+ hours of structured video lessons
🚀 Obsidian Starter Kit — Ready-made vault with 40+ templates
💼 Knowledge Worker Kit — Complete guides + lifetime community
🦉 1-on-1 Coaching — Personalized guidance
🎯 Join Knowii — Community + ALL courses & tools