LLM Tool Calling

Also called function calling. The pattern where a LLM) is given a list of available functions/tools (with names, descriptions, and parameter schemas), and the model decides when to invoke them and what arguments to pass. A host (an AI Agent Harness, an SDK, or the model provider's own platform) exec

This is a note from my public notes. View the canonical version: LLM Tool Calling.

Also called function calling. The pattern where a LLM is given a list of available functions/tools (with names, descriptions, and parameter schemas), and the model decides when to invoke them and what arguments to pass. A host (an AI Agent Harness, an SDK, or the model provider's own platform) executes the tool and feeds the result back, letting the LLM continue reasoning. The model itself never executes anything; it only emits structured intent.

How It Works

  1. Host registers tools with the LLM session: name, description, JSON Schema for parameters
  2. User sends a prompt
  3. LLM either responds with text OR with a structured tool call: { "tool": "search_web", "args": { "query": "..." } }
  4. Host executes the tool, captures the result
  5. Result is sent back to the LLM as a new message
  6. LLM uses the result to produce its final response

This loop can repeat (multi-step reasoning across multiple tool calls).

Who Executes the Tool

The model emits the tool call. Something else runs it. That something varies, and the difference matters for security, observability, and portability.

1. The AI agent harness (most common today)

A harness like Claude Code, OpenCode, Cursor.com, or Aider sits between the user and the model. It receives the structured tool call, executes it locally (file system, shell, browser, Model Context Protocol (MCP) servers), and sends the result back. The harness owns auth, the execution sandbox, the audit log, and any human-in-the-loop confirmation. See AI Agent Harness for the full pattern.

2. The model provider's API or platform

Increasingly, the API/platform around the LLM executes selected tools server-side without the host ever seeing the raw call. Examples:

  • OpenAI's hosted tools (web search, code interpreter, file search) run inside OpenAI's infrastructure; you enable them with a flag.
  • Anthropic's server-side tools (web search, code execution) run in Anthropic's environment.
  • Anthropic's Claude Managed Agents execute the entire harness, including arbitrary tool calls, in Anthropic-managed containers.
  • "Configurable integrations" on hosted assistant platforms (ChatGPT connectors, Claude integrations, Gemini extensions) let users wire in third-party services that the platform invokes on the model's behalf.

This shifts execution responsibility from the host application to the model provider; the host only configures, the platform runs.

Why the distinction matters

  • Security boundary: harness-executed tools touch your machine; platform-executed tools touch the provider's infrastructure (and whatever they connect out to).
  • Latency and cost: platform-executed tools avoid round-trip to your host but charge per invocation in the API bill.
  • Portability: code that depends on platform-executed tools breaks when you switch providers; harness-executed tools (especially over Model Context Protocol (MCP)) port more cleanly.
  • Observability: you see harness-executed calls in your own logs; you only see platform-executed calls in whatever telemetry the provider exposes.
  • Trust: you own the harness execution; you must trust the provider for platform execution.

A real production system usually mixes both: the harness handles local actions and proprietary integrations, the platform handles general-purpose tools (search, code interpreter) where its native implementation outperforms what you'd build.

Why It Matters

Tool calling turns an LLM from a text generator into a controller. It's the foundation of:

Where It Shows Up

API Tool Calling Support
OpenAI API Native (tools parameter)
Anthropic Claude API Native (tools parameter)
Google Gemini API Native
W3C Prompt API Supported via tools config
Gemini Nano Supported on-device
Local LLMs (llama.cpp, Ollama) Varies by model fine-tuning

Design Considerations

  • Schema clarity: tool descriptions must be unambiguous; the LLM uses them to choose
  • Error handling: tools can fail; the LLM needs the error message to recover or retry
  • Cost: each tool call round-trip consumes tokens; chains can be expensive
  • Safety: tools that mutate state need authorization and logging

Relationship to Structured Outputs

Tool calling is a constrained form of LLM Structured Outputs — the model output must conform to one of the registered tool schemas, plus the choice of which tool. Many runtimes implement both via the same constrained-decoding mechanism.

References


About Sébastien

I'm Sébastien Dubois, and I'm on a mission to help knowledge workers escape information overload. After 20+ years in IT and seeing too many brilliant minds drowning in digital chaos, I've decided to help people build systems that actually work. Through the Knowii Community, my courses, products & services and my Website/Newsletter, I share practical and battle-tested systems.

I write about Knowledge Work, Personal Knowledge Management, Note-taking, Lifelong Learning, Personal Organization, Productivity, and more. I also craft lovely digital products and tools.

If you want to follow my work, then become a member and join our community.

Ready to get to the next level?

If you're tired of information overwhelm and ready to build a reliable knowledge system:

Found this valuable? Share it with someone who needs it.

Join 6,000+ readers. Get practical systems for knowledge & AI. Free.

Subscribe ✨

Free: Knowledge System Checklist

A clear roadmap to building your own knowledge system. Subscribe and get it straight to your inbox.

6,000+ readers. No spam. Unsubscribe anytime.