Tool use is the single most important feature for building AI agents that actually do things. Without it, Claude (or any LLM) can only work with information you explicitly put in the prompt. With tool use, you give Claude the ability to call functions, query APIs, read files, browse the web, run code — anything you can wrap in a function.
This guide covers everything: the mechanics of the tool use loop, practical Python examples, tool design best practices, and patterns for production-grade agents. Let's build something real.
Get Your Claude API Key
Everything in this guide runs on the Claude API. Create a free Anthropic account, generate an API key, and add a few dollars of credits to start building. No waitlist, no approval required.
Open Anthropic Console →

1. What is Tool Use?
Tool use (also called "function calling" in the OpenAI world) lets Claude request that your application run a specific function and return the result. Claude decides when to use a tool, what arguments to pass, and how to incorporate the result into its reasoning.
Here's the simplest mental model:
# Without tools:
You: "What's the weather in Vienna?"
Claude: "I don't have access to real-time data..."
# With tools:
You: "What's the weather in Vienna?"
Claude: [calls get_weather("Vienna")]
Your app: {"temp": 12, "condition": "cloudy"}
Claude: "It's 12°C and cloudy in Vienna right now."
The key insight: Claude doesn't execute the tool. Claude requests it, and your code runs it. This keeps you in control of what Claude can actually do.
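Concretely, a tool request arrives as a `tool_use` content block in the API response, and your app answers with a matching `tool_result` block. A sketch of the two shapes as plain dicts (the `id` value here is a made-up placeholder; real IDs are generated by the API):

```python
import json

# Shape of a tool_use content block Claude sends back.
# The "id" value is a made-up placeholder; real IDs come from the API.
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_01A2B3C4",       # links this call to your tool_result later
    "name": "get_weather",        # which tool Claude wants to run
    "input": {"city": "Vienna"},  # arguments matching your input_schema
}

# Your app runs the function, then echoes the id back so Claude can
# match the result to its request.
tool_result = {
    "type": "tool_result",
    "tool_use_id": tool_use_block["id"],
    "content": json.dumps({"temp": 12, "condition": "cloudy"}),
}
```

The `tool_use_id` link is what lets Claude issue several calls in one turn and still pair each result with the right request.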
2. Why Tool Use Is the Key to Real Agents
Most "AI agents" you see online are just multi-step prompting chains — Claude outputs text, your code parses it, something happens. That works for simple cases but falls apart when you need:
- Real-time data: Stock prices, weather, search results, live APIs
- Side effects: Creating calendar events, sending emails, writing files
- Dynamic retrieval: Claude decides what information it needs and fetches it
- Multi-step reasoning: Use one tool's output to decide which tool to call next
Tool use solves all of these. It's how you go from "Claude generates text about tasks" to "Claude actually completes tasks."
Key insight
The real power of tool use isn't any single tool — it's the agentic loop: Claude calls tools, gets results, reasons about them, calls more tools, and continues until the task is complete. The loop is the agent.
3. How the Agentic Loop Works
Understanding the request/response cycle is essential before writing a single line of code:
1. You send Claude a message + a list of available tools
↓
2. Claude returns a response that either:
a. Is a normal text response (task complete), OR
b. Contains one or more tool_use blocks (Claude wants to call a tool)
↓
3. If Claude wants tools:
- Extract the tool name and input from the response
- Run the actual function in your code
- Return the result in a new message
↓
4. Claude processes the result and either:
- Calls more tools → go to step 3
- Gives you the final text response → done
This loop runs until Claude gives a final end_turn response with no tool calls.
A well-designed agent typically makes 2–8 tool calls per task.
4. Your First Tool-Calling Agent (Python)
Let's build a minimal working example. You'll need the Anthropic SDK and an API key from console.anthropic.com:
pip install anthropic
Here's a complete, working agent with one tool:
import anthropic
import json

client = anthropic.Anthropic()

# Step 1: Define your tool
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Returns temperature in Celsius and a condition string.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Vienna' or 'New York'"
                }
            },
            "required": ["city"]
        }
    }
]

# Step 2: The actual function that runs when Claude calls the tool
def get_weather(city: str) -> dict:
    # In a real app, you'd call a weather API here
    mock_data = {
        "Vienna": {"temp": 12, "condition": "Partly cloudy"},
        "London": {"temp": 9, "condition": "Overcast"},
        "New York": {"temp": 18, "condition": "Sunny"},
    }
    return mock_data.get(city, {"temp": 15, "condition": "Unknown"})

# Step 3: The agentic loop
def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # If Claude is done, return the final text
        if response.stop_reason == "end_turn":
            return next(b.text for b in response.content if b.type == "text")

        # Process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                if block.name == "get_weather":
                    result = get_weather(**block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

        # Critical: add BOTH the assistant's response AND the tool results
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

# Run it
result = run_agent("What's the weather in Vienna and London? Which is warmer?")
print(result)
# "Vienna is 12°C (partly cloudy) vs London at 9°C (overcast). Vienna is warmer by 3°C."
Notice what happened: Claude made two tool calls (one per city), processed both results, then synthesized a comparison. You didn't tell it to check both cities — it figured that out from the question.
5. Real Example: A Web Research Agent
Let's build something more realistic — a research agent with two tools: web search and page fetching. This pattern applies to any agent that needs to gather information from multiple sources before answering.
import anthropic, requests, json

client = anthropic.Anthropic()

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Returns a list of results with titles, URLs, and summaries. Use this before fetch_page to discover relevant sources.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
                "num_results": {"type": "integer", "description": "Number of results (1-5, default 3)"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "fetch_page",
        "description": "Fetch and return the text content of a specific URL. Use this when you need to read the full content of a page found via web_search.",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The URL to fetch"}
            },
            "required": ["url"]
        }
    }
]

def web_search(query: str, num_results: int = 3) -> list:
    # Replace with Serper, Brave Search, or SerpAPI for production
    url = f"https://api.duckduckgo.com/?q={query}&format=json&no_redirect=1"
    data = requests.get(url, timeout=5).json()
    results = []
    for topic in data.get("RelatedTopics", [])[:num_results]:
        if "Text" in topic:
            results.append({
                "summary": topic["Text"],
                "url": topic.get("FirstURL", "")
            })
    return results or [{"summary": "No results found"}]

def fetch_page(url: str) -> str:
    try:
        resp = requests.get(url, timeout=10, headers={"User-Agent": "Mozilla/5.0"})
        return resp.text[:6000]  # Truncate to avoid huge contexts
    except Exception as e:
        return f"Error: {e}"

def run_research_agent(question: str) -> str:
    system = """You are a research assistant. To answer questions:
1. Use web_search to find relevant sources
2. Use fetch_page to read specific sources if needed
3. Synthesize findings into a clear, factual answer
4. Cite your sources with URLs"""

    messages = [{"role": "user", "content": question}]
    tool_map = {"web_search": web_search, "fetch_page": fetch_page}
    max_iterations = 8

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            system=system,
            tools=tools,
            messages=messages
        )

        if response.stop_reason == "end_turn":
            return next(b.text for b in response.content if b.type == "text")

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = tool_map[block.name](**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result) if isinstance(result, (dict, list)) else result
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    return "Agent reached iteration limit without completing the task."
6. Tool Design Best Practices
The quality of your tool definitions directly determines how well Claude uses them. These rules will save you hours of debugging.
Write descriptions for Claude, not for humans
Claude reads your description to decide when to call the tool. Be explicit about when to use it, what it returns, and its limitations.
# Bad — too vague
"description": "Gets user data"
# Good — tells Claude exactly when and how to use it
"description": "Look up a user's profile and subscription status by email.
Use this when you need to verify if a user exists or check their plan.
Returns { found: bool, user: { name, email, plan, created_at } } or
{ found: false, error: string } if the user doesn't exist."
Use enums for constrained options
Whenever a parameter has a fixed set of valid values, use enum in the JSON schema. This prevents Claude from hallucinating invalid values:
"status": {
    "type": "string",
    "enum": ["active", "paused", "cancelled"],
    "description": "Filter by subscription status"
}
Always return structured data
Return JSON dicts from your tools, not prose strings. Claude reasons about structured data much more reliably than parsing natural language.
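As an illustration, here is the same hypothetical lookup returned both ways (function names and fields are made up for this sketch):

```python
# Illustrative only: the same lookup result returned two ways.

def get_plan_prose(email: str) -> str:
    # Prose: Claude has to re-parse this sentence to extract each fact
    return f"The user {email} has been on the pro plan since January 2023."

def get_plan_structured(email: str) -> dict:
    # Structured: every field has an explicit key Claude can reference
    return {"email": email, "plan": "pro", "since": "2023-01-15"}
```

The structured version also composes better: a follow-up tool call can pass `result["plan"]` along verbatim instead of hoping Claude quoted the sentence correctly.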
Return errors inside the result, not as exceptions
Don't let tool functions raise exceptions — Claude can't reason about Python tracebacks. Return error information as structured data:
def get_user(email: str) -> dict:
    user = db.find_user(email)
    if not user:
        return {"found": False, "error": f"No user with email {email}"}
    return {"found": True, "user": user.to_dict()}
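One way to enforce this rule across every tool is a small decorator that converts any exception into a structured error result. A sketch (the decorator and the `divide` tool are illustrative, not from the article):

```python
import functools

def safe_tool(fn):
    # Wrap a tool so uncaught exceptions become structured error results
    # that Claude can reason about, instead of crashing the agent loop.
    @functools.wraps(fn)
    def wrapper(**kwargs):
        try:
            return fn(**kwargs)
        except Exception as e:
            return {"error": f"{type(e).__name__}: {e}"}
    return wrapper

@safe_tool
def divide(a: float, b: float) -> dict:
    # A deliberately fragile tool: fails on b == 0
    return {"result": a / b}
```

With this in place, `divide(a=1, b=0)` returns an error dict naming `ZeroDivisionError` rather than raising, so the loop keeps running and Claude can retry with different arguments.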
7. Error Handling Patterns
Always set a max iteration limit
Without a limit, a confused agent will loop until you hit the API rate limit or run up your bill. The example above uses for _ in range(max_iterations) instead of while True. A limit of 8–12 iterations covers nearly all real tasks.
Wrap tools in a timeout
A hanging HTTP request will block your entire agent. Wrap tool calls with a timeout:
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_tool_safely(fn, args, timeout=10):
    # Note: don't use `with ThreadPoolExecutor(...)` here — exiting the
    # with-block calls shutdown(wait=True), which blocks until the hung
    # call finishes and defeats the timeout.
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, **args)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        return {"error": f"Tool timed out after {timeout}s"}
    except Exception as e:
        return {"error": str(e)}
    finally:
        executor.shutdown(wait=False)
Log everything during development
Tool calls are the hardest part of agent debugging. Log the tool name, inputs, and outputs on every call. Once your agent works, you can reduce logging.
for block in response.content:
    if block.type == "tool_use":
        print(f"[TOOL] {block.name}({block.input})")
        result = tool_map[block.name](**block.input)
        print(f"[RESULT] {json.dumps(result)[:200]}")
8. Common Mistakes
Forgetting to add the assistant's response to messages. The most common bug. You must add both Claude's response AND your tool results to the messages list. If you only add tool results, Claude loses context of what it asked for and the API returns an error.
Returning plain strings instead of JSON. Use json.dumps(result) for dict/list returns. Claude parses JSON far better than unstructured text strings.
Vague tool descriptions. Claude picks tools based entirely on their description. If the description doesn't clearly say when to use the tool, Claude will either overuse it or ignore it. Be specific.
Too many tools at once. Start with 2–3 tools. More tools means more decision surface and more potential for Claude to misuse them. Add tools incrementally once the basics work.
No iteration limit. A confused agent will call tools in loops. Always use a counter-based loop, not while True.
9. Next Steps
You now have the foundation for building real Claude agents. Where to go from here:
- Official docs: Anthropic's tool use documentation covers parallel tool calls, streaming, and computer use
- Parallel tool calls: Claude can request multiple tools simultaneously. Run these in parallel with asyncio.gather() for 2–4x speed improvements
- Memory tool: Add a save_note(key, value) tool backed by a dict so your agent can remember things across turns
- Streaming: For user-facing agents, add streaming so users see Claude's thinking in real-time rather than waiting for the full response
- n8n integration: Connect Claude tool calls to n8n workflows — gives you 400+ integrations without writing any Python for the actual tool logic
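As a sketch of the parallel-calls idea: when one response contains several tool_use blocks, run the (synchronous) tool functions concurrently in threads and gather the results. The block dicts below mirror the shapes used in the examples above; get_weather is a stand-in for a real tool.

```python
import asyncio
import json

def get_weather(city: str) -> dict:
    # Stand-in for a real (blocking) tool function
    return {"city": city, "temp": 12}

TOOL_MAP = {"get_weather": get_weather}

async def run_blocks_in_parallel(blocks: list[dict]) -> list[dict]:
    # Each block mirrors a tool_use content block; sync tools are pushed
    # to threads so they run concurrently instead of one after another.
    async def one(block: dict) -> dict:
        result = await asyncio.to_thread(TOOL_MAP[block["name"]], **block["input"])
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": json.dumps(result),
        }
    # gather preserves input order, so results line up with blocks
    return await asyncio.gather(*(one(b) for b in blocks))

blocks = [
    {"id": "t1", "name": "get_weather", "input": {"city": "Vienna"}},
    {"id": "t2", "name": "get_weather", "input": {"city": "London"}},
]
results = asyncio.run(run_blocks_in_parallel(blocks))
```

The speedup is real only when the tools are I/O-bound (HTTP calls, database queries); for instant in-memory tools like this stand-in, the sequential loop is just as fast.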
What to build next
The best way to solidify this is to automate something you do manually — searching for information, drafting emails, updating a spreadsheet. Pick one repetitive task and build a Claude agent for it using the patterns above.
Ready to Build Your First Agent?
Sign up for the Claude API and get a key in under 5 minutes. The usage-based pricing means a small agent for personal automation costs almost nothing — Haiku handles most tasks at $0.80/million input tokens.
Start with Claude API →