Lab 05: Tool Calling
Use Claude’s native tool calling for structured, reliable AI output.
Objectives
By the end of this lab, you will:
- Understand Claude’s tool calling mechanism
- Define tools with JSON schemas
- Force structured output with tool_choice
- Parse and handle tool responses reliably
Prerequisites
- Lab 04 completed (TaskManager)
- Basic understanding of JSON Schema
The Problem with Free-Form Output
In earlier labs, we parsed Claude’s text responses:
# Lab 03: Parsing free-form verification response
response_text = "PASSED: yes\nCONFIDENCE: 0.95\nFEEDBACK: Looks good"
# Fragile parsing
for line in response_text.split("\n"):
if line.startswith("PASSED:"):
passed = "yes" in line.lower() # What if Claude says "Yeah"?
This approach is fragile, as the snippet after this list demonstrates, because:
- Claude might use different formats (“Yes”, “TRUE”, “Affirmative”)
- Line order might change
- Extra text might appear
- Regex patterns break on edge cases
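To make the fragility concrete, here is a small self-contained sketch (the response variants are hypothetical) showing the Lab 03 parser recovering only one of three affirmative answers:

```python
# Hypothetical response variants that trip up the Lab 03 parser
for response_text in [
    "PASSED: Affirmative\nCONFIDENCE: 0.95",  # affirmative, but no "yes" substring
    "PASSED: yes\nCONFIDENCE: 0.95",          # the one format the parser expects
    "Sure! PASSED: yes, I think.",            # extra prose defeats startswith()
]:
    passed = False
    for line in response_text.split("\n"):
        if line.startswith("PASSED:"):
            passed = "yes" in line.lower()
    print(passed)  # prints False, True, False — all three mean yes, only one parses
```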
The Solution: Tool Calling
Tool calling forces Claude to respond with structured JSON:
# Define the structure you want
tools = [{
"name": "submit_result",
"description": "Submit the completed task result",
"input_schema": {
"type": "object",
"properties": {
"result": {"type": "string"},
"confidence": {"type": "number", "minimum": 0, "maximum": 1}
},
"required": ["result", "confidence"]
}
}]
# Claude MUST respond with this structure
response = client.messages.create(
model="claude-sonnet-4-20250514",
tools=tools,
tool_choice={"type": "any"}, # Force tool use
messages=[...]
)
# Guaranteed structure! With forced tool use, the first content block is the tool_use block.
tool_input = response.content[0].input
result = tool_input["result"]          # Always present (required by the schema)
confidence = tool_input["confidence"]  # Always a number between 0 and 1
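For orientation, here are the attributes of the `tool_use` block that you will use throughout this lab, continuing the example above (the printed values are illustrative):

```python
block = response.content[0]
print(block.type)   # "tool_use"
print(block.name)   # "submit_result"
print(block.id)     # unique call id (needed if you send a tool_result back to Claude)
print(block.input)  # {"result": "...", "confidence": 0.93} — already a dict, no json.loads needed
```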
Step 1: Define Your Tools
Create tools.py:
"""
Tool Definitions - Lab 05
Defines tools for structured AI output in task processing.
"""
# Tool for submitting a completed task
SUBMIT_RESULT_TOOL = {
"name": "submit_result",
"description": "Submit the completed result for a task. Use this when you have successfully completed the task.",
"input_schema": {
"type": "object",
"properties": {
"result": {
"type": "string",
"description": "The complete result/output for the task"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Your confidence in the result (0.0 to 1.0)"
},
"notes": {
"type": "string",
"description": "Optional notes about the approach or limitations"
}
},
"required": ["result", "confidence"]
}
}
# Tool for reporting inability to complete
REPORT_FAILURE_TOOL = {
"name": "report_failure",
"description": "Report that you cannot complete the task. Use this when the task is impossible, unclear, or you've encountered an insurmountable problem.",
"input_schema": {
"type": "object",
"properties": {
"reason": {
"type": "string",
"description": "Why the task cannot be completed"
},
"category": {
"type": "string",
"enum": ["impossible", "unclear", "missing_info", "out_of_scope", "other"],
"description": "Category of failure"
},
"suggestion": {
"type": "string",
"description": "Suggested modification to make the task completable"
}
},
"required": ["reason", "category"]
}
}
# Tool for requesting clarification
REQUEST_CLARIFICATION_TOOL = {
"name": "request_clarification",
"description": "Request clarification before proceeding. Use this when the task is ambiguous and you need more information.",
"input_schema": {
"type": "object",
"properties": {
"question": {
"type": "string",
"description": "The specific question you need answered"
},
"options": {
"type": "array",
"items": {"type": "string"},
"description": "Possible answers/options if applicable"
},
"default": {
"type": "string",
"description": "What you'll assume if no clarification is provided"
}
},
"required": ["question"]
}
}
# Tool for verification results
VERIFICATION_RESULT_TOOL = {
"name": "verification_result",
"description": "Submit the result of verifying a task output.",
"input_schema": {
"type": "object",
"properties": {
"passed": {
"type": "boolean",
"description": "Whether the output passes verification"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence in this verification judgment"
},
"feedback": {
"type": "string",
"description": "Specific feedback about what passed or failed"
},
"issues": {
"type": "array",
"items": {"type": "string"},
"description": "List of specific issues found (if any)"
}
},
"required": ["passed", "confidence", "feedback"]
}
}
# Combined tool sets for different operations
TASK_COMPLETION_TOOLS = [
SUBMIT_RESULT_TOOL,
REPORT_FAILURE_TOOL,
REQUEST_CLARIFICATION_TOOL
]
VERIFICATION_TOOLS = [
VERIFICATION_RESULT_TOOL
]
ALL_TOOLS = [
SUBMIT_RESULT_TOOL,
REPORT_FAILURE_TOOL,
REQUEST_CLARIFICATION_TOOL,
VERIFICATION_RESULT_TOOL
]
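Before using these definitions in API calls, a lightweight self-check can catch typos early. A minimal sketch — this lint is an optional addition, not one of the lab's required files:

```python
# sanity_check.py — optional structural lint for the tool definitions above
from tools import ALL_TOOLS

for tool in ALL_TOOLS:
    # Every tool needs a name, a description, and an object-typed input_schema.
    assert {"name", "description", "input_schema"} <= tool.keys(), tool
    schema = tool["input_schema"]
    assert schema["type"] == "object", tool["name"]
    # Every "required" field must actually be declared under "properties".
    missing = set(schema.get("required", [])) - set(schema["properties"])
    assert not missing, f"{tool['name']}: required but undeclared: {missing}"

print(f"{len(ALL_TOOLS)} tool definitions look structurally sound")
```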
Step 2: Create the Tool Handler
Create tool_handler.py:
"""
Tool Handler - Lab 05
Processes Claude's tool call responses.
"""
from typing import Optional, Dict, Any, Tuple
from dataclasses import dataclass
from enum import Enum
class ToolCallType(Enum):
"""Types of tool calls we handle."""
SUBMIT_RESULT = "submit_result"
REPORT_FAILURE = "report_failure"
REQUEST_CLARIFICATION = "request_clarification"
VERIFICATION_RESULT = "verification_result"
UNKNOWN = "unknown"
@dataclass
class ToolCallResult:
"""Parsed result from a tool call."""
tool_name: str
tool_type: ToolCallType
inputs: Dict[str, Any]
raw_response: Any
@property
def is_success(self) -> bool:
"""Check if this represents a successful completion."""
return self.tool_type == ToolCallType.SUBMIT_RESULT
@property
def is_failure(self) -> bool:
"""Check if this represents a failure."""
return self.tool_type == ToolCallType.REPORT_FAILURE
@property
def needs_clarification(self) -> bool:
"""Check if clarification is needed."""
return self.tool_type == ToolCallType.REQUEST_CLARIFICATION
@property
def is_verification(self) -> bool:
"""Check if this is a verification result."""
return self.tool_type == ToolCallType.VERIFICATION_RESULT
def parse_tool_response(response) -> Optional[ToolCallResult]:
"""
Parse a Claude API response containing tool calls.
Args:
response: The full response from client.messages.create()
Returns:
ToolCallResult if a tool was called, None otherwise
"""
# Find the tool_use block in the response
tool_use_block = None
for block in response.content:
if block.type == "tool_use":
tool_use_block = block
break
if not tool_use_block:
return None
# Determine tool type
tool_name = tool_use_block.name
try:
tool_type = ToolCallType(tool_name)
except ValueError:
tool_type = ToolCallType.UNKNOWN
return ToolCallResult(
tool_name=tool_name,
tool_type=tool_type,
inputs=tool_use_block.input,
raw_response=response
)
def extract_result(tool_result: ToolCallResult) -> Tuple[str, float]:
"""
Extract result and confidence from a submit_result tool call.
Returns:
Tuple of (result_text, confidence)
"""
if tool_result.tool_type != ToolCallType.SUBMIT_RESULT:
raise ValueError(f"Expected submit_result, got {tool_result.tool_type}")
return (
tool_result.inputs.get("result", ""),
tool_result.inputs.get("confidence", 0.5)
)
def extract_failure(tool_result: ToolCallResult) -> Tuple[str, str, Optional[str]]:
"""
Extract failure details from a report_failure tool call.
Returns:
Tuple of (reason, category, suggestion)
"""
if tool_result.tool_type != ToolCallType.REPORT_FAILURE:
raise ValueError(f"Expected report_failure, got {tool_result.tool_type}")
return (
tool_result.inputs.get("reason", "Unknown reason"),
tool_result.inputs.get("category", "other"),
tool_result.inputs.get("suggestion")
)
def extract_verification(tool_result: ToolCallResult) -> Tuple[bool, float, str, list]:
"""
Extract verification details from a verification_result tool call.
Returns:
Tuple of (passed, confidence, feedback, issues)
"""
if tool_result.tool_type != ToolCallType.VERIFICATION_RESULT:
raise ValueError(f"Expected verification_result, got {tool_result.tool_type}")
return (
tool_result.inputs.get("passed", False),
tool_result.inputs.get("confidence", 0.5),
tool_result.inputs.get("feedback", ""),
tool_result.inputs.get("issues", [])
)
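You can exercise the parser without an API call by faking the response object. A sketch using `types.SimpleNamespace`, which mimics the attribute access (`block.type`, `block.name`, `block.input`) that the real SDK objects provide:

```python
from types import SimpleNamespace

from tool_handler import parse_tool_response, extract_result

fake_block = SimpleNamespace(
    type="tool_use",
    name="submit_result",
    input={"result": "5040", "confidence": 0.98},
)
fake_response = SimpleNamespace(content=[fake_block])

parsed = parse_tool_response(fake_response)
assert parsed is not None and parsed.is_success
result, confidence = extract_result(parsed)
print(result, confidence)  # 5040 0.98
```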
Step 3: Update the Task Executor
Create executor.py:
"""
Task Executor with Tool Calling - Lab 05
Executes tasks using Claude with structured tool responses.
"""
import anthropic
from tools import TASK_COMPLETION_TOOLS, VERIFICATION_TOOLS
from tool_handler import (
parse_tool_response,
extract_result,
extract_failure,
extract_verification,
ToolCallType
)
client = anthropic.Anthropic()
def execute_task(task: dict) -> dict:
"""
Execute a task and get structured response via tool calling.
Args:
task: Task dict with 'description' and optional 'criteria'
Returns:
Dict with execution result:
{
"status": "completed" | "failed" | "needs_clarification",
"result": str (if completed),
"confidence": float (if completed),
"reason": str (if failed),
"question": str (if needs_clarification),
...
}
"""
# Build the prompt
prompt = f"""Complete the following task. Use the appropriate tool to submit your response.
TASK: {task['description']}
"""
if task.get("criteria"):
criteria = task["criteria"]
prompt += f"\nREQUIREMENTS:\n"
if criteria.get("type"):
prompt += f"- Output type: {criteria['type']}\n"
for req in criteria.get("requirements", []):
prompt += f"- {req}\n"
if task.get("last_feedback"):
prompt += f"\nPREVIOUS FEEDBACK: {task['last_feedback']}\n"
prompt += "Please address this feedback in your response.\n"
# Call Claude with tools
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1000,
tools=TASK_COMPLETION_TOOLS,
tool_choice={"type": "any"}, # Force tool use
messages=[{"role": "user", "content": prompt}]
)
# Parse the tool response
tool_result = parse_tool_response(response)
if not tool_result:
        # Shouldn't happen with tool_choice={"type": "any"}, but handle it defensively
return {
"status": "failed",
"reason": "No tool call in response",
"category": "other"
}
# Handle based on tool type
if tool_result.is_success:
result, confidence = extract_result(tool_result)
return {
"status": "completed",
"result": result,
"confidence": confidence,
"notes": tool_result.inputs.get("notes")
}
elif tool_result.is_failure:
reason, category, suggestion = extract_failure(tool_result)
return {
"status": "failed",
"reason": reason,
"category": category,
"suggestion": suggestion
}
elif tool_result.needs_clarification:
return {
"status": "needs_clarification",
"question": tool_result.inputs.get("question"),
"options": tool_result.inputs.get("options", []),
"default": tool_result.inputs.get("default")
}
else:
return {
"status": "failed",
"reason": f"Unknown tool: {tool_result.tool_name}",
"category": "other"
}
def verify_result(task: dict, result: str) -> dict:
"""
Verify a result using tool calling for structured output.
Args:
task: The original task with criteria
result: The result to verify
Returns:
Dict with verification result:
{
"passed": bool,
"confidence": float,
"feedback": str,
"issues": list
}
"""
criteria = task.get("criteria", {})
prompt = f"""You are a verification assistant. Evaluate whether this result meets the criteria.
ORIGINAL TASK: {task['description']}
CRITERIA:
"""
if criteria.get("type"):
prompt += f"- Type: {criteria['type']}\n"
for req in criteria.get("requirements", []):
prompt += f"- {req}\n"
if not criteria:
prompt += "- Result should reasonably complete the task\n"
prompt += f"""
RESULT TO VERIFY:
{result}
Use the verification_result tool to submit your evaluation. Be strict but fair.
"""
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=500,
tools=VERIFICATION_TOOLS,
tool_choice={"type": "tool", "name": "verification_result"},
messages=[{"role": "user", "content": prompt}]
)
tool_result = parse_tool_response(response)
if tool_result and tool_result.is_verification:
passed, confidence, feedback, issues = extract_verification(tool_result)
return {
"passed": passed,
"confidence": confidence,
"feedback": feedback,
"issues": issues
}
# Fallback if something went wrong
return {
"passed": False,
"confidence": 0.0,
"feedback": "Verification failed to produce structured output",
"issues": ["Tool call parsing failed"]
}
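Before wiring the executor into the loop, you can smoke-test it on its own. This assumes a valid `ANTHROPIC_API_KEY` in your environment; the task dict matches the shape `execute_task` documents above:

```python
from executor import execute_task

outcome = execute_task({"description": "Calculate the factorial of 7"})
print(outcome["status"])
if outcome["status"] == "completed":
    print(outcome["result"], f"(confidence {outcome['confidence']:.0%})")
```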
Step 4: Update the Main Loop
Create loop_with_tools.py:
"""
Loop with Tool Calling - Lab 05
Demonstrates structured AI output via tool calling.
"""
from task_manager import TaskManager
from executor import execute_task, verify_result
def process_task(manager: TaskManager, task) -> bool:
"""Process a single task with tool calling."""
task_id = task.id
attempt = task.attempts + 1
print(f"\n[{task_id}] Attempt {attempt}/{task.max_attempts}")
print(f" Task: {task.description[:50]}...")
manager.start(task_id)
# Execute with tool calling
result = execute_task(task.to_dict())
if result["status"] == "completed":
print(f" Result: {result['result'][:50]}...")
print(f" Confidence: {result['confidence']:.0%}")
# Verify if criteria exist
if task.criteria:
print(f" Verifying...")
verification = verify_result(task.to_dict(), result["result"])
if verification["passed"]:
print(f" ✓ Verified ({verification['confidence']:.0%})")
manager.complete(task_id, result["result"])
return True
else:
print(f" ✗ Verification failed: {verification['feedback']}")
if verification["issues"]:
for issue in verification["issues"]:
print(f" - {issue}")
# Retry or fail
if attempt < task.max_attempts:
manager.retry(task_id, verification["feedback"])
else:
manager.fail(task_id, f"Failed verification: {verification['feedback']}")
return False
else:
# No criteria, accept based on confidence
if result["confidence"] >= 0.7:
manager.complete(task_id, result["result"])
return True
else:
if attempt < task.max_attempts:
manager.retry(task_id, "Low confidence")
else:
manager.complete(task_id, result["result"]) # Accept anyway
return True
elif result["status"] == "failed":
print(f" ✗ Failed: {result['reason']}")
print(f" Category: {result['category']}")
if result.get("suggestion"):
print(f" Suggestion: {result['suggestion']}")
manager.fail(task_id, result["reason"])
return False
elif result["status"] == "needs_clarification":
print(f" ? Needs clarification: {result['question']}")
if result.get("options"):
print(f" Options: {', '.join(result['options'])}")
if result.get("default"):
print(f" Default: {result['default']}")
# In a real system, you'd prompt the user here
# For now, use the default or retry
if result.get("default"):
# Update task with clarification and retry
manager.update(task_id, last_feedback=f"Clarification: {result['default']}")
manager.retry(task_id, f"Using default: {result['default']}")
else:
manager.fail(task_id, f"Needs clarification: {result['question']}")
return False
return False
def main():
manager = TaskManager("tasks.json")
# Create sample tasks if empty
if not manager.tasks:
print("Creating sample tasks...\n")
manager.create(
"Write a haiku about Python programming",
criteria={
"type": "haiku",
"requirements": [
"Exactly 3 lines",
"Approximately 5-7-5 syllable pattern",
"Related to Python programming"
]
}
)
manager.create(
"List exactly 3 advantages of type hints in Python",
criteria={
"type": "list",
"requirements": [
"Exactly 3 items",
"Each item is an advantage of type hints",
"Clear and concise explanations"
]
}
)
manager.create(
"Calculate the factorial of 7"
# No criteria - will accept based on confidence
)
# Process loop
stats = manager.get_stats()
print(f"Tasks: {stats['pending']} pending, {stats['completed']} completed, {stats['failed']} failed")
while manager.has_pending():
task = manager.get_next()
process_task(manager, task)
# Final report
print("\n" + "=" * 60)
print("FINAL REPORT")
print("=" * 60)
stats = manager.get_stats()
print(f"\nCompletion rate: {stats['completion_rate']:.0f}%")
print(f"Failure rate: {stats['failure_rate']:.0f}%")
print(f"Average attempts: {stats['average_attempts']}")
for task in manager.get_all():
icon = {"completed": "✓", "failed": "✗", "pending": "○"}.get(task.status, "?")
print(f"\n{icon} {task.description[:50]}...")
print(f" Status: {task.status}, Attempts: {task.attempts}")
if task.status == "completed" and task.result:
print(f" Result: {task.result[:60]}...")
if __name__ == "__main__":
main()
Understanding Tool Calling
How It Works
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Your Code │──────│ Claude API │──────│ Claude LLM │
│ │ │ │ │ │
│ tools=[...] │ │ Validates │ │ Generates │
│ tool_choice=any │─────▶│ schema │─────▶│ structured JSON │
│ │ │ │ │ │
│ response.content│◀─────│ Returns │◀─────│ via tool_use │
│ [0].input │ │ tool_use block │ │ block │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Tool Choice Options
# Let Claude decide whether to use a tool
tool_choice={"type": "auto"}
# Force Claude to use ANY of the provided tools
tool_choice={"type": "any"}
# Force Claude to use a SPECIFIC tool
tool_choice={"type": "tool", "name": "submit_result"}
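With `"auto"`, Claude is free to answer in plain text instead, so callers have to branch on the response. A minimal sketch — the API sets `stop_reason` to `"tool_use"` when a tool is invoked; `client`, `TASK_COMPLETION_TOOLS`, and `parse_tool_response` come from the files above:

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,
    tools=TASK_COMPLETION_TOOLS,
    tool_choice={"type": "auto"},  # Claude decides whether to call a tool
    messages=[{"role": "user", "content": "Summarize the status of the task."}],
)

if response.stop_reason == "tool_use":
    tool_result = parse_tool_response(response)  # structured path
else:
    # Free-form path: concatenate any text blocks
    text = "".join(b.text for b in response.content if b.type == "text")
```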
JSON Schema Power
{
"type": "object",
"properties": {
"confidence": {
"type": "number",
"minimum": 0, # Guaranteed range!
"maximum": 1
},
"category": {
"type": "string",
"enum": ["a", "b", "c"] # Guaranteed values!
}
},
"required": ["confidence"] # Guaranteed presence!
}
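These constraints are also checkable locally. A sketch, assuming the third-party `jsonschema` package, showing conforming and non-conforming payloads against the schema above:

```python
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "category": {"type": "string", "enum": ["a", "b", "c"]},
    },
    "required": ["confidence"],
}

for payload in [{"confidence": 0.7}, {"confidence": 1.5}, {"confidence": 0.7, "category": "d"}]:
    try:
        validate(instance=payload, schema=schema)
        print(payload, "-> valid")
    except ValidationError as e:
        print(payload, "-> rejected:", e.message)  # 1.5 is out of range; "d" is not in the enum
```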
Comparison: Text Parsing vs Tool Calling
| Aspect | Text Parsing | Tool Calling |
|---|---|---|
| Output format | Unpredictable | Guaranteed JSON |
| Parsing | Regex/string ops | Direct dict access |
| Validation | Manual | Schema-enforced |
| Error handling | Complex | Simple |
| Type safety | None | Built-in |
| Maintenance | Fragile | Robust |
Exercises
Exercise 1: Add a Progress Tool
Create a tool for reporting partial progress:
REPORT_PROGRESS_TOOL = {
    "name": "report_progress",
    "description": "Report partial progress on a long-running task.",
    "input_schema": {
        "type": "object",
        "properties": {
            "percent_complete": {"type": "integer", "minimum": 0, "maximum": 100},
            "current_step": {"type": "string"},
            "partial_result": {"type": "string"}
        },
        "required": ["percent_complete", "current_step"]  # suggested; adjust as needed
    }
}
Exercise 2: Multi-Tool Responses
Handle cases where Claude might want to use multiple tools in sequence (e.g., clarify then submit).
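A starting point: `parse_tool_response` above returns only the first `tool_use` block. A variant that collects all of them, in order, might look like this (a sketch reusing `ToolCallResult` and `ToolCallType` from `tool_handler.py`):

```python
from typing import List

def parse_all_tool_responses(response) -> List[ToolCallResult]:
    """Collect every tool_use block in the response, in order."""
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        try:
            tool_type = ToolCallType(block.name)
        except ValueError:
            tool_type = ToolCallType.UNKNOWN
        results.append(ToolCallResult(
            tool_name=block.name,
            tool_type=tool_type,
            inputs=block.input,
            raw_response=response,
        ))
    return results
```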
Exercise 3: Tool Call Logging
Add detailed logging of all tool calls for debugging:
def log_tool_call(tool_result: ToolCallResult):
"""Log tool call details to a file."""
pass
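One possible minimal implementation, if you get stuck — the `tool_calls.log` filename and the JSON-lines format are arbitrary choices:

```python
import json
from datetime import datetime, timezone

def log_tool_call(tool_result: ToolCallResult, path: str = "tool_calls.log"):
    """Append one JSON line per tool call for later debugging."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_name": tool_result.tool_name,
        "tool_type": tool_result.tool_type.value,
        "inputs": tool_result.inputs,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```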
Checkpoint
Before moving on, verify:
- Tool definitions compile without errors
- Claude responds with tool_use blocks
- Tool responses parse correctly
- Verification uses structured output
- You understand tool_choice options
Key Takeaway
Tool calling beats regex parsing for structured AI output.
With tool calling:
- Guaranteed structure: JSON schema enforces format
- Type safety: Numbers are numbers, booleans are booleans
- Validation built-in: Enums, ranges, required fields
- Simpler code: No regex, no string parsing
- More reliable: No edge cases from format variations
Get the Code
Full implementation: 8me/src/tier1-ralph-loop/claude_tools.py