After testing dozens of tool-use architectures, I've found that three patterns consistently outperform the rest.
Pattern 1: The Single Responsibility Principle for Tools
Every tool should do one thing well. This sounds obvious but is almost universally violated in practice.
The temptation is to build “smart” tools that handle multiple cases. A search tool that can search the web, a database, and a file system depending on the query. A get_info tool that returns whatever information seems relevant.
These tools fail at scale because:
- The model doesn’t know which behavior to expect
- Error handling becomes combinatorially complex
- Testing is nearly impossible
- The model hallucinates tool capabilities that don’t exist
Instead: search_web, query_database, read_file. Three tools. Three clear contracts. Three times the reliability.
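One way to make those three contracts concrete is with narrowly scoped function signatures. A minimal sketch (the parameter names and return shapes here are illustrative, not any particular framework's API):

```python
# Three narrow tools, each with one clear contract.
# Signatures and return shapes are illustrative assumptions.

def search_web(query: str, max_results: int = 5) -> list[dict]:
    """Search the web. Returns a list of {title, url, snippet} dicts."""
    ...

def query_database(sql: str) -> list[dict]:
    """Run a read-only SQL query. Returns rows as dicts."""
    ...

def read_file(path: str) -> str:
    """Read a file and return its contents as text."""
    ...
```

Because each tool has exactly one input shape and one output shape, the model never has to guess which behavior it will get.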
Pattern 2: Explicit State Management
The biggest source of tool use bugs is implicit state — when tools have side effects that affect future tool calls in ways the model doesn’t understand.
The pattern that works: make all state explicit in tool inputs and outputs. If a tool needs to know about a previous operation, pass that context explicitly as a parameter. Never rely on the model to “remember” the state from a previous turn.
```python
# Bad: implicit state
def get_next_page() -> list[dict]:
    ...  # relies on some internal cursor state

# Good: explicit state
def get_page(cursor: str | None = None) -> dict:
    return {
        "items": [...],
        "next_cursor": "abc123",  # explicit; the model can pass this forward
    }
```
Pattern 3: Graceful Degradation in Tool Responses
Every tool will fail eventually. The question is whether that failure breaks your entire agent or just causes a retry.
The pattern: every tool returns a structured response that includes a success indicator and a human-readable error message. Never raise exceptions — return them.
```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ToolResponse:
    success: bool
    data: Any | None = None
    error: str | None = None  # always present if success=False
    suggestions: list[str] = field(default_factory=list)  # what the agent can try instead
```
The suggestions field is underrated. When a tool fails, telling the model what to try instead dramatically reduces the chance of getting stuck in a dead end.
Why These Patterns Scale
Each of these patterns reduces the cognitive load on the model. When tools are simple, stateless, and fail gracefully, the model can focus on reasoning rather than managing complexity.
The teams that ship reliable tool-use systems aren’t smarter than everyone else. They’re just more disciplined about keeping tool interfaces clean.