Issue #18

The Agent Stack in 2025: What's Actually Working

February 18, 2025 · ⚡ 58% open rate

After a year of building, shipping, and watching agents fail spectacularly — here’s a clear-eyed map of what components are production-ready, what’s overhyped, and where the real opportunity lies right now.

What’s Actually Working

Tool-augmented LLMs in narrow domains. Give a model 3-5 well-defined tools and a specific task with clear success criteria — this works reliably at scale. Customer support triage, code review assistance, document summarization with retrieval. The key word is narrow.
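The "narrow" part can be made concrete: a small registry of strictly typed tools, dispatched by name, where anything outside the registry fails loudly. A minimal sketch — the tool names and payloads here are hypothetical, and a real system would parse the tool call from the model's structured output:

```python
from typing import Callable, Dict

# Registry of the agent's few, well-defined tools.
TOOLS: Dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_order(order_id: str) -> str:
    # Stand-in for a real order-database query.
    return f"order {order_id}: shipped"

@tool
def summarize_doc(doc_id: str) -> str:
    # Stand-in for retrieval plus summarization.
    return f"summary of {doc_id}"

def dispatch(tool_name: str, **kwargs) -> str:
    """Route a model's tool call to the matching implementation,
    rejecting anything outside the registry."""
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](**kwargs)

print(dispatch("lookup_order", order_id="A-123"))
```

The closed registry is the point: the model can only reach the three to five actions you defined, so failure modes stay enumerable.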

Human-in-the-loop architectures. Agents that pause and ask for approval at key decision points are dramatically more reliable than fully autonomous ones. The pause is not a failure — it’s the design.
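The approval-gate pattern is small enough to sketch directly. Here the action names and the binary risk flag are illustrative assumptions; `approve` stands in for whatever review UI a real deployment uses:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    risky: bool  # e.g. writes data, spends money, emails a customer

def run_action(action: Action, approve: Callable[[Action], bool]) -> str:
    """Execute low-risk actions directly; pause risky ones for a human."""
    if action.risky and not approve(action):
        return f"{action.name}: held for human review"
    return f"{action.name}: executed"

# With no approvals granted, every risky action pauses by design.
print(run_action(Action("refund_customer", risky=True), approve=lambda a: False))
print(run_action(Action("draft_reply", risky=False), approve=lambda a: False))
```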

LangGraph for stateful workflows. Of the many orchestration frameworks I've tried, LangGraph's explicit state machine model is the one that actually holds up as complexity grows.
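The underlying pattern LangGraph formalizes — named nodes that transform a shared state, with explicit edges deciding what runs next — can be sketched framework-free. The node names and state keys here are hypothetical:

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]

def plan(state: State) -> State:
    state["plan"] = f"steps for: {state['task']}"
    return state

def execute(state: State) -> State:
    state["result"] = f"did {state['plan']}"
    return state

# Explicit graph: every node and transition is written down,
# which is what keeps the workflow debuggable as it grows.
NODES: Dict[str, Callable[[State], State]] = {"plan": plan, "execute": execute}
EDGES: Dict[str, Optional[str]] = {"plan": "execute", "execute": None}  # None ends the run

def run(state: State, entry: str = "plan") -> State:
    node: Optional[str] = entry
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node]
    return state

final = run({"task": "triage ticket"})
```

Because state and transitions are explicit rather than buried in chained calls, adding a node means editing two dictionaries, not untangling control flow.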

What’s Overhyped

Fully autonomous multi-step research agents. The demos are impressive. The production failure rate is not. Long-horizon autonomy without human checkpoints fails in ways that are hard to anticipate and expensive to fix.

RAG as the default memory solution. RAG is one memory pattern, not the memory pattern. For structured data, preferences, and relational information, a key-value store or relational database is simpler and more reliable.
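For preferences and other structured memory, the simpler alternative is an exact-lookup table — no embeddings, no retrieval ranking. A sketch with an illustrative schema and keys:

```python
import sqlite3
from typing import Optional

# In-memory relational store for agent memory; a real deployment
# would point this at a persistent database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE prefs (user_id TEXT, key TEXT, value TEXT, "
    "PRIMARY KEY (user_id, key))"
)

def set_pref(user_id: str, key: str, value: str) -> None:
    db.execute("INSERT OR REPLACE INTO prefs VALUES (?, ?, ?)",
               (user_id, key, value))

def get_pref(user_id: str, key: str) -> Optional[str]:
    row = db.execute("SELECT value FROM prefs WHERE user_id = ? AND key = ?",
                     (user_id, key)).fetchone()
    return row[0] if row else None

set_pref("u1", "tone", "formal")
print(get_pref("u1", "tone"))  # exact lookup, deterministic, no retrieval step
```

RAG still earns its place for unstructured documents; the point is that a user's preferred tone should never depend on cosine similarity.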

Where the Opportunity Is

The next wave of valuable agent systems will be built around domain-specific tooling — custom tools that understand the specific constraints and data structures of a particular industry or workflow, rather than general-purpose tools that try to work everywhere.

The companies winning here are the ones investing in the “boring” infrastructure: reliable tool implementations, robust eval frameworks, and clear human escalation paths.

Read the full issue on Beehiiv
