LLM Agent Core Architecture
An LLM agent framework consists of the following core components:
- user request
- planning
- memory
- tools
To some extent, developing an agent for a particular field means carefully designing these components and tailoring them to our needs.
Beyond these, we may also need engineering components such as a sandbox and safety controls, but that is a more advanced topic.
Agent Loop
In fact, all agents share the same underlying logic as their backbone: the agent loop.
graph LR;A[Perceive] --> B[Decide] --> C[Execute] --> D[Feedback] --> A
Even modern frontier designs such as Claude Code follow this main agent loop.
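The Perceive, Decide, Execute, Feedback cycle can be sketched in a few lines. This is only an illustration: `AgentState`, `run_agent_loop`, and the toy `decide`/`execute` callables are hypothetical names, and in a real agent `decide` would be an LLM call and `execute` a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Everything the agent has perceived so far (hypothetical structure)."""
    goal: str
    observations: list[str] = field(default_factory=list)
    done: bool = False


def run_agent_loop(
    goal: str,
    decide: Callable[[AgentState], str],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> AgentState:
    state = AgentState(goal=goal)
    state.observations.append(f"goal: {goal}")  # Perceive: initial input
    for _ in range(max_steps):                  # hard cap avoids endless loops
        action = decide(state)                  # Decide: pick the next action
        if action == "stop":
            state.done = True
            break
        result = execute(action)                # Execute: run the action
        state.observations.append(result)       # Feedback: feeds the next Perceive
    return state


# Toy stand-ins for the LLM and the tools (assumptions for illustration only).
def toy_decide(state: AgentState) -> str:
    if len(state.observations) >= 3:
        return "stop"
    return f"step-{len(state.observations)}"


def toy_execute(action: str) -> str:
    return f"result of {action}"


final_state = run_agent_loop("demo", toy_decide, toy_execute)
print(final_state.observations)
```

The `max_steps` cap is a design choice rather than part of the loop itself: without it, a planner that never emits "stop" would run forever.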
Deep Research Agent
We now go through the key technical modules.
- The planner (the brain), driven by chain-of-thought reasoning.
- Tool use & browsing: the interfaces through which the LLM interacts with the outside world to collect the information needed to generate the final report.
- Self-reflection & critique.
- Context management, to handle thousands of words from 20+ websites.
- Report synthesizer.
The agent loop, specialized for a deep research agent, can be described as:
def deep_research_loop(user_goal):
Task decomposition turns a single, vague, seemingly impossible request into a roadmap of small, solvable search queries.
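As a rough sketch of how such a loop could be organized around task decomposition, consider the following. The helper names `decompose`, `search`, `is_sufficient`, and `synthesize` are hypothetical placeholders passed in as callables; they stand in for the planner, the browsing tool, the self-reflection step, and the report synthesizer, and are not taken from any library.

```python
from typing import Callable


def deep_research_loop_sketch(
    user_goal: str,
    decompose: Callable[[str, list[str]], list[str]],
    search: Callable[[str], str],
    is_sufficient: Callable[[str, list[str]], bool],
    synthesize: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> str:
    notes: list[str] = []
    queries = decompose(user_goal, notes)        # planner: goal -> sub-queries
    for _ in range(max_rounds):
        for query in queries:
            notes.append(search(query))          # tool use: gather evidence
        if is_sufficient(user_goal, notes):      # self-reflection: enough yet?
            break
        queries = decompose(user_goal, notes)    # replan using what is known
    return synthesize(user_goal, notes)          # report synthesizer


# Toy stand-ins so the sketch runs end to end (assumptions for illustration).
def toy_decompose(goal: str, notes: list[str]) -> list[str]:
    if not notes:
        return [f"{goal}: background", f"{goal}: recent results"]
    return [f"{goal}: open questions"]


def toy_search(query: str) -> str:
    return f"notes on <{query}>"


def toy_is_sufficient(goal: str, notes: list[str]) -> bool:
    return len(notes) >= 3


def toy_synthesize(goal: str, notes: list[str]) -> str:
    return f"Report on {goal} ({len(notes)} sources)"


report = deep_research_loop_sketch(
    "agent memory", toy_decompose, toy_search, toy_is_sufficient, toy_synthesize
)
print(report)
```

With these toy helpers the first round issues two queries, the reflection step asks for more, and a second round adds one follow-up query before the report is written.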
Cleaning web content: fetched pages should be stripped of markup and boilerplate before they enter the context.
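One way to clean a fetched page, using only the Python standard library, is sketched below. The set of skipped tags, the set of block tags, and the `max_chars` budget are assumptions chosen for illustration; production pipelines usually rely on more robust extraction.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores script, style, and page chrome."""

    SKIP_TAGS = {"script", "style", "nav", "footer", "header"}
    BLOCK_TAGS = {"p", "div", "li", "br", "tr", "h1", "h2", "h3"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append(" ")  # keep adjacent blocks from merging

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse all runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def clean_html(html: str, max_chars: int = 2000) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]  # crude per-page context budget


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Home | About</nav><p>Agents use <b>tools</b>.</p>"
    "<p>They also plan.</p><script>alert(1)</script></body></html>"
)
cleaned = clean_html(sample)
print(cleaned)
```

Truncating each page to a fixed character budget is the simplest form of context management; summarizing or ranking passages would be a natural next step.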
Importance of Tool Use
Challenges include:
- inconsistent tool-calling formats
- errors during tool calls
- poor generalization ability
So the question becomes: how do we call tools? A practical solution is MCP (Model Context Protocol), a standard interface for connecting tools to any agent system.
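To make the idea of a uniform calling format concrete, here is a minimal in-process sketch loosely modeled on the JSON-RPC `tools/call` message shape that MCP uses. It does not use an MCP SDK, and the `tool` decorator, `TOOLS` registry, and `handle_request` function are hypothetical names. The error handling is also simplified: every failure is reported as a tool result with `isError` set, so consult the MCP specification for the exact schema.

```python
import json
from typing import Any, Callable

# Hypothetical registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Register a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("add")
def add(a: float, b: float) -> float:
    return a + b


def make_call_request(request_id: int, name: str, arguments: dict) -> dict:
    # Every tool is called with the same envelope, whatever it does.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def handle_request(request: dict) -> dict:
    params = request["params"]
    try:
        fn = TOOLS[params["name"]]
        output = fn(**params["arguments"])
        result = {
            "content": [{"type": "text", "text": json.dumps(output)}],
            "isError": False,
        }
    except Exception as exc:
        # Errors come back in-band so the agent can read them and retry.
        result = {
            "content": [{"type": "text", "text": f"{type(exc).__name__}: {exc}"}],
            "isError": True,
        }
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


ok = handle_request(make_call_request(1, "add", {"a": 2, "b": 3}))
bad = handle_request(make_call_request(2, "missing", {}))
print(ok["result"], bad["result"])
```

One envelope for every tool addresses the format inconsistency, and returning failures as structured results addresses the error-handling challenge listed above.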
Future directions:
- open-world tool use (dynamic and scalable)
- parallel and efficient tool scheduling
- pretraining with embedded tool use
- multimodal tool integration
- standard benchmarks, e.g. MCP adoption
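As a small illustration of the parallel scheduling direction, independent tool calls can be dispatched concurrently with the standard library. `fetch_stub` is a hypothetical stand-in for a slow, I/O-bound tool such as a web fetch, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_stub(query: str) -> str:
    time.sleep(0.05)  # simulate network latency
    return f"result for {query}"


def run_tools_in_parallel(
    queries: list[str],
    tool_fn: Callable[[str], str],
    max_workers: int = 4,
) -> list[str]:
    # Threads suit I/O-bound tools; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(tool_fn, queries))


results = run_tools_in_parallel(["a", "b", "c"], fetch_stub)
print(results)
```

This only works when the calls do not depend on one another; dependent calls still need to be sequenced by the planner.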
A public repo for tools is [CLI-Anything], which converts open-source software into agent-usable tools.