Agent Execution Loop
Agents are the heart of Llama Stack applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.
Steps in the Agent Workflow
Each agent turn follows these key steps:

1. **Initial Safety Check**: The user's input is first screened through configured safety shields.
2. **Context Retrieval**:
   - If RAG is enabled, the agent can choose to query relevant documents from memory banks. You can use the `instructions` field to steer the agent.
   - New documents are first inserted into the memory bank.
   - Retrieved context is provided to the LLM as a tool response in the message history.
3. **Inference Loop**: The agent enters its main execution loop:
   - The LLM receives the user prompt (along with previous tool outputs).
   - The LLM generates a response, potentially with tool calls.
   - If tool calls are present:
     - Tool inputs are safety-checked.
     - Tools are executed (e.g., web search, code execution).
     - Tool responses are fed back to the LLM for synthesis.
   - The loop continues until:
     - The LLM provides a final response without tool calls, or
     - The maximum number of iterations is reached, or
     - The token limit is exceeded.
4. **Final Safety Check**: The agent's final response is screened through safety shields.
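The turn logic above can be sketched in plain Python. This is an illustrative simulation only: `run_shields`, `call_llm`, and `run_tool` are hypothetical stand-ins (not Llama Stack APIs), and the context-retrieval step is omitted for brevity.

```python
# Illustrative sketch of an agent turn; the stubs below are hypothetical
# stand-ins, not actual Llama Stack internals.

def run_shields(text):
    # Stub safety check: block an obviously unsafe marker string.
    if "UNSAFE" in text:
        raise ValueError("shield violation")

def call_llm(messages):
    # Stub LLM: request one tool call, then answer once tool output exists.
    if not any(m["role"] == "tool" for m in messages):
        return {"content": None, "tool_calls": [{"name": "search", "args": "llama"}]}
    return {"content": "Final answer based on tool results.", "tool_calls": []}

def run_tool(call):
    return f"results for {call['args']}"

def run_turn(user_input, max_infer_iters=5):
    run_shields(user_input)                        # 1. initial safety check
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_infer_iters):               # 3. inference loop
        reply = call_llm(messages)
        if not reply["tool_calls"]:                # final response: exit the loop
            run_shields(reply["content"])          # 4. final safety check
            return reply["content"]
        for call in reply["tool_calls"]:
            run_shields(str(call))                 # tool inputs are checked too
            messages.append({"role": "tool", "content": run_tool(call)})
    return "Stopped: max iterations reached."

print(run_turn("Tell me about llamas"))
```

The loop terminates on whichever comes first: a response without tool calls or the iteration cap.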
Execution Flow Diagram
Each step in this process can be monitored and controlled through configuration.
Agent Execution Example
Here's an example that demonstrates monitoring the agent's execution:

Streaming Execution:
```python
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger

# Replace host and port
client = LlamaStackClient(base_url=f"http://{HOST}:{PORT}")

agent = Agent(
    client,
    # Check with `llama-stack-client models list`
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    # Enable both RAG and tool usage
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": ["my_docs"]},
        },
        "builtin::code_interpreter",
    ],
    # Configure safety (optional)
    input_shields=["llama_guard"],
    output_shields=["llama_guard"],
    # Control the inference loop
    max_infer_iters=5,
    sampling_params={
        "strategy": {"type": "top_p", "temperature": 0.7, "top_p": 0.95},
        "max_tokens": 2048,
    },
)

session_id = agent.create_session("monitored_session")

# Stream the agent's execution steps
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    documents=[
        {
            "content": "https://raw.githubusercontent.com/example/code.py",
            "mime_type": "text/plain",
        }
    ],
    session_id=session_id,
)

# Monitor each step of execution
for log in AgentEventLogger().log(response):
    log.print()
```
Non-Streaming Execution:

```python
from rich.pretty import pprint

# With the non-streaming API, the response contains input, steps, and output.
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    documents=[
        {
            "content": "https://raw.githubusercontent.com/example/code.py",
            "mime_type": "text/plain",
        }
    ],
    session_id=session_id,
    stream=False,
)

pprint(f"Input: {response.input_messages}")
pprint(f"Output: {response.output_message.content}")
pprint(f"Steps: {response.steps}")
```
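The non-streaming response lets you dispatch on each step's type. The sketch below mocks the step objects with a simple namedtuple carrying only a `step_type` field (real steps in `response.steps` expose richer detail; the exact attributes are an assumption to check against your client version), but the dispatch pattern is the same.

```python
from collections import namedtuple

# Mocked step objects for illustration; real step objects from
# `response.steps` carry more fields than just `step_type`.
Step = namedtuple("Step", ["step_type"])
steps = [
    Step("shield_call"),
    Step("inference"),
    Step("tool_execution"),
    Step("inference"),
]

# Count how many steps of each type the turn performed.
counts = {}
for step in steps:
    counts[step.step_type] = counts.get(step.step_type, 0) + 1
print(counts)  # {'shield_call': 1, 'inference': 2, 'tool_execution': 1}
```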
Key Configuration Options
Loop Control
- `max_infer_iters`: Maximum number of inference iterations (default: 5)
- `max_tokens`: Token limit for responses
- `temperature`: Controls response randomness
Safety Configuration
- `input_shields`: Safety checks for user input
- `output_shields`: Safety checks for agent responses
Tool Integration
- `tools`: List of available tools for the agent
- `tool_choice`: Control over when tools are used
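As a hedged illustration, tool behavior can be constrained alongside the tool list when constructing the agent. The `tool_choice` values shown here (`"auto"`, `"required"`) follow common tool-calling conventions and are an assumption; verify them against your installed client version.

```python
# Hypothetical fragment: constraining tool usage at agent construction.
# Exact accepted values for tool_choice should be checked against your
# llama-stack-client version.
agent_kwargs = {
    "tools": ["builtin::code_interpreter"],
    "tool_choice": "auto",  # let the model decide; "required" would force a tool call
}
```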
Related Resources
- Agents - Understanding agent fundamentals
- Tools Integration - Adding capabilities to agents
- Safety Guardrails - Implementing safety measures
- RAG (Retrieval Augmented Generation) - Building knowledge-enhanced workflows