Responses API Internal Flow
The Responses API orchestrates inference, tool execution, safety checks, and state persistence in a single request. The flow changes depending on which parameters you pass.
```python
from openai import OpenAI

# Point the client at a local server; the placeholder API key is not validated.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

response = client.responses.create(
    model="llama-3.3-70b",
    input="Explain how transformers work",
)
print(response.output_text)
```
Every request enters through FastAPI routes and is delegated to the Responses provider. The streaming orchestrator manages the inference loop — calling the LLM and executing requested server-side tools until the model produces a final response, emits a client-side function_call, or reaches max_infer_iters.
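The loop the orchestrator runs can be sketched as follows. This is a minimal illustration, not the provider's actual internals: `call_llm`, `run_tool`, and the item shapes are stand-ins for whatever the real subsystem uses.

```python
def run_inference_loop(call_llm, run_tool, messages, max_infer_iters=10):
    """Illustrative orchestrator loop: call the model, execute any
    server-side tool calls, feed results back, and repeat until the
    model stops requesting tools or the iteration cap is reached."""
    output = []
    for _ in range(max_infer_iters):
        output = call_llm(messages)
        tool_calls = [item for item in output if item["type"] == "tool_call"]
        if not tool_calls:
            # Final response (or a client-side function_call): hand back to caller.
            return output
        for call in tool_calls:
            result = run_tool(call["name"], call["arguments"])
            messages.append(
                {"type": "tool_result", "id": call["id"], "content": result}
            )
    # max_infer_iters reached: return the last model output as-is.
    return output
```

Stubbing out `call_llm` and `run_tool` makes the control flow easy to trace: one tool round-trip, then a final message.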
Legend
| Arrow style | Meaning |
|---|---|
| Solid teal | Request (outgoing call to a subsystem) |
| Solid gray | Response (return value from a subsystem) |
| Dashed amber | SSE event (streaming to client) |
| Dashed purple | Async operation (background queue, polling) |
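The dashed amber arrows correspond to the event stream a client consumes. As a minimal sketch, here is how a client might reassemble output text from delta events; the event type names follow the OpenAI Responses streaming convention, and assuming this server emits the same names:

```python
# Simulated sequence of streaming events, as a client might receive them over SSE.
events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hello"},
    {"type": "response.output_text.delta", "delta": " world"},
    {"type": "response.completed"},
]

def collect_text(events):
    """Concatenate the text deltas, ignoring lifecycle events."""
    return "".join(
        e["delta"] for e in events if e["type"] == "response.output_text.delta"
    )

print(collect_text(events))  # Hello world
```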
The dashed box marks the inference loop — the model calls tools, receives results, and calls inference again until no more server-side tool calls are needed, a client-side function_call is returned, or max_infer_iters is reached.
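When a client-side function_call exits the loop, the client is expected to execute the function itself and send the result back in a follow-up request. A sketch of that round-trip, assuming the Responses API's function_call / function_call_output item shapes; the helper name is hypothetical:

```python
import json

def handle_function_call(item, tools):
    """Execute a client-side function_call item locally and build the
    function_call_output input item to send back to the server."""
    fn = tools[item["name"]]
    result = fn(**json.loads(item["arguments"]))
    return {
        "type": "function_call_output",
        "call_id": item["call_id"],
        "output": json.dumps(result),
    }

call = {
    "type": "function_call",
    "call_id": "call_1",
    "name": "add",
    "arguments": '{"a": 2, "b": 3}',
}
follow_up = handle_function_call(call, {"add": lambda a, b: a + b})
print(follow_up["output"])  # 5
```

The follow-up item would then go into the `input` of the next `responses.create` call, which re-enters the loop above.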