
Responses API Internal Flow

The Responses API orchestrates inference, tool execution, safety checks, and state persistence in a single request. The flow changes depending on which parameters you pass.

Use the toggles below to select parameters and see how the request flows through internal subsystems. Pick a preset to see common patterns, or build your own combination.

[Sequence diagram: Client, FastAPI, Responses provider, Orchestrator, Inference, Store]

Client → FastAPI: POST /v1/responses
FastAPI → Responses: create_openai_response()
Responses → Orchestrator: create_response()
loop [until no tool_calls or max_infer_iters]:
    Orchestrator → Inference: openai_chat_completion()
    Inference → Orchestrator: completion + tool_calls
Orchestrator → Store: upsert_response_object()
Responses → Client: OpenAIResponseObject
Python

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="llama-3.3-70b",
    input="Explain how transformers work",
)

print(response.output_text)

Every request enters through FastAPI routes and is delegated to the Responses provider. The streaming orchestrator manages the inference loop — calling the LLM and executing requested server-side tools until the model produces a final response, emits a client-side function_call, or reaches max_infer_iters.
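The loop described above can be sketched as plain Python. This is a minimal illustration, not the actual orchestrator code: `call_inference` and `execute_tool` stand in for the real inference and tool-execution subsystems, and the message shapes are simplified.

```python
def run_inference_loop(call_inference, execute_tool, messages, max_infer_iters=10):
    """Illustrative sketch of the orchestrator loop: call the model,
    execute server-side tool calls, and repeat until done."""
    completion = None
    for _ in range(max_infer_iters):
        completion = call_inference(messages)
        tool_calls = completion.get("tool_calls", [])
        if not tool_calls:
            return completion  # model produced a final response
        if any(c["type"] == "function_call" for c in tool_calls):
            return completion  # client-side tool: surface it to the caller
        for call in tool_calls:  # server-side tools run here
            result = execute_tool(call)
            messages.append({"role": "tool", "content": result})
    return completion  # gave up after max_infer_iters
```

The three exit conditions mirror the ones named above: no more tool calls, a client-side function_call, or the iteration cap.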

Legend

Arrow style     Meaning
Solid teal      Request (outgoing call to a subsystem)
Solid gray      Response (return value from a subsystem)
Dashed amber    SSE event (streaming to client)
Dashed purple   Async operation (background queue, polling)

The dashed box marks the inference loop — the model calls tools, receives results, and calls inference again until no more server-side tool calls are needed, a client-side function_call is returned, or max_infer_iters is reached.
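On the client side, the SSE events in the legend arrive as a typed stream. The sketch below shows one way to accumulate output text from such a stream; the event type names (`response.output_text.delta`, `response.completed`) follow the OpenAI Responses streaming conventions and should be treated as assumptions here.

```python
def collect_text(events):
    """Accumulate output text from a stream of Responses SSE event dicts."""
    chunks = []
    for event in events:
        if event["type"] == "response.output_text.delta":
            chunks.append(event["delta"])  # incremental text fragment
        elif event["type"] == "response.completed":
            break  # final event: the response object is complete
    return "".join(chunks)
```

In practice the events come from `client.responses.create(..., stream=True)` rather than a list, but the dispatch-on-type pattern is the same.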