# Google Interactions API Compatibility
Llama Stack provides a compatibility layer for the Google Interactions API (v1alpha), so teams using the Google GenAI SDK can point at a Llama Stack server with minimal code changes.
```python
from google import genai

client = genai.Client(
    http_options={"api_version": "v1alpha"},
    vertexai=False,
    api_key="fake",  # placeholder; the Llama Stack server does not validate this key
)

# Override the base URL to point at Llama Stack
client._api_client._url = "http://localhost:8321"

response = client.models.generate_interaction(
    model="llama-3.3-70b",
    input="Hello",
)
print(response.outputs[0].text)
```
## Implemented endpoints
| Endpoint | Method | Status |
|---|---|---|
| `/v1alpha/interactions` | POST | Implemented |
| `/v1alpha/interactions/{id}` | GET | Not yet |
| `/v1alpha/interactions/{id}` | DELETE | Not yet |
| `/v1alpha/interactions/{id}/cancel` | POST | Not yet |
For property-level coverage details, see the conformance report.
## How it works
The adapter translates Google Interactions requests into OpenAI Chat Completion calls through Llama Stack's inference API. This means any inference provider that Llama Stack supports (vLLM, Ollama, OpenAI, Bedrock, etc.) can serve the Google Interactions API.
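The translation step can be sketched as a pure function from a Google-style Interactions request to an OpenAI-style chat completion request. This is an illustrative sketch, not the adapter's actual internals: the field names (`input`, `system_instruction`, `generation_config`) and the turn shape are assumptions based on the SDK example above.

```python
# Illustrative sketch of the adapter's translation step. Field names
# are assumptions, not the adapter's real wire format.

def to_chat_completion(request: dict) -> dict:
    messages = []
    # System instructions map to the OpenAI `system` role.
    if "system_instruction" in request:
        messages.append({"role": "system", "content": request["system_instruction"]})

    # `input` may be a plain string or a list of conversation turns.
    inp = request["input"]
    if isinstance(inp, str):
        messages.append({"role": "user", "content": inp})
    else:
        messages.extend({"role": t["role"], "content": t["text"]} for t in inp)

    out = {"model": request["model"], "messages": messages}

    # Generation config fields map onto chat-completion parameters;
    # `max_output_tokens` becomes `max_tokens`. `top_k` has no standard
    # OpenAI field and would pass through only to providers that accept it.
    param_map = {
        "temperature": "temperature",
        "top_p": "top_p",
        "top_k": "top_k",
        "max_output_tokens": "max_tokens",
    }
    for src, dst in param_map.items():
        if src in request.get("generation_config", {}):
            out[dst] = request["generation_config"][src]
    return out


print(to_chat_completion({
    "model": "llama-3.3-70b",
    "system_instruction": "Be concise.",
    "input": "Hello",
    "generation_config": {"temperature": 0.2, "max_output_tokens": 256},
}))
```

Because the output is an ordinary chat-completion request, any backend Llama Stack already speaks can serve it unchanged.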
Supported features:
- Text generation with string or multi-turn conversation input
- Streaming via Server-Sent Events matching Google's event format
- System instructions mapped to the system role
- Generation config parameters (temperature, top_p, top_k, max_output_tokens)
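For the streaming case, a client consumes Server-Sent Events and decodes each `data:` line. The sketch below assumes each event payload is a JSON object with an `outputs` list mirroring the non-streaming response; the exact event shape is an assumption for illustration.

```python
# Sketch of decoding an SSE stream, assuming each event's `data:` line
# carries a JSON chunk with an `outputs` list (illustrative shape).
import json


def parse_sse(stream: str):
    """Yield the decoded JSON payload of each `data:` line."""
    for line in stream.splitlines():
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


raw = (
    'data: {"outputs": [{"text": "Hel"}]}\n\n'
    'data: {"outputs": [{"text": "lo"}]}\n\n'
)
print("".join(chunk["outputs"][0]["text"] for chunk in parse_sse(raw)))  # → Hello
```

A real client would read the response body incrementally rather than from a complete string, but the per-event decoding is the same.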
## Known limitations
- Only text content is supported; multimodal inputs (images, audio, video) are not yet implemented
- Tool declarations (Function, GoogleSearch, CodeExecution, MCP) are not yet supported
- Background execution and interaction storage (`store`, `background`) are not available
- The GET, DELETE, and cancel endpoints are not yet implemented
- Response modalities are accepted for compatibility but ignored