# Google Interactions API Compatibility
Llama Stack provides a compatibility layer for the Google Interactions API (v1alpha), so teams using the Google GenAI SDK can point at a Llama Stack server with minimal code changes.
```python
from google import genai

client = genai.Client(
    http_options={"api_version": "v1alpha"},
    vertexai=False,
    api_key="fake",  # placeholder; the Llama Stack server does not validate this key
)

# Override the base URL to point at Llama Stack
client._api_client._url = "http://localhost:8321"

response = client.models.generate_interaction(
    model="llama-3.3-70b",
    input="Hello",
)
print(response.outputs[0].text)
```
## Implemented endpoints
| Endpoint | Method | Status |
|---|---|---|
| `/v1alpha/interactions` | POST | Implemented |
| `/v1alpha/interactions/{id}` | GET | Not yet |
| `/v1alpha/interactions/{id}` | DELETE | Not yet |
| `/v1alpha/interactions/{id}/cancel` | POST | Not yet |
For property-level coverage details, see the conformance report.
## How it works
The adapter translates Google Interactions requests into OpenAI Chat Completion calls through Llama Stack's inference API. This means any inference provider that Llama Stack supports (vLLM, Ollama, OpenAI, Bedrock, etc.) can serve the Google Interactions API.
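The translation step can be sketched as a pure function from a Google-style Interactions request to an OpenAI-style chat completion request. This is an illustrative sketch, not the adapter's actual internals: the field names (`input`, `system_instruction`, `generation_config`) and the turn shape are assumptions based on the SDK example above.

```python
# Illustrative sketch of the adapter's translation step. Field names
# are assumptions, not the adapter's real wire format.

def to_chat_completion(request: dict) -> dict:
    messages = []
    # System instructions map to the OpenAI `system` role.
    if "system_instruction" in request:
        messages.append({"role": "system", "content": request["system_instruction"]})

    # `input` may be a plain string or a list of conversation turns.
    inp = request["input"]
    if isinstance(inp, str):
        messages.append({"role": "user", "content": inp})
    else:
        messages.extend({"role": t["role"], "content": t["text"]} for t in inp)

    out = {"model": request["model"], "messages": messages}

    # Generation config fields map onto chat-completion parameters;
    # `max_output_tokens` becomes `max_tokens`. `top_k` has no standard
    # OpenAI field and would pass through only to providers that accept it.
    param_map = {
        "temperature": "temperature",
        "top_p": "top_p",
        "top_k": "top_k",
        "max_output_tokens": "max_tokens",
    }
    for src, dst in param_map.items():
        if src in request.get("generation_config", {}):
            out[dst] = request["generation_config"][src]
    return out


print(to_chat_completion({
    "model": "llama-3.3-70b",
    "system_instruction": "Be concise.",
    "input": "Hello",
    "generation_config": {"temperature": 0.2, "max_output_tokens": 256},
}))
```

Because the output is an ordinary chat-completion request, any backend Llama Stack already speaks can serve it unchanged.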
Supported features:
- Text generation with string or multi-turn conversation input
- Streaming via Server-Sent Events matching Google's event format
- System instructions mapped to the system role
- Generation config parameters (temperature, top_p, top_k, max_output_tokens)
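For the streaming case, a client consumes Server-Sent Events and decodes each `data:` line. The sketch below assumes each event payload is a JSON object with an `outputs` list mirroring the non-streaming response; the exact event shape is an assumption for illustration.

```python
# Sketch of decoding an SSE stream, assuming each event's `data:` line
# carries a JSON chunk with an `outputs` list (illustrative shape).
import json


def parse_sse(stream: str):
    """Yield the decoded JSON payload of each `data:` line."""
    for line in stream.splitlines():
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


raw = (
    'data: {"outputs": [{"text": "Hel"}]}\n\n'
    'data: {"outputs": [{"text": "lo"}]}\n\n'
)
print("".join(chunk["outputs"][0]["text"] for chunk in parse_sse(raw)))  # → Hello
```

A real client would read the response body incrementally rather than from a complete string, but the per-event decoding is the same.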
## Known limitations
- Only text content is supported; multimodal inputs (images, audio, video) are not yet implemented
- Tool declarations (Function, GoogleSearch, CodeExecution, MCP) are not yet supported
- Background execution and interaction storage (`store`, `background`) are not available
- The GET, DELETE, and cancel endpoints are not yet implemented
- Response modalities are accepted for compatibility but ignored