Migrate an existing app
If you already have an application built on the OpenAI or Anthropic SDK, migrating to Llama Stack is straightforward. In most cases you only need to change the base_url and the model name to include the provider prefix for the model you wish to use.
From OpenAI
This example shows a move from interacting directly with the OpenAI API to a local Llama 3.2 model hosted via Ollama.
Before (talking directly to OpenAI):
from openai import OpenAI

client = OpenAI(api_key="sk-xxx")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
After (talking to Llama Stack):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1",
    api_key="fake",  # not validated by Llama Stack
)

response = client.chat.completions.create(
    model="ollama/llama3.2:3b",  # use a model registered in your server
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
The only changes are base_url, api_key, and model. Everything else stays the same.
From Anthropic
The Anthropic SDK uses a different interface, so you will need to switch to the OpenAI SDK, since Llama Stack implements the OpenAI API.
Before (Anthropic SDK):
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-xxx")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)
After (OpenAI SDK pointing to Llama Stack):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

response = client.chat.completions.create(
    model="ollama/llama3.2:3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
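One interface difference worth noting when porting call sites: Anthropic's messages.create takes the system prompt as a separate system argument, while the OpenAI Chat Completions API expects it as the first message with role "system". A minimal sketch of converting the arguments (the helper name is ours, for illustration):

```python
def to_openai_messages(system, messages):
    """Convert Anthropic-style (system, messages) arguments into the single
    messages list that the OpenAI Chat Completions API expects."""
    converted = []
    if system:  # Anthropic passes the system prompt as a separate parameter
        converted.append({"role": "system", "content": system})
    converted.extend(messages)
    return converted

msgs = to_openai_messages("Be brief.", [{"role": "user", "content": "Hello!"}])
print(msgs)
# [{'role': 'system', 'content': 'Be brief.'}, {'role': 'user', 'content': 'Hello!'}]
```

Also note that max_tokens is required by the Anthropic SDK but optional in the OpenAI Chat Completions API, so you can drop it or keep it unchanged.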
What works out of the box
These OpenAI-compatible endpoints are supported:
POST /v1/chat/completions - chat with streaming support
POST /v1/responses - the Responses API with tool calling and agents
POST /v1/embeddings - text embeddings
GET /v1/models - list available models
POST /v1/vector_stores - create and manage vector stores
POST /v1/files - upload files for RAG
See the full OpenAI compatibility guide for details on supported parameters.
What's different
Provider configuration. Llama Stack routes requests to configurable backends (Ollama, vLLM, Bedrock, etc.). The server's run.yaml controls which providers are active. You do not configure providers in your application code.
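As an illustrative sketch only (field names and values are assumptions based on a typical Llama Stack setup, not copied from a real server), a run.yaml that activates an Ollama inference provider and registers a model might look roughly like:

```yaml
# Sketch of a run.yaml fragment -- check your server's actual config
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434   # where the Ollama daemon listens
models:
  - model_id: ollama/llama3.2:3b      # the name your app passes as `model`
    provider_id: ollama
```

The point is that provider wiring lives entirely on the server side; your application code only sees the registered model id.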
Model names. Model identifiers depend on the provider. For example, Ollama uses names like llama3.2:3b while remote providers use their own naming conventions. Run GET /v1/models to see what is available on your server.
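Since model ids carry the provider prefix, a quick way to see what a given provider serves is to filter the ids returned by GET /v1/models (via client.models.list() in the OpenAI SDK). A sketch with hypothetical ids, for illustration only:

```python
def ids_for_provider(model_ids, provider):
    """Filter model ids down to those registered under a given provider prefix."""
    prefix = provider + "/"
    return [mid for mid in model_ids if mid.startswith(prefix)]

# Hypothetical ids; on a live server, collect m.id from client.models.list()
available = ["ollama/llama3.2:3b", "ollama/all-minilm", "vllm/llama3.1:70b"]
print(ids_for_provider(available, "ollama"))
# ['ollama/llama3.2:3b', 'ollama/all-minilm']
```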
API key. Llama Stack does not validate API keys by default. You can pass any string (or "fake") as the key. Authentication can be configured separately if needed.