Skip to main content
OpenAI-Compatible API Server

Build AI apps with
any model, anywhere

Drop-in replacement for the OpenAI API. Use any client, any framework, any model. Swap providers without changing code.

Try it now, no installation required (requires uv)

uvx --from 'llama-stack[starter]' llama stack run starter
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="llama-3.3-70b",
    input="Summarize this repository",
    tools=[{"type": "web_search"}],
)

Works with

OllamavLLMOpenAIAnthropicAWS BedrockAzure OpenAIGeminiTogether AIFireworksPGVectorQdrantChromaDBMilvusWeaviateOllamavLLMOpenAIAnthropicAWS BedrockAzure OpenAIGeminiTogether AIFireworksPGVectorQdrantChromaDBMilvusWeaviate
20+Inference Providers
11+API Endpoints
4Client Languages
100%OpenAI Compatible

OpenAI-compatible endpoints

Use any OpenAI client library. Zero code changes.

/v1/chat/completions

Chat Completions

Standard OpenAI-compatible chat and completion endpoints

/v1/responses

Responses API

Server-side agentic orchestration with tool calling and MCP

/v1/embeddings

Embeddings

Text embeddings from any provider

/v1/vector_stores

Vector Stores

Managed document storage and semantic search

/v1/moderations

Moderations

Content moderation and safety with configurable shields

/v1/messages

Messages API

Native Anthropic Messages API support

/v1/conversations

Conversations

Multi-turn conversation state management and history

/v1/connectors

Connectors

External connectors like MCP servers and tool integrations

/v1/files

Files

File upload, processing, and content extraction

/v1/batches

Batches

Async batch processing for large-scale workloads

/v1/models

Models

Model discovery and management

How it works

One API surface, pluggable providers, deploy anywhere

Llama Stack Architecture

Plug in any provider

Develop locally with Ollama, deploy to production with vLLM or a managed service

Inference

OllamavLLMAWS BedrockAzure OpenAIOpenAIAnthropicGemini15+ more

Vector Stores

PGVectorQdrantChromaDBMilvusWeaviate4+ more

Tools

MCP ServersWeb SearchFile Search (RAG)PDF / Docling

Open source. Community driven.

Join thousands of developers building with Llama Stack