OpenAI-Compatible API Server
Build AI apps with
any model, anywhere
Drop-in replacement for the OpenAI API. Use any client, any framework, any model. Swap providers without changing code.
Try it now: no installation required beyond uv
# Start a server
uvx --from 'llama-stack[starter]' llama stack run starter

# Then call it from any OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="llama-3.3-70b",
    input="Summarize this repository",
    tools=[{"type": "web_search"}],
)

Works with
Ollama · vLLM · OpenAI · Anthropic · AWS Bedrock · Azure OpenAI · Gemini · Together AI · Fireworks · PGVector · Qdrant · ChromaDB · Milvus · Weaviate · and more
OpenAI-compatible endpoints
Use any OpenAI client library. Zero code changes.
/v1/chat/completions · Chat Completions
Standard OpenAI-compatible chat and completion endpoints
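Because the endpoint speaks plain HTTP, no SDK is strictly required. A minimal stdlib sketch of the request shape (the server URL, model name, and dummy API key are assumptions matching the quickstart above; the network call itself is left commented out so it only runs against a live server):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8321/v1"  # assumed local Llama Stack server

payload = {
    "model": "llama-3.3-70b",  # any model the server exposes
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "Authorization": "Bearer fake"},
)
# Uncomment with a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```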
/v1/responses · Responses API
Server-side agentic orchestration with tool calling and MCP
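The hero snippet above attaches the built-in web_search tool; an MCP server is attached the same way, as another entry in the tools list. A sketch of the request body (the server label and URL are placeholders, not a real deployment):

```python
import json

payload = {
    "model": "llama-3.3-70b",
    "input": "List the open issues in this project",
    "tools": [
        {"type": "web_search"},  # built-in tool
        {
            "type": "mcp",                            # Model Context Protocol server
            "server_label": "github",                 # hypothetical label
            "server_url": "https://example.com/mcp",  # hypothetical endpoint
        },
    ],
}
body = json.dumps(payload)
```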
/v1/embeddings · Embeddings
Text embeddings from any provider
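Whatever provider serves them, embeddings come back as plain float vectors, so downstream similarity is ordinary vector math. The vectors below are made-up stand-ins for API output, not real model embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vec_a = [0.1, 0.3, 0.5]  # placeholder "embedding" vectors
vec_b = [0.1, 0.3, 0.5]
print(cosine(vec_a, vec_b))  # identical vectors, so ≈ 1.0
```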
/v1/vector_stores · Vector Stores
Managed document storage and semantic search
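A sketch of the request bodies involved, with field names following the OpenAI vector-stores schema (the store name and query text are illustrative):

```python
import json

# POST /v1/vector_stores — create a managed store
create_body = {"name": "docs"}

# POST /v1/vector_stores/{id}/search — semantic search over its contents
search_body = {
    "query": "how do I rotate API keys?",
    "max_num_results": 3,
}
print(json.dumps(search_body))
```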
/v1/files · Files & Batches
File upload, processing, and batch operations
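A batch job takes a JSONL file where each line is one request. The custom_id/method/url/body shape below follows the OpenAI batch input format; the questions and model are illustrative:

```python
import json

requests = [
    {
        "custom_id": f"req-{i}",          # your key for matching results
        "method": "POST",
        "url": "/v1/chat/completions",    # endpoint each line targets
        "body": {
            "model": "llama-3.3-70b",
            "messages": [{"role": "user", "content": q}],
        },
    }
    for i, q in enumerate(["What is RAG?", "What is MCP?"])
]
# One JSON object per line — this is the file you upload via /v1/files.
jsonl = "\n".join(json.dumps(r) for r in requests)
```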
/v1/models · Models
Model discovery and management
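The endpoint returns an OpenAI-style list object, so discovering what the server exposes is a single call plus a list comprehension. The sample response below is made up for illustration, not real server output:

```python
import json

sample = json.loads("""
{"object": "list",
 "data": [{"id": "llama-3.3-70b", "object": "model"},
          {"id": "all-MiniLM-L6-v2", "object": "model"}]}
""")
ids = [m["id"] for m in sample["data"]]
print(ids)
```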
How it works
One API surface, pluggable providers, deploy anywhere
Plug in any provider
Develop locally with Ollama, deploy to production with vLLM or a managed service
Inference
Vector Stores
Tools
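Swapping providers is a configuration change, not a code change: the request a client builds is identical whether it targets a local Ollama-backed stack or a hosted vLLM deployment. A sketch under assumed URLs and model names (both configs below are illustrative):

```python
# Hypothetical environments — only the endpoint and model differ.
DEV = {"base_url": "http://localhost:8321/v1", "model": "llama-3.2-3b"}
PROD = {"base_url": "https://llm.internal.example/v1", "model": "llama-3.3-70b"}

def make_request(cfg, prompt):
    # The payload shape is the same regardless of which provider serves it.
    return {
        "url": f"{cfg['base_url']}/chat/completions",
        "json": {
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
        },
    }

dev_req = make_request(DEV, "ping")
prod_req = make_request(PROD, "ping")
```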
Open source. Community driven.
Join thousands of developers building with Llama Stack