Build AI apps with
any model, anywhere
Drop-in replacement for the OpenAI API. Use any client, any framework, any model. Swap providers without changing code.
Try it now: nothing to install beyond uv
```shell
uvx --from 'llama-stack[starter]' llama stack run starter
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="llama-3.3-70b",
    input="Summarize this repository",
    tools=[{"type": "web_search"}],
)
```

Works with
OpenAI-compatible endpoints
Use any OpenAI client library. Zero code changes.
Chat Completions (/v1/chat/completions)
Standard OpenAI-compatible chat and completion endpoints
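A minimal sketch of a Chat Completions request to a local Llama Stack server. The payload follows the standard OpenAI schema; the model id and base URL are assumptions carried over from the quickstart above.

```python
import json

# Standard OpenAI chat-completions payload; any OpenAI client can send this
# unchanged to http://localhost:8321/v1/chat/completions (URL assumed from
# the quickstart above).
payload = {
    "model": "llama-3.3-70b",  # assumed model id from the quickstart
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this repository"},
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)
```

With the OpenAI Python SDK this is just `client.chat.completions.create(**payload)`; no Llama-Stack-specific code is involved.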
Responses API (/v1/responses)
Server-side agentic orchestration with tool calling and MCP
Embeddings (/v1/embeddings)
Text embeddings from any provider
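A common pattern with the embeddings endpoint is comparing two texts by cosine similarity. A sketch, with the actual API call shown in comments (the model id is an assumption):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# With the OpenAI SDK against Llama Stack (model id is an assumption):
# resp = client.embeddings.create(model="all-minilm", input=["doc one", "doc two"])
# vecs = [d.embedding for d in resp.data]
# score = cosine(vecs[0], vecs[1])

print(cosine([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```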
Vector Stores (/v1/vector_stores)
Managed document storage and semantic search
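Vector-store usage typically pairs client-side chunking with server-side storage and search. A sketch of the chunking step, with the store calls shown as comments; the `client.vector_stores` methods are assumed from recent OpenAI SDK versions and the store name is hypothetical:

```python
def chunk(text, size=200, overlap=20):
    """Split a document into overlapping character chunks before upload."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# With the OpenAI SDK (method names assumed from recent SDK versions):
# store = client.vector_stores.create(name="docs")  # hypothetical store name
# ...upload chunked files to the store, then:
# hits = client.vector_stores.search(store.id, query="how do I deploy?")

pieces = chunk("x" * 500, size=200, overlap=20)
```

Overlapping chunks keep sentences that straddle a boundary retrievable from either side.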
Moderations (/v1/moderations)
Content moderation and safety with configurable shields
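A sketch of a moderation request and a helper for reading the result. The request and response shapes follow the standard OpenAI moderations schema; the model id here is an assumption (Llama Stack maps moderation to configurable shields):

```python
import json

# Request body in the standard OpenAI moderations shape; "llama-guard" is an
# assumed model/shield id for illustration.
payload = {"model": "llama-guard", "input": "How do I make a safe campfire?"}
body = json.dumps(payload)

def flagged_categories(result):
    """Names of the categories a moderation result flagged (OpenAI schema)."""
    return sorted(c for c, hit in result["categories"].items() if hit)

# Example result in the documented response schema:
sample = {"flagged": True, "categories": {"violence": True, "self-harm": False}}
```

The call itself is `client.moderations.create(**payload)`, and each entry of `response.results` carries the `flagged`/`categories` fields used above.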
Messages API (/v1/messages)
Native Anthropic Messages API support
Conversations (/v1/conversations)
Multi-turn conversation state management and history
Connectors (/v1/connectors)
External connectors like MCP servers and tool integrations
Files (/v1/files)
File upload, processing, and content extraction
Batches (/v1/batches)
Async batch processing for large-scale workloads
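Batch jobs take a JSONL input file where each line is one request in the OpenAI batch format. A sketch of building that file locally (the model id is an assumption):

```python
import json

# One JSONL line per request, in the OpenAI batch input format.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "llama-3.3-70b",  # assumed model id
            "messages": [{"role": "user", "content": q}],
        },
    }
    for i, q in enumerate(["What is Llama Stack?", "What is MCP?"])
]
jsonl = "\n".join(json.dumps(r) for r in requests)
```

Uploading the file with `purpose="batch"` via the files endpoint and starting the job with the batches endpoint then follows the standard OpenAI workflow; `custom_id` is what lets you match each output line back to its request.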
Models (/v1/models)
Model discovery and management
How it works
One API surface, pluggable providers, deploy anywhere
Plug in any provider
Develop locally with Ollama, deploy to production with vLLM or a managed service
Inference
Vector Stores
Tools
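The provider-swap idea above can be sketched as a configuration table: application code stays identical, and only the endpoint and model id differ per environment. The URLs and model ids here are illustrative assumptions:

```python
# Only configuration changes between environments; the calling code does not.
# All URLs and model ids below are assumptions for illustration.
environments = {
    "dev": {"base_url": "http://localhost:8321/v1", "model": "ollama/llama3.2:3b"},
    "prod": {"base_url": "https://llama-stack.internal/v1", "model": "vllm/llama-3.3-70b"},
}

def client_kwargs(env):
    """Keyword arguments you would pass straight to openai.OpenAI(...)."""
    cfg = environments[env]
    # Local stacks ignore the key, but the OpenAI client requires one.
    return {"base_url": cfg["base_url"], "api_key": "fake"}
```

Switching from Ollama in development to vLLM in production is then a one-line config change rather than a code change.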
Open source. Community driven.
Join thousands of developers building with Llama Stack