Build AI apps with any model, anywhere
An OpenAI- and Anthropic-compatible API server. Use any client, any framework, any model. Swap providers without changing code.
Try it now, no installation needed beyond uv:

```shell
uvx --from 'llama-stack[starter]' llama stack run starter
```

Then point any OpenAI client at the local server:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="llama-3.3-70b",
    input="Summarize this repository",
    tools=[{"type": "web_search"}],
)
```

Works with
OpenAI-compatible endpoints
Use any OpenAI client library. Zero code changes.
/v1/chat/completions (Chat Completions): chat and text completion endpoints
/v1/responses (Responses): agentic orchestration with tool calling and MCP
/v1/embeddings (Embeddings): text embeddings from any provider
/v1/vector_stores (Vector Stores): document storage and semantic search
/v1/moderations (Moderations): content moderation and safety shields
/v1/files (Files): file upload, processing, and extraction
/v1/batches (Batches): async batch processing at scale
/v1/conversations (Conversations): multi-turn conversation state and history
/v1/models (Models): model discovery and management
Anthropic-compatible endpoint
Use the Anthropic client library directly.
/v1/messages (Messages API): chat completions in the native Anthropic format
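A minimal sketch of the Anthropic wire format against the same local server; the base URL and model name are assumptions carried over from the quickstart. The official `anthropic` client library targets this same /v1/messages shape, so it can be pointed at the stack via its `base_url` option.

```python
import json
from urllib import request

# Assumed local Llama Stack server; the Anthropic Messages API lives
# under /v1/messages.
BASE_URL = "http://localhost:8321"


def build_messages_payload(model: str, user_message: str) -> dict:
    """Build an Anthropic-style /v1/messages request body."""
    return {
        "model": model,
        "max_tokens": 256,  # required by the Messages API
        "messages": [{"role": "user", "content": user_message}],
    }


def send_message(model: str, user_message: str) -> str:
    """POST the payload and return the first text block of the reply."""
    req = request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(build_messages_payload(model, user_message)).encode(),
        headers={
            "Content-Type": "application/json",
            "x-api-key": "fake",
            "anthropic-version": "2023-06-01",
        },
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # Anthropic responses carry content as a list of blocks.
    return body["content"][0]["text"]
```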
Llama Stack native APIs
Additional endpoints beyond the OpenAI and Anthropic specs.
/v1/connectors (Connectors): external connectors such as MCP servers
/v1/tools (Tools): tool discovery and runtime invocation
How it works
OpenAI- and Anthropic-compatible APIs, pluggable providers, deploy anywhere.
Plug in any provider
Develop locally with Ollama, deploy to production with vLLM or a managed service.
Inference
Vector Stores
Tools
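The dev-to-prod swap described above can be sketched in a few lines: the application code that constructs the client never changes, only the endpoint it resolves. The Ollama port below is Ollama's default; the vLLM host and model names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """An OpenAI-compatible endpoint plus the model it serves."""
    base_url: str
    model: str


# Hypothetical targets: local Ollama for development, a vLLM
# cluster for production. Any compatible provider slots in here.
PROVIDERS = {
    "dev": Provider("http://localhost:11434/v1", "llama3.2"),
    "prod": Provider("http://vllm.internal:8000/v1", "llama-3.3-70b"),
}


def endpoint_for(env: str) -> Provider:
    """Resolve the provider for an environment; everything downstream
    (client construction, request code) stays identical."""
    return PROVIDERS[env]
```

Swapping providers is then a configuration change, not a code change, which is the point of keeping every provider behind the same API surface.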
Open source. Community driven.
Join thousands of developers building with Llama Stack