OpenAI + Anthropic API Server

Build AI apps with
any model, anywhere

OpenAI and Anthropic compatible API server. Use any client, any framework, any model. Swap providers without changing code.

Try it now with a single command (only uv required)

uvx --from 'llama-stack[starter]' llama stack run starter
/v1/responses
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.responses.create(
    model="llama-3.3-70b",
    input="Summarize this repository",
    tools=[{"type": "web_search"}],
)

Works with

Ollama · vLLM · OpenAI · Anthropic · AWS Bedrock · Azure OpenAI · Gemini · Together AI · Fireworks · PGVector · Qdrant · ChromaDB · Milvus · Weaviate
20+ Inference Providers
11+ API Endpoints
4 Client Languages
100% Open Source

OpenAI-compatible endpoints

Use any OpenAI client library. Zero code changes.

/v1/chat/completions

Chat Completions

Chat and text completion endpoints

/v1/responses

Responses

Agentic orchestration with tool calling and MCP

/v1/embeddings

Embeddings

Text embeddings from any provider

/v1/vector_stores

Vector Stores

Document storage and semantic search

/v1/moderations

Moderations

Content moderation and safety shields

/v1/files

Files

File upload, processing, and extraction

/v1/batches

Batches

Async batch processing at scale

/v1/conversations

Conversations

Multi-turn conversation state and history

/v1/models

Models

Model discovery and management

Anthropic-compatible endpoint

Use the Anthropic client library directly.

/v1/messages

Messages API

Chat completions with native Anthropic format

Llama Stack native APIs

Additional endpoints beyond the OpenAI and Anthropic specs.

/v1/connectors

Connectors

External connectors like MCP servers

/v1/tools

Tools

Tool discovery and runtime invocation

How it works

OpenAI and Anthropic compatible, pluggable providers, deploy anywhere.

Llama Stack Architecture

Plug in any provider

Develop locally with Ollama, deploy to production with vLLM or a managed service.

Inference

Ollama · vLLM · AWS Bedrock · Azure OpenAI · OpenAI · Anthropic · Gemini · 15+ more

Vector Stores

PGVector · Qdrant · ChromaDB · Milvus · Weaviate · 4+ more

Tools

MCP Servers · Web Search · File Search (RAG) · PDF / Docling

Open source. Community driven.

Join thousands of developers building with Llama Stack