
Anthropic Messages API

Llama Stack natively supports the Anthropic Messages API at `/v1/messages`. Use the official Anthropic SDK with any model: just point it at your Llama Stack server.

Quick example

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is Llama Stack?"}
    ],
)

print(message.content[0].text)
```
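Under the hood, the SDK POSTs a JSON body to `/v1/messages` on your server. A minimal sketch of the equivalent request payload for the example above (field names follow the Anthropic Messages API; `model` and `max_tokens` are the values from the example, not special defaults):

```python
import json

# The JSON body sent to POST {base_url}/messages for the request above;
# "model", "max_tokens", and "messages" are the required fields.
payload = {
    "model": "llama-3.3-70b",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "What is Llama Stack?"}
    ],
}

body = json.dumps(payload)
print(body)
```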

Streaming

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

with client.messages.stream(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain RAG in 3 sentences."}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
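On the wire, a streamed response arrives as server-sent events, with each `data:` line carrying a JSON event. A hedged sketch of parsing those lines into text deltas (the sample lines below are illustrative, not captured server output; the SDK's `text_stream` does this for you):

```python
import json

def parse_sse(lines):
    """Parse server-sent-event lines into JSON event dicts,
    skipping blank lines and non-data fields."""
    events = []
    for line in lines:
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Illustrative sample of the streaming wire format.
sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}',
    '',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}',
]

text = "".join(e["delta"]["text"] for e in parse_sse(sample))
print(text)  # Hello
```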

Multi-turn conversation

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    messages=[
        {"role": "user", "content": "Write a Python function to reverse a string."},
        {"role": "assistant", "content": "def reverse(s): return s[::-1]"},
        {"role": "user", "content": "Now make it handle unicode properly."},
    ],
)

print(message.content[0].text)
```
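Because the `messages` list carries the entire history, a chat loop just appends each turn before the next call. A minimal sketch of a history helper (the `Conversation` class is ours for illustration, not part of the SDK):

```python
class Conversation:
    """Accumulates alternating user/assistant turns for messages.create()."""

    def __init__(self):
        self.messages = []

    def user(self, text):
        self.messages.append({"role": "user", "content": text})
        return self

    def assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})
        return self

conv = (
    Conversation()
    .user("Write a Python function to reverse a string.")
    .assistant("def reverse(s): return s[::-1]")
    .user("Now make it handle unicode properly.")
)
# message = client.messages.create(model=..., max_tokens=..., messages=conv.messages)
print(len(conv.messages))  # 3
```

After each response, append the assistant's reply with `conv.assistant(message.content[0].text)` so the next request sees the full exchange.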

What's supported

| Feature | Status |
| --- | --- |
| `messages.create` | Supported |
| `messages.stream` | Supported |
| System messages | Supported |
| Multi-turn conversations | Supported |
| `max_tokens` | Supported |
| `temperature` | Supported |
| `top_p` | Supported |
| `stop_sequences` | Supported |
| Tool use | Not yet supported |
| Vision (image inputs) | Not yet supported |
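The supported sampling parameters can be combined in a single request. A minimal sketch (the values are illustrative, not recommended defaults):

```python
# Sampling parameters accepted by /v1/messages, per the table above.
# Pass them straight through to messages.create().
sampling_params = {
    "model": "llama-3.3-70b",
    "max_tokens": 256,
    "temperature": 0.7,                 # higher = more random output
    "top_p": 0.9,                       # nucleus-sampling cutoff
    "stop_sequences": ["\n\nHuman:"],   # generation stops at these strings
}
# message = client.messages.create(messages=[...], **sampling_params)
```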

API reference

See the auto-generated API reference for the full request/response schema: