# Anthropic Messages API
Llama Stack natively supports the Anthropic Messages API at `/v1/messages`. Use the official Anthropic SDK with any model by pointing it at your Llama Stack server.
## Quick example

**Python**

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is Llama Stack?"}
    ],
)
print(message.content[0].text)
```
**TypeScript**

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:8321/v1",
  apiKey: "fake",
});

const message = await client.messages.create({
  model: "llama-3.3-70b",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is Llama Stack?" }],
});
console.log(message.content[0].text);
```
**curl**

```bash
curl http://localhost:8321/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: fake" \
  -d '{
    "model": "llama-3.3-70b",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is Llama Stack?"}
    ]
  }'
```
## Streaming

**Python**

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

with client.messages.stream(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain RAG in 3 sentences."}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
**TypeScript**

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:8321/v1",
  apiKey: "fake",
});

const stream = client.messages.stream({
  model: "llama-3.3-70b",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain RAG in 3 sentences." }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}
```
## Multi-turn conversation

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    messages=[
        {"role": "user", "content": "Write a Python function to reverse a string."},
        {"role": "assistant", "content": "def reverse(s): return s[::-1]"},
        {"role": "user", "content": "Now make it handle unicode properly."},
    ],
)
print(message.content[0].text)
```
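To keep a conversation going, append the assistant's reply to the message history before the next call. A minimal sketch of the bookkeeping (the reply text is illustrative, and no server is required to run it):

```python
# Conversation history is a plain list of {"role", "content"} dicts.
history = [
    {"role": "user", "content": "Write a Python function to reverse a string."},
]

# Suppose the server replied with this text (illustrative value):
reply_text = "def reverse(s): return s[::-1]"

# Record the assistant turn, then add the next user turn.
history.append({"role": "assistant", "content": reply_text})
history.append({"role": "user", "content": "Now make it handle unicode properly."})

# On the next request, pass the full history:
# message = client.messages.create(model=..., max_tokens=..., messages=history)
```

The API is stateless, so the client must resend the full alternating user/assistant history on every request.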
## What's supported

| Feature | Status |
|---|---|
| `messages.create` | Supported |
| `messages.stream` | Supported |
| System messages | Supported |
| Multi-turn conversations | Supported |
| `max_tokens` | Supported |
| `temperature` | Supported |
| `top_p` | Supported |
| `stop_sequences` | Supported |
| Tool use | Not yet supported |
| Vision (image inputs) | Not yet supported |
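The supported sampling parameters above map directly onto `messages.create` keyword arguments. A sketch of a request using them (the specific values are illustrative, not recommendations):

```python
# Request parameters for the /v1/messages endpoint; all four sampling
# knobs from the table above are plain keyword arguments.
params = {
    "model": "llama-3.3-70b",
    "max_tokens": 1024,
    "temperature": 0.2,           # lower = more deterministic output
    "top_p": 0.9,                 # nucleus sampling cutoff
    "stop_sequences": ["\n\n"],   # stop generating at a blank line
    "messages": [{"role": "user", "content": "List three uses of RAG."}],
}

# Passed straight through the SDK:
# message = client.messages.create(**params)
```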
## API reference

See the auto-generated API reference for the full request/response schema.