# Anthropic Messages API
Llama Stack natively supports the Anthropic Messages API at `/v1/messages`. Use the official Anthropic SDK with any model by pointing it at your Llama Stack server.
## Quick example

**Python**

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is Llama Stack?"}
    ],
)
print(message.content[0].text)
```
**TypeScript**

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:8321/v1",
  apiKey: "fake",
});

const message = await client.messages.create({
  model: "llama-3.3-70b",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is Llama Stack?" }],
});
console.log(message.content[0].text);
```
**curl**

```bash
curl http://localhost:8321/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: fake" \
  -d '{
    "model": "llama-3.3-70b",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is Llama Stack?"}
    ]
  }'
```
## Streaming

**Python**

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

with client.messages.stream(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain RAG in 3 sentences."}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
**TypeScript**

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:8321/v1",
  apiKey: "fake",
});

const stream = client.messages.stream({
  model: "llama-3.3-70b",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain RAG in 3 sentences." }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}
```
## Multi-turn conversation

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8321/v1",
    api_key="fake",
)

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    messages=[
        {"role": "user", "content": "Write a Python function to reverse a string."},
        {"role": "assistant", "content": "def reverse(s): return s[::-1]"},
        {"role": "user", "content": "Now make it handle unicode properly."},
    ],
)
print(message.content[0].text)
```
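To keep a conversation going, append the assistant's reply to the message history before the next call. A minimal sketch of the bookkeeping (the reply text is illustrative, and no server is required to run it):

```python
# Conversation history is a plain list of {"role", "content"} dicts.
history = [
    {"role": "user", "content": "Write a Python function to reverse a string."},
]

# Suppose the server replied with this text (illustrative value):
reply_text = "def reverse(s): return s[::-1]"

# Record the assistant turn, then add the next user turn.
history.append({"role": "assistant", "content": reply_text})
history.append({"role": "user", "content": "Now make it handle unicode properly."})

# On the next request, pass the full history:
# message = client.messages.create(model=..., max_tokens=..., messages=history)
```

The API is stateless, so the client must resend the full alternating user/assistant history on every request.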
## What's supported

| Feature | Status |
|---|---|
| `messages.create` | Supported |
| `messages.stream` | Supported |
| System messages | Supported |
| Multi-turn conversations | Supported |
| `max_tokens` | Supported |
| `temperature` | Supported |
| `top_p` | Supported |
| `stop_sequences` | Supported |
| Tool use | Not yet supported |
| Vision (image inputs) | Not yet supported |
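The supported sampling parameters above map directly onto `messages.create` keyword arguments. A sketch of a request using them (the specific values are illustrative, not recommendations):

```python
# Request parameters for the /v1/messages endpoint; all four sampling
# knobs from the table above are plain keyword arguments.
params = {
    "model": "llama-3.3-70b",
    "max_tokens": 1024,
    "temperature": 0.2,           # lower = more deterministic output
    "top_p": 0.9,                 # nucleus sampling cutoff
    "stop_sequences": ["\n\n"],   # stop generating at a blank line
    "messages": [{"role": "user", "content": "List three uses of RAG."}],
}

# Passed straight through the SDK:
# message = client.messages.create(**params)
```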
## API reference

See the auto-generated API reference for the full request/response schema.