Anthropic Messages API

Llama Stack provides native support for the Anthropic Messages API at /v1/messages. Point the official Anthropic SDK at your Llama Stack server and use any model.

```python
from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:8321/v1", api_key="fake")
message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
```

Implemented endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /v1/messages | POST | Create a message (streaming and non-streaming) |
| /v1/messages/count_tokens | POST | Count tokens for a message |
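
As a sketch of the token-counting request: the body has the same shape as a /v1/messages request, minus `max_tokens`. The model name below is illustrative, and the response shape shown in the comment follows the Anthropic Messages protocol; consult the conformance report for exact property support.

```python
import json

# Request body for POST /v1/messages/count_tokens — same shape as
# /v1/messages, without max_tokens. Model name is illustrative.
body = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
}
serialized = json.dumps(body)
print(serialized)

# The response carries the count, e.g. {"input_tokens": 10}.
# With the SDK, the equivalent call (against a running server) is:
#   count = client.messages.count_tokens(**body)
#   print(count.input_tokens)
```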

Supported features

  • System messages (string or content blocks)
  • Multi-turn conversations
  • Streaming via Server-Sent Events
  • Tool definitions and tool use
  • Extended thinking (thinking blocks)
  • Temperature, top_p, top_k, stop sequences
  • Token counting
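
To illustrate the streaming feature above: with `"stream": true`, the server emits Server-Sent Events whose event types follow the Anthropic Messages streaming protocol (`message_start`, `content_block_delta`, `message_stop`, and so on). The sketch below parses a few hand-written example events with the standard library; the event payloads are illustrative, and the Anthropic SDK's `client.messages.stream(...)` helper does this parsing for you.

```python
import json

# Illustrative SSE lines as emitted by POST /v1/messages with "stream": true.
sse_lines = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0,'
    ' "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0,'
    ' "delta": {"type": "text_delta", "text": "lo"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]

# Accumulate text deltas into the full assistant reply.
text = ""
for line in sse_lines:
    if line.startswith("data: "):
        event = json.loads(line[len("data: "):])
        if event["type"] == "content_block_delta":
            text += event["delta"]["text"]

print(text)  # → Hello
```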

Not yet implemented

  • Message Batches (/v1/messages/batches)

For property-level conformance details and missing properties, see the conformance report.