# Anthropic Messages API
Llama Stack provides native support for the Anthropic Messages API at `/v1/messages`. Point the official Anthropic SDK at your Llama Stack server and use any model it serves.
```python
from anthropic import Anthropic

# Point the SDK at the Llama Stack server. The SDK requires an API key,
# so pass a placeholder value.
client = Anthropic(base_url="http://localhost:8321/v1", api_key="fake")

message = client.messages.create(
    model="llama-3.3-70b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
```
## Implemented endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/messages` | POST | Create a message (streaming and non-streaming) |
| `/v1/messages/count_tokens` | POST | Count tokens for a message |
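The token-counting endpoint accepts the same message shape as `/v1/messages`, minus `max_tokens` (the Anthropic Python SDK also exposes this as `client.messages.count_tokens`). A minimal sketch of the request body, assuming the model id from the example above:

```python
import json

# Request body for POST /v1/messages/count_tokens.
# The payload mirrors a messages.create call, without max_tokens.
payload = {
    "model": "llama-3.3-70b",  # any model served by your Llama Stack instance
    "messages": [{"role": "user", "content": "Hello"}],
}

body = json.dumps(payload)
# POST `body` to http://localhost:8321/v1/messages/count_tokens with
# Content-Type: application/json; the response reports an input token count.
```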
## Supported features
- System messages (string or content blocks)
- Multi-turn conversations
- Streaming via Server-Sent Events
- Tool definitions and tool use
- Extended thinking (thinking blocks)
- Temperature, top_p, top_k, stop sequences
- Token counting
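Streamed responses arrive as server-sent events in the Anthropic format, where text is delivered incrementally through `content_block_delta` events. A sketch of reassembling text from a captured stream fragment (the fragment below is illustrative and abbreviated; real streams also carry `message_start`, `content_block_start`/`stop`, and `message_stop` events):

```python
import json

# A captured fragment of an Anthropic-style SSE stream (abbreviated).
raw_stream = (
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "index": 0,'
    ' "delta": {"type": "text_delta", "text": "Hel"}}\n'
    '\n'
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "index": 0,'
    ' "delta": {"type": "text_delta", "text": "lo"}}\n'
    '\n'
)

def iter_text_deltas(stream: str):
    """Yield text fragments from content_block_delta events."""
    for line in stream.splitlines():
        if line.startswith("data: "):
            event = json.loads(line[len("data: "):])
            if event.get("type") == "content_block_delta":
                delta = event.get("delta", {})
                if delta.get("type") == "text_delta":
                    yield delta["text"]

print("".join(iter_text_deltas(raw_stream)))  # prints "Hello"
```

In practice the SDK handles this for you (e.g. `client.messages.stream(...)` in the Anthropic Python SDK); the sketch only shows the wire format the endpoint emits.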
## Not yet implemented

- Message Batches (`/v1/messages/batches`)
For property-level conformance details and missing properties, see the conformance report.