API Reference
Llama Stack implements the OpenAI API specification and organizes its endpoints by stability level, so any OpenAI-compatible client can be used to access these APIs.
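Because the endpoints follow the OpenAI wire format, even a plain HTTP client is enough to talk to a server. The sketch below builds (but does not send) a chat completion request using only the Python standard library; the local base URL, port, and model name are illustrative assumptions, not part of this reference.

```python
import json
import urllib.request

# Assumed: a Llama Stack server listening locally. The port and model
# name below are placeholders, not guaranteed by this document.
BASE_URL = "http://localhost:8321/v1"

def chat_completion_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_completion_request("llama3.2:3b", "Hello!")
print(req.full_url)  # http://localhost:8321/v1/chat/completions
# With a server running, send it with: urllib.request.urlopen(req)
```

The same request body works unchanged with the official `openai` client pointed at the server's `/v1` base URL.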
Stable APIs
| Endpoint | API | Description |
|----------|-----|-------------|
| `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings` | Inference | Chat completions, text completions, and embeddings |
| `/v1/models` | Models | Model listing and management |
| `/v1/files` | Files | File upload and management |
| `/v1/vector_stores` | Vector IO | Document storage and semantic search |
| `/v1/batches` | Batches | Offline batch processing |
| `/v1/moderations` | Safety | Content safety via Llama Guard |
| `/v1/conversations` | Conversations | Conversation state management |
| `/v1/prompts` | Prompts | Prompt templates and versioning |
| `/v1/scoring` | Scoring | Scoring functions for evaluation |
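As a concrete example of a stable endpoint, `/v1/models` is a simple GET. The snippet below constructs the request; actually sending it (commented out) would require a running server, whose address here is an assumption.

```python
import urllib.request

BASE_URL = "http://localhost:8321/v1"  # assumed local server address

req = urllib.request.Request(f"{BASE_URL}/models", method="GET")
print(req.full_url)  # http://localhost:8321/v1/models

# With a server running, this would return the model list as JSON:
# import json
# with urllib.request.urlopen(req) as resp:
#     models = json.load(resp)
```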
Experimental APIs
| Endpoint | API | Description |
|----------|-----|-------------|
| `/v1alpha/admin` | Admin | Providers, routes, health, and version |
| `/v1alpha/inference/rerank` | Rerank | Document reranking for search relevance |
| `/v1alpha/file_processors` | File Processors | Document ingestion and chunking |
| `/v1beta/connectors` | Connectors | External tool and service connectors |
| `/v1alpha/eval` | Eval | Evaluation pipelines and benchmarks |
| `/v1beta/datasets` | Datasets | Dataset management |
These APIs follow semantic versioning: stable endpoints live under the `/v1` prefix, while experimental endpoints use the `/v1alpha` or `/v1beta` prefixes until they stabilize. Backward compatibility is maintained within a major version.
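The stability levels above map directly to URL prefixes. A minimal helper (illustrative only; the function and mapping names are mine, not part of Llama Stack) makes the convention explicit:

```python
# Maps a stability level to the URL prefix used in the listings above.
STABILITY_PREFIX = {
    "stable": "/v1",
    "alpha": "/v1alpha",
    "beta": "/v1beta",
}

def endpoint_path(stability: str, resource: str) -> str:
    """Join a stability prefix with a resource path, e.g. 'inference/rerank'."""
    return f"{STABILITY_PREFIX[stability]}/{resource}"

print(endpoint_path("stable", "models"))           # /v1/models
print(endpoint_path("alpha", "inference/rerank"))  # /v1alpha/inference/rerank
print(endpoint_path("beta", "datasets"))           # /v1beta/datasets
```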