Version: v0.4.3

APIs

A Llama Stack API is a collection of REST endpoints that follows OpenAI API standards. We currently support the following APIs (a minimal request sketch follows the list):

  • Inference: run inference with an LLM
  • Safety: apply safety policies to outputs at a system (not just model) level
  • Agents: run multi-step agentic workflows with LLMs with tool usage, memory (RAG), etc.
  • DatasetIO: interface with datasets and data loaders
  • Scoring: evaluate outputs of the system
  • Eval: generate outputs (via Inference or Agents) and perform scoring
  • VectorIO: perform operations on vector stores, such as adding documents, searching, and deleting documents
  • Files: manage file uploads, storage, and retrieval
  • Post Training: fine-tune a model
  • Tool Runtime: interact with various tools and protocols
  • Responses: generate responses from an LLM
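
Because each API is served as REST endpoints, you can exercise one with any HTTP client. The following is a minimal sketch of an Inference call using Python's requests library; the port (8321 is a common default), endpoint path, and model id are assumptions about your deployment, so adjust them to match your server:

    import requests

    # Minimal sketch: POST a chat completion request to a locally running
    # Llama Stack server. The port, path, and model id are placeholders.
    resp = requests.post(
        "http://localhost:8321/v1/chat/completions",
        json={
            "model": "meta-llama/Llama-3.2-3B-Instruct",
            "messages": [{"role": "user", "content": "Say hello."}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    # Response follows the OpenAI chat completion shape.
    print(resp.json()["choices"][0]["message"]["content"])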

We are working on adding a few more APIs to complete the application lifecycle. These will include:

  • Batch Inference: run inference on a dataset of inputs
  • Batch Agents: run agents on a dataset of inputs
  • Batches: OpenAI-compatible batch management for inference
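
Since the planned Batches API is described as OpenAI-compatible, it would presumably follow the OpenAI Batch API shape. Purely as an illustrative sketch (nothing here is a shipped Llama Stack interface; the base URL, file, and ids are hypothetical):

    from openai import OpenAI

    # Illustrative only: mirrors the OpenAI Batch API shape the planned
    # Batches API targets. All URLs, files, and ids are placeholders.
    client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

    # Each line of the .jsonl file is one chat-completion request.
    batch_input = client.files.create(
        file=open("requests.jsonl", "rb"),
        purpose="batch",
    )

    batch = client.batches.create(
        input_file_id=batch_input.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    print(batch.id, batch.status)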

OpenAI API Compatibility

We are working on adding OpenAI API compatibility to Llama Stack. This will allow you to use Llama Stack with OpenAI API clients and tools.
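
In practical terms, this means an off-the-shelf OpenAI client can target the compatible endpoints directly. A minimal sketch with the official openai Python package; the base URL, API key handling, and model id are assumptions about your deployment:

    from openai import OpenAI

    # Point the standard OpenAI client at a Llama Stack server.
    # The base_url, api_key, and model id below are placeholders.
    client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": "What is Llama Stack?"}],
    )
    print(resp.choices[0].message.content)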

File Operations and Vector Store Integration

The Files and Vector Store APIs work together through file operations to enable automatic document processing and search. This integration implements the OpenAI Vector Store Files API specification and allows you to:

  • Upload documents through the Files API
  • Automatically process and chunk documents into searchable vectors
  • Store processed content in a vector database backed by any available provider
  • Search through documents using natural language queries

For detailed information about this integration, see File Operations and Vector Store Integration.
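
The sketch below walks that flow end to end with the openai Python package (a recent version, where vector store methods are out of beta); the base URL, file name, and query are placeholder assumptions:

    from openai import OpenAI

    # Hypothetical end-to-end flow against a Llama Stack server; the
    # base URL and file name are placeholders. Mirrors the OpenAI
    # Vector Store Files API.
    client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

    # 1. Upload a document through the Files API.
    uploaded = client.files.create(
        file=open("handbook.pdf", "rb"),
        purpose="assistants",
    )

    # 2. Create a vector store and attach the file; the server chunks
    #    and embeds the document into searchable vectors.
    store = client.vector_stores.create(name="docs")
    client.vector_stores.files.create(
        vector_store_id=store.id,
        file_id=uploaded.id,
    )

    # 3. Search with a natural language query.
    results = client.vector_stores.search(
        vector_store_id=store.id,
        query="What is the vacation policy?",
    )
    for hit in results.data:
        print(hit.filename, hit.score)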