
Build AI Applications with Llama Stack

Unified APIs for Inference, RAG, Agents, Tools, Safety, and Telemetry

Quick Start

Get up and running with Llama Stack in just a few commands. Build your first AI application locally.

# Prerequisites: uv (https://docs.astral.sh/uv/) and Ollama
# Start Ollama and keep the model loaded for 60 minutes
ollama run llama3.2:3b --keepalive 60m

# Install server dependencies
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install

# Run Llama Stack server
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter
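Once the server is listening (port 8321 by default), you can sanity-check it from a second terminal. One way, assuming the llama-stack-client CLI is available, is to list the models the server has registered:

# Verify the server is up by listing its registered models
uv run --with llama-stack-client llama-stack-client models list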

# Try the Python SDK (requires: uv pip install llama-stack-client)
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
  base_url="http://localhost:8321"
)

response = client.chat.completions.create(
  model="ollama/llama3.2:3b",  # model id as registered by the starter
                               # distribution; verify with client.models.list()
  messages=[{
    "role": "user",
    "content": "What is machine learning?"
  }]
)

print(response.choices[0].message.content)

Why Llama Stack?

πŸ”— Unified APIs

One consistent interface for all your AI needs: inference, safety, agents, and more.
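
As a minimal sketch of that surface in the Python SDK: the same client object fronts inference, model management, and safety, and the calls below list whatever models and shields your distribution has registered.

# One client object, several APIs
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Model management API: models the server knows about
for model in client.models.list():
  print(model.identifier)

# Safety API: shields configured on the server
for shield in client.shields.list():
  print(shield.identifier)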

πŸ”„ Provider Flexibility

Swap between providers without code changes. Start local, deploy anywhere.
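
A sketch of what that swap looks like with the Python SDK: the application code is unchanged, and only the base URL and model id move from the local starter distribution to a hosted one (the remote URL below is a placeholder, not a real endpoint).

from llama_stack_client import LlamaStackClient

# Local development against the Ollama-backed starter distribution...
client = LlamaStackClient(base_url="http://localhost:8321")

# ...or a hosted distribution; only configuration changes, not code.
# client = LlamaStackClient(base_url="https://llama-stack.example.com")  # placeholder URL

response = client.chat.completions.create(
  model="ollama/llama3.2:3b",  # swap the model id along with the provider
  messages=[{"role": "user", "content": "What is machine learning?"}]
)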

πŸ›‘οΈ Production Ready

Built-in safety, monitoring, and evaluation tools for enterprise applications.
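
As a sketch of the built-in safety tooling via the Python SDK: run a configured shield over a message before passing it to inference. The shield id below is hypothetical; use one returned by client.shields.list().

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Screen a user message with a configured safety shield
result = client.safety.run_shield(
  shield_id="llama-guard",  # hypothetical id; list real ones with client.shields.list()
  messages=[{"role": "user", "content": "Tell me how to hotwire a car."}],
  params={}
)
if result.violation:
  print("Blocked:", result.violation.user_message)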

πŸ“± Multi-Platform

SDKs for Python, Node.js, iOS, and Android, plus REST APIs for any language.

Llama Stack Ecosystem

Complete toolkit for building AI applications with Llama Stack

πŸ› οΈ SDKs & Clients

Official client libraries for multiple programming languages

πŸš€ Example Applications

Ready-to-run examples to jumpstart your AI projects

☸️ Kubernetes Operator

Deploy and manage Llama Stack on Kubernetes clusters

Join the Community

Connect with developers building the future of AI applications