Quick Start
Get up and running with Llama Stack in just a few commands. Build your first RAG application locally.
# Start Ollama with a local model (requires uv and Ollama to be installed)
ollama run llama3.2:3b --keepalive 60m
# Install server dependencies
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install
# Run Llama Stack server
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter
# Try the Python SDK
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:8321"
)
response = client.chat.completions.create(
    model="Llama3.2-3B-Instruct",
    messages=[{
        "role": "user",
        "content": "What is machine learning?"
    }]
)
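The call above returns a response object but does not display it. The sketch below shows one way to pull out the reply text. It assumes the response follows the OpenAI-style chat completion shape (`choices[0].message.content`), which the `client.chat.completions.create` call suggests but the snippet above does not confirm. The helper name `first_reply_text` is hypothetical, and a stub object stands in for a live server so the example runs on its own.

```python
from types import SimpleNamespace


def first_reply_text(response):
    """Return the text of the first choice, or "" if there is none.

    Assumption: the response is shaped like an OpenAI-style chat
    completion, i.e. response.choices[i].message.content.
    """
    choices = getattr(response, "choices", None) or []
    if not choices:
        return ""
    content = choices[0].message.content
    return content or ""


# Stub standing in for a real server response (illustrative data only).
stub = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(content="Hello from the stub."))
    ]
)
print(first_reply_text(stub))
```

With the server from the steps above running, `print(first_reply_text(response))` would print the model's answer to the quick-start question, provided the response has the assumed shape.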
Why Llama Stack?
Unified APIs
One consistent interface for all your AI needs: inference, safety, agents, and more.
Provider Flexibility
Swap between providers without code changes. Start local, deploy anywhere.
Production Ready
Built-in safety, monitoring, and evaluation tools for enterprise applications.
Multi-Platform
SDKs for Python, Node.js, iOS, Android, and REST APIs for any language.
Llama Stack Ecosystem
Complete toolkit for building AI applications with Llama Stack
SDKs & Clients
Official client libraries for multiple programming languages
Join the Community
Connect with developers building the future of AI applications