
Welcome to Llama Stack

Llama Stack is the open-source framework for building generative AI applications.


What is Llama Stack?

Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides:

  • Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Plugin architecture to support the rich ecosystem of implementations of the different APIs in different environments like local development, on-premises, cloud, and mobile.
  • Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
  • Multiple developer interfaces like CLI and SDKs for Python, Node, iOS, and Android.
  • Standalone applications as examples for how to build production-grade AI applications with Llama Stack.

Our goal is to provide pre-packaged implementations (aka "distributions") which can be run in a variety of deployment environments. Llama Stack can assist you throughout your entire app development lifecycle: start iterating locally, on mobile, or on desktop, and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience are available.

How does Llama Stack work?

Llama Stack consists of a server (with multiple pluggable API providers) and Client SDKs meant to be used in your applications. The server can be run in a variety of environments, including local (inline) development, on-premises, and cloud. The client SDKs are available for Python, Swift, Node, and Kotlin.
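The split described above can be sketched with the Python client SDK. This is an illustrative sketch only: the package name (`llama-stack-client`), the server address, and the model id below are assumptions for the example, not details stated on this page; the SDK is expected to be installed with `pip install llama-stack-client` and the server to be running locally.

```python
# Hypothetical sketch of calling a locally running Llama Stack server
# through the Python client SDK. Package name, base URL, and model id
# are illustrative assumptions.
BASE_URL = "http://localhost:8321"  # assumed local server address

try:
    from llama_stack_client import LlamaStackClient
except ImportError:
    # SDK not installed; keep the sketch importable so the shape is still readable.
    LlamaStackClient = None


def chat(prompt: str) -> str:
    """Send a single chat turn to the Inference API of a Llama Stack server."""
    if LlamaStackClient is None:
        raise RuntimeError("llama-stack-client is not installed")
    client = LlamaStackClient(base_url=BASE_URL)
    response = client.inference.chat_completion(
        model_id="meta-llama/Llama-3.1-8B-Instruct",  # example model id
        messages=[{"role": "user", "content": prompt}],
    )
    return response.completion_message.content
```

Because the client talks to the server over the unified API, pointing `BASE_URL` at a local, on-prem, or cloud deployment leaves the application code unchanged.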

Rich Ecosystem Support

Llama Stack provides adapters for popular providers across all API categories:

  • Inference: Meta Reference, Ollama, Fireworks, Together, NVIDIA, vLLM, AWS Bedrock, OpenAI, Anthropic, and more
  • Vector Databases: FAISS, Chroma, Milvus, Postgres, Weaviate, Qdrant, and others
  • Safety: Llama Guard, Prompt Guard, Code Scanner, AWS Bedrock
  • Training & Evaluation: HuggingFace, TorchTune, NVIDIA NeMo
Provider Details

For complete provider compatibility and setup instructions, see our Providers Documentation.

Get Started Today