llama (server-side) CLI Reference
The llama CLI tool helps you set up and use the Llama Stack. The CLI is available on your PATH after installing the llama-stack package.
Installation
You have two ways to install Llama Stack:
- Install as a package: You can install the package directly from PyPI by running the following command:
pip install llama-stack
- Install from source: If you prefer to install from the source code, follow these steps:
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
uv venv myenv --python 3.12
source myenv/bin/activate # On Windows: myenv\Scripts\activate
cd llama-stack
uv pip install -e .  # the venv created by uv does not include pip by default
llama subcommands
stack: Allows you to build a stack using the llama stack distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the Build your own Distribution documentation.
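For example, a typical workflow is to build a distribution and then start the server. The template name and flags below are illustrative and vary between releases, so check llama stack build --help for the options available in your installed version:
# Build a distribution (template name and flags are illustrative)
llama stack build --template starter --image-type venv
# Start the Llama Stack server for the distribution you just built
llama stack run starter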
For downloading models, we recommend using the Hugging Face CLI. See Downloading models for more information.
Sample Usage
llama --help
usage: llama [-h] {stack} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{stack}
stack Operations for the Llama Stack / Distributions
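Each subcommand accepts --help as well. For example, to see the operations available under stack:
llama stack --help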
Downloading models
You first need to have models downloaded locally. We recommend using the Hugging Face CLI to download models.
First, install the Hugging Face CLI:
pip install "huggingface_hub[cli]"
Then authenticate and download models:
# Authenticate with Hugging Face
huggingface-cli login
# Download a model
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --local-dir ~/.llama/Llama-3.2-3B-Instruct
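Note that --local-dir places the files at the path you specify. If you omit it, the model is stored in your Hugging Face cache (by default ~/.cache/huggingface/hub), which is what the listing command in the next section inspects:
# Download into the Hugging Face cache instead of a custom directory
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct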
List the downloaded models
To list the models downloaded to your Hugging Face cache, you can use the Hugging Face CLI:
# List all downloaded models in your local cache
huggingface-cli scan-cache