
Building Custom Distributions

This guide walks you through inspecting existing distributions, customising their configuration, and building runnable artefacts for your own deployment.

Explore existing distributions

All first-party distributions live under llama_stack/distributions/. Each directory contains:

  • build.yaml – the distribution specification (providers, additional dependencies, optional external provider directories).
  • run.yaml – sample run configuration (when provided).
  • Documentation fragments that power this site.

Browse that folder to understand available providers and copy a distribution to use as a starting point. When creating a new stack, duplicate an existing directory, rename it, and adjust the build.yaml file to match your requirements.
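
For orientation, a trimmed build.yaml might look roughly like the sketch below. The exact fields vary between distributions and releases (and this sketch omits additional dependencies and external provider directories), so copy the file from the distribution you are basing yours on rather than writing one from scratch:

# Illustrative sketch only -- check the build.yaml of the distribution
# you copied for the exact schema used by your release.
version: 2
distribution_spec:
  description: Starter stack trimmed down to Ollama inference
  providers:
    inference:
      - provider_type: remote::ollama
image_type: container
image_name: my-custom-stack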

Build the container image

Use the Containerfile at containers/Containerfile, which installs llama-stack, resolves distribution dependencies via llama stack list-deps, and sets the entrypoint to llama stack run.

docker build . \
  -f containers/Containerfile \
  --build-arg DISTRO_NAME=starter \
  --tag llama-stack:starter

Handy build arguments:

  • DISTRO_NAME – distribution directory name (defaults to starter).
  • RUN_CONFIG_PATH – absolute path inside the build context for a run config that should be baked into the image (e.g. /workspace/run.yaml).
  • INSTALL_MODE=editable – install the repository copied into /workspace with uv pip install -e. Pair it with --build-arg LLAMA_STACK_DIR=/workspace.
  • LLAMA_STACK_CLIENT_DIR – optional editable install of the Python client.
  • PYPI_VERSION / TEST_PYPI_VERSION – pin specific releases when not using editable installs.
  • KEEP_WORKSPACE=1 – retain /workspace in the final image if you need to access additional files (such as sample configs or provider bundles).

Make sure any custom build.yaml, run configs, or provider directories you reference are included in the Docker build context so the Containerfile can read them.
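
For example, assuming your custom run.yaml sits at the root of the build context (so it is visible at /workspace/run.yaml during the build), a build that bakes it into a starter-based image might look like this; the image tag is only an example:

docker build . \
  -f containers/Containerfile \
  --build-arg DISTRO_NAME=starter \
  --build-arg RUN_CONFIG_PATH=/workspace/run.yaml \
  --tag llama-stack:starter-custom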

Run your stack server

After building the image, launch it directly with Docker or Podman; the entrypoint calls llama stack run using the baked distribution or the bundled run config:

docker run -d \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  -e INFERENCE_MODEL=$INFERENCE_MODEL \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llama-stack:starter \
  --port $LLAMA_STACK_PORT

Here is what each part of the docker run command does:

  • -d: Runs the container in detached mode, as a background process

  • -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT: Maps the container port to the host port for accessing the server

  • -v ~/.llama:/root/.llama: Mounts the local .llama directory to persist configurations and data

  • llama-stack:starter: The name and tag of the container image to run

  • -e INFERENCE_MODEL=$INFERENCE_MODEL: Sets the INFERENCE_MODEL environment variable in the container

  • -e OLLAMA_URL=http://host.docker.internal:11434: Sets the OLLAMA_URL environment variable in the container

  • --port $LLAMA_STACK_PORT: Port number for the server to listen on
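
Because the container runs detached, startup errors only show up in its logs; to confirm the server actually came up, check the container status and tail the logs:

# List running containers started from the image built above
docker ps --filter ancestor=llama-stack:starter

# Follow the server logs (substitute the container ID reported by docker ps)
docker logs -f <container-id>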

If you prepared a custom run config, mount it into the container and reference it explicitly:

docker run \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v $(pwd)/run.yaml:/app/run.yaml \
  llama-stack:starter \
  /app/run.yaml
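
The layout of run.yaml itself is easiest to learn from the sample run configs shipped with the first-party distributions. Purely for orientation, a heavily trimmed config could look something like the sketch below; field names and defaults vary by release, so treat it as an illustration rather than a reference:

# Illustrative sketch only -- start from the run.yaml bundled with the
# distribution you copied and adjust it, rather than writing one by hand.
version: 2
image_name: starter
apis:
  - inference
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: ${env.OLLAMA_URL}
server:
  port: 8321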