Building Custom Distributions
This guide walks you through inspecting existing distributions, customising their configuration, and building runnable artefacts for your own deployment.
Explore existing distributions
All first-party distributions live under llama_stack/distributions/. Each directory contains:
- build.yaml: the distribution specification (providers, additional dependencies, optional external provider directories).
- run.yaml: sample run configuration (when provided).
- Documentation fragments that power this site.
Browse that folder to understand available providers and copy a distribution to use as a starting point. When creating a new stack, duplicate an existing directory, rename it, and adjust the build.yaml file to match your requirements.
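For example, a new distribution could start as a copy of the starter distribution (the my-distro name below is just a placeholder):
cp -r llama_stack/distributions/starter llama_stack/distributions/my-distro
# then edit the copied build.yaml to add or remove providers
$EDITOR llama_stack/distributions/my-distro/build.yaml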
Building a container
Use the Containerfile at containers/Containerfile, which installs llama-stack, resolves distribution dependencies via llama stack list-deps, and sets the entrypoint to llama stack run.
docker build . \
-f containers/Containerfile \
--build-arg DISTRO_NAME=starter \
--tag llama-stack:starter
Handy build arguments:
- DISTRO_NAME: distribution directory name (defaults to starter).
- RUN_CONFIG_PATH: absolute path inside the build context for a run config that should be baked into the image (e.g. /workspace/run.yaml).
- INSTALL_MODE=editable: install the repository copied into /workspace with uv pip install -e. Pair it with --build-arg LLAMA_STACK_DIR=/workspace.
- LLAMA_STACK_CLIENT_DIR: optional editable install of the Python client.
- PYPI_VERSION / TEST_PYPI_VERSION: pin specific releases when not using editable installs.
- KEEP_WORKSPACE=1: retain /workspace in the final image if you need to access additional files (such as sample configs or provider bundles).
Make sure any custom build.yaml, run configs, or provider directories you reference are included in the Docker build context so the Containerfile can read them.
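For instance, an editable build that also bakes a run config from the build context into the image might look like this (the image tag and the run.yaml location are illustrative):
docker build . \
-f containers/Containerfile \
--build-arg DISTRO_NAME=starter \
--build-arg INSTALL_MODE=editable \
--build-arg LLAMA_STACK_DIR=/workspace \
--build-arg RUN_CONFIG_PATH=/workspace/run.yaml \
--tag llama-stack:starter-dev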
Building with external providers
External providers live outside the main repository but can be bundled by pointing external_providers_dir to a directory that contains your provider packages.
- Copy providers into the build context, for example cp -R path/to/providers providers.d.
- Update build.yaml with the directory and provider entries.
- Adjust run configs to use the in-container path (usually /.llama/providers.d), as in the sketch after this list. Pass --build-arg RUN_CONFIG_PATH=/workspace/run.yaml if you want to bake the config.
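For reference, a run config baked into the container might point at the bundled directory like this (a minimal sketch; the provider_id and the url field of the custom config class are assumptions):
external_providers_dir: /.llama/providers.d
providers:
  inference:
    - provider_id: custom_ollama
      provider_type: remote::custom_ollama
      config:
        url: ${env.OLLAMA_URL}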
Example build.yaml excerpt for a custom Ollama provider:
distribution_spec:
  providers:
    inference:
      - remote::custom_ollama
external_providers_dir: /workspace/providers.d
Inside providers.d/custom_ollama/provider.py, define get_provider_spec() so the CLI can discover dependencies:
from llama_stack.providers.datatypes import ProviderSpec

def get_provider_spec() -> ProviderSpec:
    return ProviderSpec(
        provider_type="remote::custom_ollama",
        module="llama_stack_ollama_provider",
        config_class="llama_stack_ollama_provider.config.OllamaImplConfig",
        pip_packages=[
            "ollama",
            "aiohttp",
            "llama-stack-provider-ollama",
        ],
    )
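Before building, it can help to confirm that dependency discovery picks up the provider; the invocation below assumes llama stack list-deps accepts a distribution name, mirroring how the Containerfile uses it, and reuses the placeholder my-distro name:
llama stack list-deps my-distro | grep ollama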
Alternatively, an external provider can be described declaratively with a YAML spec placed inside the external providers directory. Here's an example for the custom Ollama provider:
adapter:
  adapter_type: custom_ollama
  pip_packages:
    - ollama
    - aiohttp
    - llama-stack-provider-ollama # This is the provider package
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
The pip_packages section lists the Python packages required by the provider, as well as the provider package itself. Each package must be available on PyPI, or it can be installed from a local directory or a git repository (git must be installed in the build environment).
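As an illustration, the same list could mix these sources; the git URL and local path below are placeholders, and the example assumes the entries are passed through to pip/uv as ordinary requirement strings:
pip_packages:
  - ollama                                # from PyPI
  - /workspace/providers.d/custom_ollama  # local directory (placeholder path)
  - llama-stack-provider-ollama @ git+https://github.com/your-org/llama-stack-provider-ollama.git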
For deeper guidance, see the External Providers documentation.
Run your stack server
After building the image, launch it directly with Docker or Podman; the entrypoint calls llama stack run using the baked distribution or the bundled run config:
docker run -d \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e INFERENCE_MODEL=$INFERENCE_MODEL \
-e OLLAMA_URL=http://host.docker.internal:11434 \
llama-stack:starter \
--port $LLAMA_STACK_PORT
Here are the docker flags and their uses:
- -d: Runs the container in detached mode as a background process
- -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT: Maps the container port to the host port for accessing the server
- -v ~/.llama:/root/.llama: Mounts the local .llama directory to persist configurations and data
- -e INFERENCE_MODEL=$INFERENCE_MODEL: Sets the INFERENCE_MODEL environment variable in the container
- -e OLLAMA_URL=http://host.docker.internal:11434: Sets the OLLAMA_URL environment variable in the container
- llama-stack:starter: The name and tag of the container image to run
- --port $LLAMA_STACK_PORT: Port number for the server to listen on
If you prepared a custom run config, mount it into the container and reference it explicitly:
docker run \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v $(pwd)/run.yaml:/app/run.yaml \
llama-stack:starter \
/app/run.yaml
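Once the server is up, a quick connectivity check (this assumes the standard v1 health route is exposed on the mapped port):
curl http://localhost:$LLAMA_STACK_PORT/v1/health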