Version: Next

remote::vllm

Description

Remote vLLM inference provider for connecting to vLLM servers.

Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider. |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | The API token. |
| `url` | `str \| None` | No | | The URL for the vLLM model serving endpoint. |
| `max_tokens` | `int` | No | 4096 | Maximum number of tokens to generate. |
| `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
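
The `tls_verify` field accepts either a boolean or a path to a CA certificate file. As a rough illustration of how a client might interpret such a value, the sketch below maps it onto Python's standard `ssl` module. This is not the provider's actual code; `build_ssl_context` is a hypothetical helper introduced only for this example.

```python
from __future__ import annotations

import ssl


def build_ssl_context(tls_verify: bool | str) -> ssl.SSLContext:
    """Hypothetical helper: turn a tls_verify-style value into an SSLContext."""
    if isinstance(tls_verify, str):
        # A string is treated as a path to a CA certificate file.
        return ssl.create_default_context(cafile=tls_verify)

    ctx = ssl.create_default_context()
    if not tls_verify:
        # Disable verification. Hostname checking must be switched off
        # before verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Note that this sketch treats any string as a file path, so a value read from the environment as the text `true` or `false` would need to be converted to a boolean first.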

Sample Configuration

url: ${env.VLLM_URL:=}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
api_token: ${env.VLLM_API_TOKEN:=fake}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
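
Each value in the sample uses the `${env.NAME:=default}` form. The sketch below shows one way such placeholders could be resolved, assuming `:=` means "use the default when the variable is unset or empty". The `resolve` function and the example URL are illustrative assumptions, not the stack's actual implementation, which may also coerce types or treat an empty result as unset.

```python
import os
import re

# Matches ${env.NAME:=default}; the default part may be empty.
_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*):=([^}]*)\}")


def resolve(value, environ=None):
    """Illustrative resolver: substitute env values, else fall back to the default."""
    env = os.environ if environ is None else environ

    def _sub(match):
        name, default = match.group(1), match.group(2)
        # An unset or empty variable falls back to the default.
        return env.get(name) or default

    return _PLACEHOLDER.sub(_sub, value)


sample = {
    "url": "${env.VLLM_URL:=}",
    "max_tokens": "${env.VLLM_MAX_TOKENS:=4096}",
    "api_token": "${env.VLLM_API_TOKEN:=fake}",
    "tls_verify": "${env.VLLM_TLS_VERIFY:=true}",
}

# Example environment with only the URL set (the URL itself is a made-up value).
example_env = {"VLLM_URL": "http://localhost:8000/v1"}
resolved = {key: resolve(text, example_env) for key, text in sample.items()}
```

Under these assumptions, only `url` takes its value from the environment. The other three fields fall back to the defaults written in the sample, and every resolved value is still a string at this stage.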