API Reference
Llama Stack implements the OpenAI API specification and organizes its endpoints by stability level, so any OpenAI-compatible client can be used to access these APIs.
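Because the endpoints follow the OpenAI wire format, even a plain HTTP client is enough to talk to a server. The sketch below builds (but does not send) a chat completion request using only the Python standard library; the local base URL, port, and model name are illustrative assumptions, not part of this reference.

```python
import json
import urllib.request

# Assumed: a Llama Stack server listening locally. The port and model
# name below are placeholders, not guaranteed by this document.
BASE_URL = "http://localhost:8321/v1"

def chat_completion_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_completion_request("llama3.2:3b", "Hello!")
print(req.full_url)  # http://localhost:8321/v1/chat/completions
# With a server running, send it with: urllib.request.urlopen(req)
```

The same request body works unchanged with the official `openai` client pointed at the server's `/v1` base URL.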
Stable APIs
| Endpoint | API | Description |
|----------|-----|-------------|
| `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings` | Inference | Chat completions, text completions, and embeddings |
| `/v1/models` | Models | Model listing and management |
| `/v1/files` | Files | File upload and management |
| `/v1/vector_stores` | Vector IO | Document storage and semantic search |
| `/v1/batches` | Batches | Offline batch processing |
| `/v1/moderations` | Safety | Content safety via Llama Guard |
| `/v1/conversations` | Conversations | Conversation state management |
| `/v1/prompts` | Prompts | Prompt templates and versioning |
| `/v1/scoring` | Scoring | Scoring functions for evaluation |
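As a concrete example of a stable endpoint, `/v1/models` is a simple GET. The snippet below constructs the request; actually sending it (commented out) would require a running server, whose address here is an assumption.

```python
import urllib.request

BASE_URL = "http://localhost:8321/v1"  # assumed local server address

req = urllib.request.Request(f"{BASE_URL}/models", method="GET")
print(req.full_url)  # http://localhost:8321/v1/models

# With a server running, this would return the model list as JSON:
# import json
# with urllib.request.urlopen(req) as resp:
#     models = json.load(resp)
```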
Experimental APIs
| Endpoint | API | Description |
|----------|-----|-------------|
| `/v1alpha/admin` | Admin | Providers, routes, health, and version |
| `/v1alpha/inference/rerank` | Rerank | Document reranking for search relevance |
| `/v1alpha/file_processors` | File Processors | Document ingestion and chunking |
| `/v1beta/connectors` | Connectors | External tool and service connectors |
| `/v1alpha/eval` | Eval | Evaluation pipelines and benchmarks |
| `/v1beta/datasets` | Datasets | Dataset management |
These APIs follow semantic versioning: stable endpoints live under the `/v1` prefix, while experimental endpoints use the `/v1alpha` or `/v1beta` prefixes until they stabilize. Backward compatibility is maintained within a major version.
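The stability levels above map directly to URL prefixes. A minimal helper (illustrative only; the function and mapping names are mine, not part of Llama Stack) makes the convention explicit:

```python
# Maps a stability level to the URL prefix used in the listings above.
STABILITY_PREFIX = {
    "stable": "/v1",
    "alpha": "/v1alpha",
    "beta": "/v1beta",
}

def endpoint_path(stability: str, resource: str) -> str:
    """Join a stability prefix with a resource path, e.g. 'inference/rerank'."""
    return f"{STABILITY_PREFIX[stability]}/{resource}"

print(endpoint_path("stable", "models"))           # /v1/models
print(endpoint_path("alpha", "inference/rerank"))  # /v1alpha/inference/rerank
print(endpoint_path("beta", "datasets"))           # /v1beta/datasets
```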