Llama Stack Playground
Experimental Feature
The Llama Stack Playground is currently experimental and subject to change. We welcome feedback and contributions to help improve it.
The Llama Stack Playground is a simple interface that aims to:
- Showcase capabilities and concepts of Llama Stack in an interactive environment
- Demo end-to-end application code to help users get started building their own applications
- Provide a UI to help users inspect and understand Llama Stack API providers and resources
Key Features
Interactive Playground Pages
The playground provides interactive pages for users to explore Llama Stack API capabilities:
Chatbot Interface
The chatbot interface offers two modes: Chat and RAG Chat.
Simple Chat Interface
- Chat directly with Llama models through an intuitive interface
- Uses the /chat/completions streaming API under the hood (see the sketch after this list)
- Real-time message streaming for responsive interactions
- Perfect for testing model capabilities and prompt engineering
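For reference, the streaming call behind the Chat tab can be reproduced with the Python client. This is a minimal sketch, not part of the playground itself: the base URL, model ID, the exact client method (client.inference.chat_completion), and the streaming chunk shape are assumptions that may differ across llama-stack-client versions.
# Minimal sketch: stream a chat completion with the Python client.
# Assumes a server at http://localhost:8321 and a model registered under the given ID.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

stream = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # replace with a model on your server
    messages=[{"role": "user", "content": "Write a haiku about streaming APIs."}],
    stream=True,
)

for chunk in stream:
    # Chunk shape varies by client version; recent versions expose the
    # incremental text at chunk.event.delta.text.
    print(chunk.event.delta.text, end="", flush=True)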
Document-Aware Conversations
- Upload documents to create memory banks
- Chat with a RAG-enabled agent that can query your documents
- Uses Llama Stack's /agents API to create and manage RAG sessions (see the sketch after this list)
- Ideal for exploring knowledge-enhanced AI applications
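The RAG tab's agent setup corresponds roughly to the Python sketch below. It assumes a recent llama-stack-client; the Agent helper's signature, the builtin RAG tool name, and the vector DB ID ("my_docs") are assumptions to adapt to your deployment.
# Rough sketch: a RAG-enabled agent session via the agents API.
from llama_stack_client import Agent, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

agent = Agent(
    client,
    model="meta-llama/Llama-3.1-8B-Instruct",
    instructions="Answer using the provided documents when possible.",
    tools=[{
        # Builtin RAG tool name and args are assumptions; check your distribution.
        "name": "builtin::rag/knowledge_search",
        "args": {"vector_db_ids": ["my_docs"]},  # hypothetical vector DB registered earlier
    }],
)

session_id = agent.create_session("playground-rag-demo")
turn = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Summarize the uploaded document."}],
    stream=False,
)
print(turn.output_message.content)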
Evaluation Interface
The evaluation interface covers two workflows: Scoring Evaluations and Benchmark Evaluations.
Custom Dataset Evaluation
- Upload your own evaluation datasets
- Run evaluations using available scoring functions
- Uses Llama Stack's /scoring API for flexible evaluation workflows (see the sketch after this list)
- Great for testing application performance on custom metrics
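Outside the UI, the same scoring run can be sketched with the Python client. The method name (client.scoring.score), the argument shapes, and the scoring function ID are assumptions; check the Resources page for the functions registered on your server.
# Minimal sketch: score rows directly with the scoring API.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

rows = [
    {
        "input_query": "What is the capital of France?",
        "generated_answer": "Paris",
        "expected_answer": "Paris",
    },
]

result = client.scoring.score(
    input_rows=rows,
    # Mapping of scoring function ID to optional params; the ID here is illustrative.
    scoring_functions={"basic::subset_of": None},
)
print(result.results)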
Pre-registered Evaluation Tasks
- Evaluate models or agents on pre-defined tasks
- Uses Llama Stack's /eval API for comprehensive evaluation (a sketch follows the registration commands below)
- Combines datasets and scoring functions for standardized testing
Setup Requirements: Register evaluation datasets and benchmarks first:
# Register evaluation dataset
llama-stack-client datasets register \
--dataset-id "mmlu" \
--provider-id "huggingface" \
--url "https://huggingface.co/datasets/llamastack/evals" \
--metadata '{"path": "llamastack/evals", "name": "evals__mmlu__details", "split": "train"}' \
--schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string"}, "chat_completion_input": {"type": "string"}}'
# Register benchmark task
llama-stack-client benchmarks register \
--eval-task-id meta-reference-mmlu \
--provider-id meta-reference \
--dataset-id mmlu \
--scoring-functions basic::regex_parser_multiple_choice_answer
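With the dataset and benchmark registered, the benchmark run the playground performs corresponds roughly to the Python sketch below; the method (client.eval.run_eval) and config field names are assumptions that may differ between client versions.
# Rough sketch: run the registered benchmark through the eval API.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

job = client.eval.run_eval(
    benchmark_id="meta-reference-mmlu",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # candidate model to evaluate
            "sampling_params": {"max_tokens": 512},
        },
    },
)
print(job)  # poll the returned job for status and results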
Inspection Interface
The inspection interface covers two views: API Providers and API Resources.
Provider Management
- Inspect available Llama Stack API providers
- View provider configurations and capabilities
- Uses the /providers API for real-time provider information (see the sketch after this list)
- Essential for understanding your deployment's capabilities
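A minimal sketch of the same provider listing in Python, assuming client.providers.list() is available in your client version:
# List providers the way the API Providers tab does.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

for provider in client.providers.list():
    print(provider.api, provider.provider_id, provider.provider_type)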
Resource Exploration
- Inspect Llama Stack API resources, including:
  - Models: Available language models
  - Datasets: Registered evaluation datasets
  - Memory Banks: Vector databases and knowledge stores
  - Benchmarks: Evaluation tasks and scoring functions
  - Shields: Safety and content moderation tools
- Uses the /<resources>/list APIs for comprehensive resource visibility (see the sketch after this list)
- For detailed information about resources, see Core Concepts
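A minimal sketch of listing resources in Python; which resource groups exist (and their names, e.g. vector DBs vs. memory banks) depends on your Llama Stack version:
# Enumerate registered resources, as the API Resources tab does.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

print("models:", [m.identifier for m in client.models.list()])
print("datasets:", [d.identifier for d in client.datasets.list()])
print("benchmarks:", [b.identifier for b in client.benchmarks.list()])
print("shields:", [s.identifier for s in client.shields.list()])
# Vector DBs / memory banks and scoring functions have analogous list() calls.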
Getting Started
Quick Start Guide
The quick start guide is split into two parts: Setup and Usage Tips.
1. Start the Llama Stack API Server
# Build and run a distribution (example: together)
llama stack build --distro together --image-type venv
llama stack run together
2. Start the Streamlit UI
# Launch the playground interface
uv run --with ".[ui]" streamlit run llama_stack/core/ui/app.py
Making the Most of the Playground:
- Start with Chat: Test basic model interactions and prompt engineering
- Explore RAG: Upload sample documents to see knowledge-enhanced responses
- Try Evaluations: Use the scoring interface to understand evaluation metrics
- Inspect Resources: Check what providers and resources are available
- Experiment with Settings: Adjust parameters to see how they affect results
Available Distributions
The playground works with any Llama Stack distribution. Popular options include:
Together AI
llama stack build --distro together --image-type venv
llama stack run together
Features:
- Cloud-hosted models
- Fast inference
- Multiple model options
Ollama (Local)
llama stack build --distro ollama --image-type venv
llama stack run ollama
Features:
- Local model execution
- Privacy-focused
- No internet required
Meta Reference
llama stack build --distro meta-reference --image-type venv
llama stack run meta-reference
Features:
- Reference implementation
- All API features available
- Best for development
Use Cases & Examples
Educational Use Cases
- Learning Llama Stack: Hands-on exploration of API capabilities
- Prompt Engineering: Interactive testing of different prompting strategies
- RAG Experimentation: Understanding how document retrieval affects responses
- Evaluation Understanding: See how different metrics evaluate model performance
Development Use Cases
- Prototype Testing: Quick validation of application concepts
- API Exploration: Understanding available endpoints and parameters
- Integration Planning: Seeing how different components work together
- Demo Creation: Showcasing Llama Stack capabilities to stakeholders
Research Use Cases
- Model Comparison: Side-by-side testing of different models
- Evaluation Design: Understanding how scoring functions work
- Safety Testing: Exploring shield effectiveness with different inputs
- Performance Analysis: Measuring model behavior across different scenarios
Best Practices
🚀 Getting Started
- Begin with simple chat interactions to understand basic functionality
- Gradually explore more advanced features like RAG and evaluations
- Use the inspection tools to understand your deployment's capabilities
🔧 Development Workflow
- Use the playground to prototype before writing application code
- Test different parameter settings interactively
- Validate evaluation approaches before implementing them programmatically
📊 Evaluation & Testing
- Start with simple scoring functions before trying complex evaluations
- Use the playground to understand evaluation results before automation
- Test safety features with various input types
🎯 Production Preparation
- Use playground insights to inform your production API usage
- Test edge cases and error conditions interactively
- Validate resource configurations before deployment
Related Resources
- Getting Started Guide - Complete setup and introduction
- Core Concepts - Understanding Llama Stack fundamentals
- Agents - Building intelligent agents
- RAG (Retrieval Augmented Generation) - Knowledge-enhanced applications
- Evaluations - Comprehensive evaluation framework
- API Reference - Complete API documentation