Llama Stack Playground

Experimental Feature

The Llama Stack Playground is currently experimental and subject to change. We welcome feedback and contributions to help improve it.

The Llama Stack Playground is a simple interface that aims to:

  • Showcase capabilities and concepts of Llama Stack in an interactive environment
  • Demo end-to-end application code to help users get started building their own applications
  • Provide a UI to help users inspect and understand Llama Stack API providers and resources

Key Features

Interactive Playground Pages

The playground provides interactive pages for users to explore Llama Stack API capabilities:

Chatbot Interface

Simple Chat Interface

  • Chat directly with Llama models through an intuitive interface
  • Uses the /chat/completions streaming API under the hood, as shown in the example below
  • Real-time message streaming for responsive interactions
  • Perfect for testing model capabilities and prompt engineering
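
To see what the chat page calls under the hood, you can hit the same endpoint directly. The sketch below uses curl and assumes the server's default port (8321), an OpenAI-compatible /v1/openai/v1/chat/completions route, and a placeholder model ID; adjust all three to match your deployment.

# Hedged example: stream a chat completion from the running server
# Port, route, and model ID are assumptions; replace them with your deployment's values
curl -N http://localhost:8321/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": true
      }'

With "stream": true the response arrives incrementally as server-sent events, which is what drives the real-time streaming in the chat interface.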

Evaluation Interface

Custom Dataset Evaluation

  • Upload your own evaluation datasets
  • Run evaluations using available scoring functions
  • Uses Llama Stack's /scoring API for flexible evaluation workflows, as sketched below
  • Great for testing application performance on custom metrics
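
The scoring call behind the evaluation page can also be issued directly. This is a hedged sketch: it assumes the default port, a /v1/scoring/score route, the built-in basic::equality scoring function, and a particular payload shape; verify each against your deployment's API reference before relying on it.

# Hedged sketch: score one row with a single scoring function
# Route, payload shape, and scoring-function ID are assumptions; verify before use
curl http://localhost:8321/v1/scoring/score \
  -H "Content-Type: application/json" \
  -d '{
        "input_rows": [{
          "input_query": "What is the capital of France?",
          "generated_answer": "Paris",
          "expected_answer": "Paris"
        }],
        "scoring_functions": {"basic::equality": null}
      }'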

Inspection Interface

Provider Management

  • Inspect available Llama Stack API providers
  • View provider configurations and capabilities
  • Uses the /providers API for real-time provider information, as shown in the example below
  • Essential for understanding your deployment's capabilities
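
The same information the inspection page shows can be fetched directly from the API; a minimal example, assuming the server's default port of 8321:

# List registered providers (default port assumed)
curl http://localhost:8321/v1/providers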

Getting Started

Quick Start Guide

1. Start the Llama Stack API Server

# Build and run a distribution (example: together)
llama stack build --distro together --image-type venv
llama stack run together
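
Once the server is running (it listens on port 8321 by default), a quick request confirms it is reachable before you launch the UI; the health route below is the usual default, so adjust it if your deployment differs.

# Optional sanity check; port and health route are assumed defaults
curl http://localhost:8321/v1/health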

2. Start the Streamlit UI

# Launch the playground interface
uv run --with ".[ui]" streamlit run llama_stack/core/ui/app.py
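
Streamlit prints the local URL it is serving, typically http://localhost:8501; open it in a browser to use the playground pages described above.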

Available Distributions

The playground works with any Llama Stack distribution. Popular options include:

Together

llama stack build --distro together --image-type venv
llama stack run together

Features:

  • Cloud-hosted models
  • Fast inference
  • Multiple model options

Use Cases & Examples​

Educational Use Cases​

  • Learning Llama Stack: Hands-on exploration of API capabilities
  • Prompt Engineering: Interactive testing of different prompting strategies
  • RAG Experimentation: Understanding how document retrieval affects responses
  • Evaluation Understanding: See how different metrics evaluate model performance

Development Use Cases

  • Prototype Testing: Quick validation of application concepts
  • API Exploration: Understanding available endpoints and parameters
  • Integration Planning: Seeing how different components work together
  • Demo Creation: Showcasing Llama Stack capabilities to stakeholders

Research Use Cases

  • Model Comparison: Side-by-side testing of different models
  • Evaluation Design: Understanding how scoring functions work
  • Safety Testing: Exploring shield effectiveness with different inputs
  • Performance Analysis: Measuring model behavior across different scenarios

Best Practices

🚀 Getting Started

  • Begin with simple chat interactions to understand basic functionality
  • Gradually explore more advanced features like RAG and evaluations
  • Use the inspection tools to understand your deployment's capabilities

🔧 Development Workflow

  • Use the playground to prototype before writing application code
  • Test different parameter settings interactively
  • Validate evaluation approaches before implementing them programmatically

📊 Evaluation & Testing​

  • Start with simple scoring functions before trying complex evaluations
  • Use the playground to understand evaluation results before automation
  • Test safety features with various input types

🎯 Production Preparation

  • Use playground insights to inform your production API usage
  • Test edge cases and error conditions interactively
  • Validate resource configurations before deployment