Llama Stack Playground
Experimental Feature
The Llama Stack Playground is currently experimental and subject to change. We welcome feedback and contributions to help improve it.
The Llama Stack Playground is a simple interface that aims to:
- Showcase capabilities and concepts of Llama Stack in an interactive environment
- Demo end-to-end application code to help users get started building their own applications
- Provide a UI to help users inspect and understand Llama Stack API providers and resources
Key Features
Interactive Playground Pages
The playground provides interactive pages for users to explore Llama Stack API capabilities:
Chatbot Interface
The chatbot interface offers two modes: Chat and RAG Chat.
Simple Chat Interface
- Chat directly with Llama models through an intuitive interface
- Uses the /chat/completions streaming API under the hood (see the sketch after this list)
- Real-time message streaming for responsive interactions
- Perfect for testing model capabilities and prompt engineering
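For reference, the streaming call behind the Chat tab can be reproduced with the Python client. This is a minimal sketch, not part of the playground itself: the base URL, model ID, the exact client method (client.inference.chat_completion), and the streaming chunk shape are assumptions that may differ across llama-stack-client versions.
# Minimal sketch: stream a chat completion with the Python client.
# Assumes a server at http://localhost:8321 and a model registered under the given ID.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

stream = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # replace with a model on your server
    messages=[{"role": "user", "content": "Write a haiku about streaming APIs."}],
    stream=True,
)

for chunk in stream:
    # Chunk shape varies by client version; recent versions expose the
    # incremental text at chunk.event.delta.text.
    print(chunk.event.delta.text, end="", flush=True)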
Document-Aware Conversations
- Upload documents to create memory banks
- Chat with a RAG-enabled agent that can query your documents
- Uses Llama Stack's /agents API to create and manage RAG sessions (see the sketch after this list)
- Ideal for exploring knowledge-enhanced AI applications
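The RAG tab's agent setup corresponds roughly to the Python sketch below. It assumes a recent llama-stack-client; the Agent helper's signature, the builtin RAG tool name, and the vector DB ID ("my_docs") are assumptions to adapt to your deployment.
# Rough sketch: a RAG-enabled agent session via the agents API.
from llama_stack_client import Agent, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

agent = Agent(
    client,
    model="meta-llama/Llama-3.1-8B-Instruct",
    instructions="Answer using the provided documents when possible.",
    tools=[{
        # Builtin RAG tool name and args are assumptions; check your distribution.
        "name": "builtin::rag/knowledge_search",
        "args": {"vector_db_ids": ["my_docs"]},  # hypothetical vector DB registered earlier
    }],
)

session_id = agent.create_session("playground-rag-demo")
turn = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Summarize the uploaded document."}],
    stream=False,
)
print(turn.output_message.content)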
Evaluation Interface
The evaluation interface covers two workflows: Scoring Evaluations and Benchmark Evaluations.
Custom Dataset Evaluation
- Upload your own evaluation datasets
- Run evaluations using available scoring functions
- Uses Llama Stack's /scoring API for flexible evaluation workflows (see the sketch after this list)
- Great for testing application performance on custom metrics
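Outside the UI, the same scoring run can be sketched with the Python client. The method name (client.scoring.score), the argument shapes, and the scoring function ID are assumptions; check the Resources page for the functions registered on your server.
# Minimal sketch: score rows directly with the scoring API.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

rows = [
    {
        "input_query": "What is the capital of France?",
        "generated_answer": "Paris",
        "expected_answer": "Paris",
    },
]

result = client.scoring.score(
    input_rows=rows,
    # Mapping of scoring function ID to optional params; the ID here is illustrative.
    scoring_functions={"basic::subset_of": None},
)
print(result.results)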
Pre-registered Evaluation Tasks
- Evaluate models or agents on pre-defined tasks
- Uses Llama Stack's /eval API for comprehensive evaluation (a sketch follows the registration commands below)
- Combines datasets and scoring functions for standardized testing
Setup Requirements: Register evaluation datasets and benchmarks first:
# Register evaluation dataset
llama-stack-client datasets register \
--dataset-id "mmlu" \
--provider-id "huggingface" \
--url "https://huggingface.co/datasets/llamastack/evals" \
--metadata '{"path": "llamastack/evals", "name": "evals__mmlu__details", "split": "train"}' \
--schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string"}, "chat_completion_input": {"type": "string"}}'
# Register benchmark task
llama-stack-client benchmarks register \
--eval-task-id meta-reference-mmlu \
--provider-id meta-reference \
--dataset-id mmlu \
--scoring-functions basic::regex_parser_multiple_choice_answer
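With the dataset and benchmark registered, the benchmark run the playground performs corresponds roughly to the Python sketch below; the method (client.eval.run_eval) and config field names are assumptions that may differ between client versions.
# Rough sketch: run the registered benchmark through the eval API.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

job = client.eval.run_eval(
    benchmark_id="meta-reference-mmlu",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # candidate model to evaluate
            "sampling_params": {"max_tokens": 512},
        },
    },
)
print(job)  # poll the returned job for status and results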
Inspection Interface
The inspection interface covers two views: API Providers and API Resources.
Provider Management
- Inspect available Llama Stack API providers
- View provider configurations and capabilities
- Uses the /providers API for real-time provider information (see the sketch after this list)
- Essential for understanding your deployment's capabilities
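A minimal sketch of the same provider listing in Python, assuming client.providers.list() is available in your client version:
# List providers the way the API Providers tab does.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

for provider in client.providers.list():
    print(provider.api, provider.provider_id, provider.provider_type)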
Resource Exploration
- Inspect Llama Stack API resources, including:
  - Models: Available language models
  - Datasets: Registered evaluation datasets
  - Memory Banks: Vector databases and knowledge stores
  - Benchmarks: Evaluation tasks and scoring functions
  - Shields: Safety and content moderation tools
- Uses the /<resources>/list APIs for comprehensive resource visibility (see the sketch after this list)
- For detailed information about resources, see Core Concepts
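A minimal sketch of listing resources in Python; which resource groups exist (and their names, e.g. vector DBs vs. memory banks) depends on your Llama Stack version:
# Enumerate registered resources, as the API Resources tab does.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

print("models:", [m.identifier for m in client.models.list()])
print("datasets:", [d.identifier for d in client.datasets.list()])
print("benchmarks:", [b.identifier for b in client.benchmarks.list()])
print("shields:", [s.identifier for s in client.shields.list()])
# Vector DBs / memory banks and scoring functions have analogous list() calls.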
Getting Started
Quick Start Guide
The quick start guide is split into two parts: Setup and Usage Tips.
1. Start the Llama Stack API Server
# Build and run a distribution (example: together)
llama stack build --distro together --image-type venv
llama stack run together
2. Start the Streamlit UI
# Launch the playground interface
uv run --with ".[ui]" streamlit run llama_stack/core/ui/app.py
Making the Most of the Playground:
- Start with Chat: Test basic model interactions and prompt engineering
- Explore RAG: Upload sample documents to see knowledge-enhanced responses
- Try Evaluations: Use the scoring interface to understand evaluation metrics
- Inspect Resources: Check what providers and resources are available
- Experiment with Settings: Adjust parameters to see how they affect results
Available Distributions
The playground works with any Llama Stack distribution. Popular options include:
Together AI
llama stack build --distro together --image-type venv
llama stack run together
Features:
- Cloud-hosted models
- Fast inference
- Multiple model options
Ollama (Local)
llama stack build --distro ollama --image-type venv
llama stack run ollama
Features:
- Local model execution
- Privacy-focused
- No internet required
Meta Reference
llama stack build --distro meta-reference --image-type venv
llama stack run meta-reference
Features:
- Reference implementation
- All API features available
- Best for development
Use Cases & Examples
Educational Use Cases
- Learning Llama Stack: Hands-on exploration of API capabilities
- Prompt Engineering: Interactive testing of different prompting strategies
- RAG Experimentation: Understanding how document retrieval affects responses
- Evaluation Understanding: See how different metrics evaluate model performance
Development Use Cases
- Prototype Testing: Quick validation of application concepts
- API Exploration: Understanding available endpoints and parameters
- Integration Planning: Seeing how different components work together
- Demo Creation: Showcasing Llama Stack capabilities to stakeholders
Research Use Cases
- Model Comparison: Side-by-side testing of different models
- Evaluation Design: Understanding how scoring functions work
- Safety Testing: Exploring shield effectiveness with different inputs
- Performance Analysis: Measuring model behavior across different scenarios
Best Practices
🚀 Getting Started
- Begin with simple chat interactions to understand basic functionality
- Gradually explore more advanced features like RAG and evaluations
- Use the inspection tools to understand your deployment's capabilities
🔧 Development Workflow
- Use the playground to prototype before writing application code
- Test different parameter settings interactively
- Validate evaluation approaches before implementing them programmatically
📊 Evaluation & Testing
- Start with simple scoring functions before trying complex evaluations
- Use the playground to understand evaluation results before automation
- Test safety features with various input types
🎯 Production Preparation
- Use playground insights to inform your production API usage
- Test edge cases and error conditions interactively
- Validate resource configurations before deployment
Related Resources
- Getting Started Guide - Complete setup and introduction
- Core Concepts - Understanding Llama Stack fundamentals
- Agents - Building intelligent agents
- RAG (Retrieval Augmented Generation) - Knowledge-enhanced applications
- Evaluations - Comprehensive evaluation framework
- API Reference - Complete API documentation