Version: v0.3.2

Safety Guardrails

Safety is a critical component of any AI application. Llama Stack provides a comprehensive Shield system that can be applied at multiple touchpoints to ensure responsible AI behavior and content moderation.

Shield System Overview

The Shield system in Llama Stack provides:

Content filtering for both input and output messages
Multi-touchpoint protection across your application flow
Configurable safety policies tailored to your use case
Integration with agents for automated safety enforcement

Basic Shield Usage

Registering a Safety Shield

Shield Registration
Manual Safety Check

# Register a safety shield
shield_id = "content_safety"
client.shields.register(
    shield_id=shield_id,
    provider_shield_id="llama-guard-basic"
)

# Run content through shield manually
response = client.safety.run_shield(
    shield_id=shield_id,
    messages=[{"role": "user", "content": "User message here"}]
)

if response.violation:
    print(f"Safety violation detected: {response.violation.user_message}")
    # Handle violation appropriately
else:
    print("Content passed safety checks")

Agent Integration

Shields can be automatically applied to agent interactions for seamless safety enforcement:

Input Shields
Output Shields
Input & Output Shields

from llama_stack_client import Agent

# Create agent with input safety shields
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    input_shields=["content_safety"],  # Shield user inputs
    tools=["builtin::websearch"],
)

session_id = agent.create_session("safe_session")

# All user inputs will be automatically screened
response = agent.create_turn(
    messages=[{"role": "user", "content": "Tell me about AI safety"}],
    session_id=session_id,
)

# Create agent with output safety shields
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    output_shields=["content_safety"],  # Shield agent outputs
    tools=["builtin::websearch"],
)

session_id = agent.create_session("safe_session")

# All agent responses will be automatically screened
response = agent.create_turn(
    messages=[{"role": "user", "content": "Help me with my research"}],
    session_id=session_id,
)

# Create agent with comprehensive safety coverage
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    input_shields=["content_safety"],   # Screen user inputs
    output_shields=["content_safety"],  # Screen agent outputs
    tools=["builtin::websearch"],
)

session_id = agent.create_session("fully_protected_session")

# Both input and output are automatically protected
response = agent.create_turn(
    messages=[{"role": "user", "content": "Research question here"}],
    session_id=session_id,
)

Available Shield Types

Llama Guard Shields

Llama Guard provides state-of-the-art content safety classification:

Basic Llama Guard
Advanced Llama Guard

# Basic Llama Guard for general content safety
client.shields.register(
    shield_id="llama_guard_basic",
    provider_shield_id="llama-guard-basic"
)

Use Cases:

General content moderation
Harmful content detection
Basic safety compliance

# Advanced Llama Guard with custom categories
client.shields.register(
    shield_id="llama_guard_advanced",
    provider_shield_id="llama-guard-advanced",
    config={
        "categories": [
            "violence", "hate_speech", "sexual_content",
            "self_harm", "illegal_activity"
        ],
        "threshold": 0.8
    }
)

Use Cases:

Fine-tuned safety policies
Domain-specific content filtering
Enterprise compliance requirements

Custom Safety Shields

Create domain-specific safety shields for specialized use cases:

# Register custom safety shield
client.shields.register(
    shield_id="financial_compliance",
    provider_shield_id="custom-financial-shield",
    config={
        "detect_pii": True,
        "financial_advice_warning": True,
        "regulatory_compliance": "FINRA"
    }
)

Safety Response Handling

When safety violations are detected, handle them appropriately:

Basic Handling
Advanced Handling

response = client.safety.run_shield(
    shield_id="content_safety",
    messages=[{"role": "user", "content": "Potentially harmful content"}]
)

if response.violation:
    violation = response.violation
    print(f"Violation Type: {violation.violation_type}")
    print(f"User Message: {violation.user_message}")
    print(f"Metadata: {violation.metadata}")

    # Log the violation for audit purposes
    logger.warning(f"Safety violation detected: {violation.violation_type}")

    # Provide appropriate user feedback
    return "I can't help with that request. Please try asking something else."

def handle_safety_response(safety_response, user_message):
    """Advanced safety response handling with logging and user feedback"""

    if not safety_response.violation:
        return {"safe": True, "message": "Content passed safety checks"}

    violation = safety_response.violation

    # Log violation details
    audit_log = {
        "timestamp": datetime.now().isoformat(),
        "violation_type": violation.violation_type,
        "original_message": user_message,
        "shield_response": violation.user_message,
        "metadata": violation.metadata
    }
    logger.warning(f"Safety violation: {audit_log}")

    # Determine appropriate response based on violation type
    if violation.violation_type == "hate_speech":
        user_feedback = "I can't engage with content that contains hate speech. Let's keep our conversation respectful."
    elif violation.violation_type == "violence":
        user_feedback = "I can't provide information that could promote violence. How else can I help you today?"
    else:
        user_feedback = "I can't help with that request. Please try asking something else."

    return {
        "safe": False,
        "user_feedback": user_feedback,
        "violation_details": audit_log
    }

# Usage
safety_result = handle_safety_response(response, user_input)
if not safety_result["safe"]:
    return safety_result["user_feedback"]

Safety Configuration Best Practices

🛡️ Multi-Layer Protection

Use both input and output shields for comprehensive coverage
Combine multiple shield types for different threat categories
Implement fallback mechanisms when shields fail

📊 Monitoring & Auditing

Log all safety violations for compliance and analysis
Monitor false positive rates to tune shield sensitivity
Track safety metrics across different use cases

⚙️ Configuration Management

Use environment-specific safety configurations
Implement A/B testing for shield effectiveness
Regularly update shield models and policies

🔧 Integration Patterns

Integrate shields early in the development process
Test safety measures with adversarial inputs
Provide clear user feedback for violations

Advanced Safety Scenarios

Context-Aware Safety

# Safety shields that consider conversation context
agent = Agent(
    client,
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a healthcare assistant",
    input_shields=["medical_safety"],
    output_shields=["medical_safety"],
    # Context helps shields make better decisions
    safety_context={
        "domain": "healthcare",
        "user_type": "patient",
        "compliance_level": "HIPAA"
    }
)

Dynamic Shield Selection

def select_shield_for_user(user_profile):
    """Select appropriate safety shield based on user context"""
    if user_profile.age < 18:
        return "child_safety_shield"
    elif user_profile.context == "enterprise":
        return "enterprise_compliance_shield"
    else:
        return "general_safety_shield"

# Use dynamic shield selection
shield_id = select_shield_for_user(current_user)
response = client.safety.run_shield(
    shield_id=shield_id,
    messages=messages
)

Compliance and Regulations

Industry-Specific Safety

Healthcare (HIPAA)
Financial (FINRA)
Education (COPPA)

# Healthcare-specific safety configuration
client.shields.register(
    shield_id="hipaa_compliance",
    provider_shield_id="healthcare-safety-shield",
    config={
        "detect_phi": True,  # Protected Health Information
        "medical_advice_warning": True,
        "regulatory_framework": "HIPAA"
    }
)

# Financial services safety configuration
client.shields.register(
    shield_id="finra_compliance",
    provider_shield_id="financial-safety-shield",
    config={
        "detect_financial_advice": True,
        "investment_disclaimers": True,
        "regulatory_framework": "FINRA"
    }
)

# Educational platform safety for minors
client.shields.register(
    shield_id="coppa_compliance",
    provider_shield_id="educational-safety-shield",
    config={
        "child_protection": True,
        "educational_content_only": True,
        "regulatory_framework": "COPPA"
    }
)

Agents - Integrating safety shields with intelligent agents
Agent Execution Loop - Understanding safety in the execution flow
Evaluations - Evaluating safety shield effectiveness
Telemetry - Monitoring safety violations and metrics
Llama Guard Documentation - Advanced safety model details

Shield System Overview​

Basic Shield Usage​

Registering a Safety Shield​

Agent Integration​

Available Shield Types​

Llama Guard Shields​

Custom Safety Shields​

Safety Response Handling​

Safety Configuration Best Practices​

🛡️ Multi-Layer Protection​

📊 Monitoring & Auditing​

⚙️ Configuration Management​

🔧 Integration Patterns​

Advanced Safety Scenarios​

Context-Aware Safety​

Dynamic Shield Selection​

Compliance and Regulations​

Industry-Specific Safety​

Related Resources​