Skip to main content
Version: v0.2.23

Safety Guardrails

Safety is a critical component of any AI application. Llama Stack provides a comprehensive Shield system that can be applied at multiple touchpoints to ensure responsible AI behavior and content moderation.

Shield System Overview

The Shield system in Llama Stack provides:

  • Content filtering for both input and output messages
  • Multi-touchpoint protection across your application flow
  • Configurable safety policies tailored to your use case
  • Integration with agents for automated safety enforcement

Basic Shield Usage

Registering a Safety Shield

# Register a safety shield
shield_id = "content_safety"
client.shields.register(
shield_id=shield_id,
provider_shield_id="llama-guard-basic"
)

Agent Integration

Shields can be automatically applied to agent interactions for seamless safety enforcement:

from llama_stack_client import Agent

# Create agent with input safety shields
agent = Agent(
client,
model="meta-llama/Llama-3.2-3B-Instruct",
instructions="You are a helpful assistant",
input_shields=["content_safety"], # Shield user inputs
tools=["builtin::websearch"],
)

session_id = agent.create_session("safe_session")

# All user inputs will be automatically screened
response = agent.create_turn(
messages=[{"role": "user", "content": "Tell me about AI safety"}],
session_id=session_id,
)

Available Shield Types

Llama Guard Shields

Llama Guard provides state-of-the-art content safety classification:

# Basic Llama Guard for general content safety
client.shields.register(
shield_id="llama_guard_basic",
provider_shield_id="llama-guard-basic"
)

Use Cases:

  • General content moderation
  • Harmful content detection
  • Basic safety compliance

Custom Safety Shields

Create domain-specific safety shields for specialized use cases:

# Register custom safety shield
client.shields.register(
shield_id="financial_compliance",
provider_shield_id="custom-financial-shield",
config={
"detect_pii": True,
"financial_advice_warning": True,
"regulatory_compliance": "FINRA"
}
)

Safety Response Handling

When safety violations are detected, handle them appropriately:

response = client.safety.run_shield(
shield_id="content_safety",
messages=[{"role": "user", "content": "Potentially harmful content"}]
)

if response.violation:
violation = response.violation
print(f"Violation Type: {violation.violation_type}")
print(f"User Message: {violation.user_message}")
print(f"Metadata: {violation.metadata}")

# Log the violation for audit purposes
logger.warning(f"Safety violation detected: {violation.violation_type}")

# Provide appropriate user feedback
return "I can't help with that request. Please try asking something else."

Safety Configuration Best Practices

🛡️ Multi-Layer Protection

  • Use both input and output shields for comprehensive coverage
  • Combine multiple shield types for different threat categories
  • Implement fallback mechanisms when shields fail

📊 Monitoring & Auditing

  • Log all safety violations for compliance and analysis
  • Monitor false positive rates to tune shield sensitivity
  • Track safety metrics across different use cases

⚙️ Configuration Management

  • Use environment-specific safety configurations
  • Implement A/B testing for shield effectiveness
  • Regularly update shield models and policies

🔧 Integration Patterns

  • Integrate shields early in the development process
  • Test safety measures with adversarial inputs
  • Provide clear user feedback for violations

Advanced Safety Scenarios

Context-Aware Safety

# Safety shields that consider conversation context
agent = Agent(
client,
model="meta-llama/Llama-3.2-3B-Instruct",
instructions="You are a healthcare assistant",
input_shields=["medical_safety"],
output_shields=["medical_safety"],
# Context helps shields make better decisions
safety_context={
"domain": "healthcare",
"user_type": "patient",
"compliance_level": "HIPAA"
}
)

Dynamic Shield Selection

def select_shield_for_user(user_profile):
"""Select appropriate safety shield based on user context"""
if user_profile.age < 18:
return "child_safety_shield"
elif user_profile.context == "enterprise":
return "enterprise_compliance_shield"
else:
return "general_safety_shield"

# Use dynamic shield selection
shield_id = select_shield_for_user(current_user)
response = client.safety.run_shield(
shield_id=shield_id,
messages=messages
)

Compliance and Regulations

Industry-Specific Safety

# Healthcare-specific safety configuration
client.shields.register(
shield_id="hipaa_compliance",
provider_shield_id="healthcare-safety-shield",
config={
"detect_phi": True, # Protected Health Information
"medical_advice_warning": True,
"regulatory_framework": "HIPAA"
}
)