File Operations and Vector Store Integration

Overview​

Llama Stack provides seamless integration between the Files API and Vector Store APIs, enabling you to upload documents and automatically process them into searchable vector embeddings. This integration implements file operations following the OpenAI Vector Store Files API specification.

Enhanced Capabilities Beyond OpenAI​

While Llama Stack maintains full compatibility with OpenAI's Vector Store API, it provides several additional capabilities that enhance functionality and flexibility:

Embedding Model Specification​

Unlike OpenAI's vector stores which use a fixed embedding model, Llama Stack allows you to specify which embedding model to use when creating a vector store:

# Create vector store with specific embedding model
vector_store = await client.vector_stores.create(
    name="my_documents",
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",  # Specify your preferred model
        "embedding_dimension": 384,  # Optional: will be auto-detected if not provided
    },
)

Advanced Search Modes​

Llama Stack supports multiple search modes beyond basic vector similarity:

  • Vector Search: Pure semantic similarity search using embeddings
  • Keyword Search: Traditional keyword-based search for exact matches
  • Hybrid Search: Combines both vector and keyword search for optimal results

# Different search modes
results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    search_mode="hybrid",  # or "vector", "keyword"
    max_num_results=5,
)

Flexible Ranking Options​

For hybrid search, Llama Stack offers configurable ranking strategies:

  • RRF (Reciprocal Rank Fusion): Combines rankings with configurable impact factor
  • Weighted Ranker: Linear combination of vector and keyword scores with adjustable weights

# Custom ranking configuration
results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="neural networks",
    search_mode="hybrid",
    ranking_options={
        "ranker": {"type": "weighted", "alpha": 0.7}  # 70% vector, 30% keyword
    },
)

Provider Selection​

Choose from multiple vector store providers based on your specific needs:

  • Inline Providers: FAISS (fast in-memory), SQLite-vec (disk-based), Milvus (high-performance)
  • Remote Providers: ChromaDB, Qdrant, Weaviate, Postgres (PGVector), Milvus

# Specify provider when creating vector store
vector_store = await client.vector_stores.create(
    name="my_documents",
    extra_body={
        "provider_id": "sqlite-vec",  # Choose your preferred provider
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",  # Optional: specify embedding model
        "embedding_dimension": 384,  # Optional: will be auto-detected if not provided
    },
)

Note: All Llama Stack-specific parameters (provider_id, embedding_model, and embedding_dimension) must be specified in the extra_body parameter when creating vector stores.

How It Works​

The file operations work through several key components, tied together in the short sketch after this list:

  1. File Upload: Documents are uploaded through the Files API
  2. Automatic Processing: Files are automatically chunked and converted to embeddings
  3. Vector Storage: Chunks are stored in vector databases with metadata
  4. Search & Retrieval: Users can search through processed documents using natural language

Supported Vector Store Providers​

The following vector store providers support file operations:

Inline Providers (Single Node)​

  • FAISS: Fast in-memory vector similarity search
  • SQLite-vec: Disk-based storage with hybrid search capabilities

Remote Providers (Hosted)​

  • ChromaDB: Vector database with metadata filtering
  • Weaviate: Vector database with GraphQL interface
  • Postgres (PGVector): Vector extensions for PostgreSQL

Both Inline & Remote Providers​

  • Milvus: High-performance vector database with advanced indexing
  • Qdrant: Vector similarity search with payload filtering

File Processing Pipeline​

1. File Upload​

from llama_stack_client import AsyncLlamaStackClient

client = AsyncLlamaStackClient(base_url="http://localhost:8000")

# Upload a document
with open("document.pdf", "rb") as f:
    file_info = await client.files.create(file=f, purpose="assistants")

2. Attach to Vector Store​

# Create a vector store (uses defaults if configured)
# You can also specify embedding_model, embedding_dimension, and provider_id in extra_body
vector_store = await client.vector_stores.create(name="my_documents")

# Attach the file to the vector store
file_attach_response = await client.vector_stores.files.create(
    vector_store_id=vector_store.id, file_id=file_info.id
)

3. Automatic Processing​

The system automatically:

  • Detects the file type and extracts text content
  • Splits content into chunks (default: 800 tokens with 400 token overlap)
  • Generates embeddings for each chunk
  • Stores chunks with metadata in the vector store
  • Updates file status to "completed" (you can poll for this, as sketched below)
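
Because processing runs in the background, a newly attached file may report in_progress for a short time. One way to wait for it is to poll the file until it reaches a terminal status. The following is a minimal sketch; it assumes the OpenAI-compatible files.retrieve endpoint is available on the client and uses an arbitrary one-second polling interval:

import asyncio

# Poll an attached file until processing reaches a terminal status
# (assumes the OpenAI-compatible files.retrieve endpoint; 1 second is an example interval)
async def wait_for_processing(vector_store_id, file_id):
    while True:
        vs_file = await client.vector_stores.files.retrieve(
            vector_store_id=vector_store_id, file_id=file_id
        )
        if vs_file.status in ("completed", "failed", "cancelled"):
            return vs_file
        await asyncio.sleep(1)

vs_file = await wait_for_processing(vector_store.id, file_info.id)
print(f"File {vs_file.id} finished with status: {vs_file.status}")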

4. Search and Retrieval​

# Search through processed documents
search_results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="What is the main topic discussed?",
    max_num_results=5,
)

# Process results
for result in search_results.data:
    print(f"Score: {result.score}")
    for content in result.content:
        print(f"Content: {content.text}")

Supported File Types​

The FileResponse system supports various document formats:

  • Text Files: .txt, .md, .rst
  • Documents: .pdf, .docx, .doc
  • Code: .py, .js, .java, .cpp, etc.
  • Data: .json, .csv, .xml
  • Web Content: HTML files

Chunking Strategies​

Default Strategy​

The default chunking strategy uses:

  • Max Chunk Size: 800 tokens
  • Overlap: 400 tokens
  • Method: Semantic boundary detection

Custom Chunking​

You can customize chunking when attaching files:

# Define a static chunking strategy (OpenAI-compatible format; token counts are example values)
chunking_strategy = {
    "type": "static",
    "static": {
        "max_chunk_size_tokens": 512,
        "chunk_overlap_tokens": 50,
    },
}

# Attach file with custom chunking
file_attach_response = await client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file_info.id,
    chunking_strategy=chunking_strategy,
)

Note: While Llama Stack is OpenAI-compatible, it also supports additional options beyond the standard OpenAI API. When creating vector stores, you can specify custom embedding models and embedding dimensions that will be used when processing chunks from attached files.

File Management​

List Files in Vector Store​

# List all files in a vector store
files = await client.vector_stores.files.list(vector_store_id=vector_store.id)

for file in files.data:
    print(f"File: {file.filename}, Status: {file.status}")

File Status Tracking​

Files go through several statuses, which you can inspect as shown after this list:

  • in_progress: File is being processed
  • completed: File successfully processed and searchable
  • failed: Processing failed (check last_error for details)
  • cancelled: Processing was cancelled
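
For example, you can scan a vector store for files that did not process cleanly. This sketch assumes the status and last_error fields described above and reuses the `client` and `vector_store` from the earlier examples:

# Report any files that failed processing
files = await client.vector_stores.files.list(vector_store_id=vector_store.id)

for file in files.data:
    if file.status == "failed":
        print(f"{file.id} failed: {file.last_error}")
    else:
        print(f"{file.id}: {file.status}")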

Retrieve File Content​

# Get chunked content from vector store
content_response = await client.vector_stores.files.retrieve_content(
    vector_store_id=vector_store.id, file_id=file_info.id
)

for chunk in content_response.content:
    print(f"Chunk {chunk.metadata.get('chunk_index', 0)}: {chunk.text}")

Vector Store Management​

List Vector Stores​

Retrieve a paginated list of all vector stores:

# List all vector stores with default pagination
vector_stores = await client.vector_stores.list()

# Custom pagination and ordering
vector_stores = await client.vector_stores.list(
    limit=10,
    order="asc",  # or "desc"
    after="vs_12345678",  # cursor-based pagination
)

for store in vector_stores.data:
    print(f"Store: {store.name}, Files: {store.file_counts.total}")
    print(f"Created: {store.created_at}, Status: {store.status}")

Retrieve Vector Store Details​

Get detailed information about a specific vector store:

# Get vector store details
store_details = await client.vector_stores.retrieve(vector_store_id="vs_12345678")

print(f"Name: {store_details.name}")
print(f"Status: {store_details.status}")
print(f"File Counts: {store_details.file_counts}")
print(f"Usage: {store_details.usage_bytes} bytes")
print(f"Created: {store_details.created_at}")
print(f"Metadata: {store_details.metadata}")

Update Vector Store​

Modify vector store properties such as name, metadata, or expiration settings:

# Update vector store name and metadata
updated_store = await client.vector_stores.update(
    vector_store_id="vs_12345678",
    name="Updated Document Collection",
    metadata={
        "description": "Updated collection for research",
        "category": "research",
        "version": "2.0",
    },
)

# Set expiration policy
expired_store = await client.vector_stores.update(
    vector_store_id="vs_12345678",
    expires_after={"anchor": "last_active_at", "days": 30},
)

print(f"Updated store: {updated_store.name}")
print(f"Last active: {updated_store.last_active_at}")

Delete Vector Store​

Remove a vector store and all its associated data:

# Delete a vector store
delete_response = await client.vector_stores.delete(vector_store_id="vs_12345678")

if delete_response.deleted:
    print(f"Vector store {delete_response.id} successfully deleted")
else:
    print("Failed to delete vector store")

Important Notes:

  • Deleting a vector store removes all files, chunks, and embeddings
  • This operation cannot be undone
  • The underlying vector database is also cleaned up
  • Consider backing up important data before deletion (see the export sketch below)
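
If you want a lightweight backup first, one option is to export the chunked text of each file before deleting the store. This sketch reuses the list and retrieve_content calls shown earlier; the backup directory is a placeholder:

from pathlib import Path

# Export chunk text for every file, then delete the store
# ("backup" is a placeholder directory; reuses `client` and `vector_store` from earlier examples)
backup_dir = Path("backup")
backup_dir.mkdir(exist_ok=True)

files = await client.vector_stores.files.list(vector_store_id=vector_store.id)
for file in files.data:
    content = await client.vector_stores.files.retrieve_content(
        vector_store_id=vector_store.id, file_id=file.id
    )
    text = "\n\n".join(chunk.text for chunk in content.content)
    (backup_dir / f"{file.id}.txt").write_text(text)

# Only delete once the export has finished
await client.vector_stores.delete(vector_store_id=vector_store.id)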

Search Capabilities​

Pure similarity search using embeddings:

results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    max_num_results=10,
)

Combine vector search with metadata filtering:

results = await client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="machine learning algorithms",
    filters={"file_type": "pdf", "upload_date": "2024-01-01"},
    max_num_results=10,
)

SQLite-vec, pgvector, and Milvus support combining vector and keyword search.
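
On those providers, hybrid search is enabled by selecting the provider at creation time and passing search_mode="hybrid" at query time. A brief sketch, assuming the sqlite-vec provider is configured in your distribution:

# Hybrid search on a provider that supports it (sqlite-vec is one example)
hybrid_store = await client.vector_stores.create(
    name="hybrid_docs",
    extra_body={"provider_id": "sqlite-vec"},
)

results = await client.vector_stores.search(
    vector_store_id=hybrid_store.id,
    query="transformer architectures",
    search_mode="hybrid",
    max_num_results=10,
)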

Performance Considerations​

Note: For detailed performance optimization strategies, see Performance Considerations in the provider documentation.

Key Points:

  • Chunk Size: 400-600 tokens for precision, 800-1200 for context
  • Storage: Choose provider based on your performance needs
  • Search: Optimize for your specific use case

Error Handling​

Note: For comprehensive troubleshooting and error handling, see Troubleshooting in the provider documentation.

Common Issues:

  • File processing failures (format, size limits)
  • Search performance optimization
  • Storage and memory issues

Best Practices​

Note: For detailed best practices and recommendations, see Best Practices in the provider documentation.

Key Recommendations:

  • File organization and naming conventions
  • Chunking strategy optimization
  • Metadata and monitoring practices (illustrated in the sketch below)
  • Regular cleanup and maintenance
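
As a small illustration of the metadata and monitoring points, you can tag stores at creation time and periodically check their file counts. The metadata keys below are example values, and the file_counts.failed field is assumed to follow the OpenAI vector store object shape:

# Tag a store with organizational metadata at creation time (keys are example values)
tagged_store = await client.vector_stores.create(
    name="research_papers",
    metadata={"team": "research", "retention": "90d"},
)

# Periodically review stores and flag any with failed files
stores = await client.vector_stores.list()
for store in stores.data:
    if store.file_counts.failed > 0:
        print(f"{store.name}: {store.file_counts.failed} failed files need attention")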

Integration Examples​

RAG Application​

# Build a RAG system with file uploads
async def build_rag_system():
    # Create vector store (uses defaults if configured)
    # You can specify embedding_model, embedding_dimension, and provider_id in extra_body if needed
    vector_store = await client.vector_stores.create(name="knowledge_base")

    # Upload and process documents
    documents = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
    for doc in documents:
        with open(doc, "rb") as f:
            file_info = await client.files.create(file=f, purpose="assistants")
        await client.vector_stores.files.create(
            vector_store_id=vector_store.id, file_id=file_info.id
        )

    return vector_store


# Query the RAG system
async def query_rag(vector_store_id, question):
    results = await client.vector_stores.search(
        vector_store_id=vector_store_id, query=question, max_num_results=5
    )
    return results

Document Analysis​

# Analyze document content through vector search
async def analyze_document(vector_store_id, file_id):
    # Get document content
    content = await client.vector_stores.files.retrieve_content(
        vector_store_id=vector_store_id, file_id=file_id
    )

    # Search for specific topics
    topics = ["introduction", "methodology", "conclusion"]
    analysis = {}

    for topic in topics:
        results = await client.vector_stores.search(
            vector_store_id=vector_store_id, query=topic, max_num_results=3
        )
        analysis[topic] = results.data

    return analysis

Next Steps​