Version: v0.4.0

File Operations Support in Vector Store Providers

Overview

This document summarizes file operations and Vector Store API support across the vector store providers available in Llama Stack. As of release 0.2.24, the providers listed under "Full File Operations Support" below support full file operations integration.

Supported Providers

✅ Full File Operations Support

The following providers support complete file operations integration, including file upload, automatic processing, and search:

Inline Providers (Single Node)

Provider	File Operations	Key Features
FAISS	✅ Full Support	Fast in-memory search, GPU acceleration
SQLite-vec	✅ Full Support	Hybrid search, disk-based storage
Milvus	✅ Full Support	High-performance, scalable indexing

Remote Providers (Hosted)

Provider	File Operations	Key Features
ChromaDB	✅ Full Support	Metadata filtering, persistent storage
Qdrant	✅ Full Support	Payload filtering, advanced search
Weaviate	✅ Full Support	GraphQL interface, schema management
Postgres (PGVector)	✅ Full Support	SQL integration, ACID compliance

🔄 Partial Support

Some providers may support basic vector operations but lack full file operations integration:

Provider	Status	Notes
Meta Reference	🔄 Basic	Core vector operations only

File Operations Features

All supported providers offer the following file operations capabilities:

Core Functionality

File Upload & Processing: Automatic document ingestion and chunking
Vector Storage: Embedding generation and storage
Search & Retrieval: Semantic search with metadata filtering
File Management: List, retrieve, and manage files in vector stores

Advanced Features

Automatic Chunking: Configurable chunk sizes and overlap
Metadata Preservation: File attributes and chunk metadata
Status Tracking: Monitor file processing progress
Error Handling: Comprehensive error reporting and recovery

Implementation Details

File Processing Pipeline

Upload: File uploaded via Files API
Extraction: Text content extracted from various formats
Chunking: Content split into overlapping chunks (default: 800 tokens per chunk)
Embedding: Chunks converted to vector embeddings
Storage: Vectors stored with metadata in vector database
Indexing: Search index updated for fast retrieval
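
The stages above can be illustrated end to end with a toy in-memory store. This is a minimal sketch, not how any Llama Stack provider is implemented: `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, the "embedding" is a bag-of-words count rather than a real model, and chunks are measured in whitespace-separated words rather than tokens and do not overlap.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: term counts over lowercase words (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store covering chunk, embed, store, and search."""

    def __init__(self) -> None:
        # Each record: (vector, chunk text, metadata)
        self.records: List[Tuple[Counter, str, Dict]] = []

    def add_document(self, text: str, filename: str, chunk_size: int = 8) -> None:
        words = text.split()
        for offset in range(0, len(words), chunk_size):
            chunk = " ".join(words[offset:offset + chunk_size])
            metadata = {"filename": filename, "offset": offset}
            self.records.append((embed(chunk), chunk, metadata))

    def search(self, query: str, max_num_results: int = 5):
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), chunk, meta)
                  for vec, chunk, meta in self.records]
        # Sort by score only, so ties keep insertion order
        scored.sort(key=lambda record: record[0], reverse=True)
        return scored[:max_num_results]
```

Calling `add_document` once per file and then `search` with a query returns `(score, chunk, metadata)` tuples, which mirrors the idea that each stored vector carries the metadata needed to trace a hit back to its source file.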

Supported File Formats

Documents: PDF, DOCX, DOC
Text: TXT, MD, RST
Code: Python, JavaScript, Java, C++, etc.
Data: JSON, CSV, XML
Web: HTML files
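
A client-side check before upload can catch unsupported formats early. The sketch below is illustrative only: `SUPPORTED_EXTENSIONS` and `classify_file` are hypothetical names, and the extension mapping (especially for code files) is an assumption derived from the list above, not an authoritative list of what the server accepts.

```python
from pathlib import Path
from typing import Optional

# Assumed extension groups based on the format list above.
SUPPORTED_EXTENSIONS = {
    "documents": {".pdf", ".docx", ".doc"},
    "text": {".txt", ".md", ".rst"},
    "code": {".py", ".js", ".java", ".cpp"},
    "data": {".json", ".csv", ".xml"},
    "web": {".html"},
}


def classify_file(path: str) -> Optional[str]:
    """Return the format category for a path, or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```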

Chunking Strategies

Default: 800-token chunks with a 400-token overlap
Custom: Configurable chunk sizes and overlap
Static: Fixed-size chunks with overlap
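
To make fixed-size chunking with overlap concrete, here is a minimal sketch of a sliding window over a token list. The function name `chunk_tokens` is hypothetical, and the window logic is an assumption about how such chunking typically works rather than the providers' actual implementation. Real pipelines count model tokens; any list of items (for example `text.split()`) works as a stand-in here.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, max_chunk_size: int = 800,
                 overlap: int = 400) -> List[Sequence]:
    """Split tokens into fixed-size windows that share `overlap` tokens."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than max_chunk_size")

    step = max_chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size])
        # Stop once a window has reached the end of the input
        if start + max_chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, a 2,000-token input yields four chunks starting at offsets 0, 400, 800, and 1200, so every token except those at the edges appears in two chunks.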

Provider-Specific Features

FAISS

Storage: In-memory with optional persistence
Performance: Optimized for speed and GPU acceleration
Use Case: High-performance workloads whose index fits in available memory

SQLite-vec

Storage: Disk-based with SQLite backend
Search: Hybrid vector + keyword search
Use Case: Large document collections, frequent updates

Milvus

Storage: Scalable distributed storage
Indexing: Multiple index types (IVF, HNSW)
Use Case: Production deployments, large-scale applications

ChromaDB

Storage: Persistent storage with metadata
Filtering: Advanced metadata filtering
Use Case: Applications requiring rich metadata

Qdrant

Storage: High-performance vector database
Filtering: Payload-based filtering
Use Case: Real-time applications, complex queries

Weaviate

Storage: GraphQL-native vector database
Schema: Flexible schema management
Use Case: Applications requiring complex data relationships

Postgres (PGVector)

Storage: SQL database with vector extensions
Integration: ACID compliance, existing SQL workflows
Use Case: Applications requiring transactional guarantees

Configuration Examples

Basic Configuration

vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: ~/.llama/faiss_store.db

With FileResponse Support

vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: ~/.llama/faiss_store.db

files:
  - provider_id: local-files
    provider_type: inline::localfs
    config:
      storage_dir: ~/.llama/files
      metadata_store:
        type: sqlite
        db_path: ~/.llama/files_metadata.db

Usage Examples

Python Client

from llama_stack_client import LlamaStackClient

# Synchronous client, so the calls below are not awaited
client = LlamaStackClient(base_url="http://localhost:8000")

# Create vector store
vector_store = client.vector_stores.create(name="documents")

# Upload and process file
with open("document.pdf", "rb") as f:
    file_info = client.files.create(file=f, purpose="assistants")

# Attach to vector store
client.vector_stores.files.create(
    vector_store_id=vector_store.id, file_id=file_info.id
)

# Search
results = client.vector_stores.search(
    vector_store_id=vector_store.id, query="What is the main topic?", max_num_results=5
)

cURL Commands

# Upload file
curl -X POST http://localhost:8000/v1/openai/v1/files \
  -F "file=@document.pdf" \
  -F "purpose=assistants"

# Create vector store
curl -X POST http://localhost:8000/v1/openai/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{"name": "documents"}'

# Attach file to vector store
curl -X POST http://localhost:8000/v1/openai/v1/vector_stores/{store_id}/files \
  -H "Content-Type: application/json" \
  -d '{"file_id": "file-abc123"}'

# Search vector store
curl -X POST http://localhost:8000/v1/openai/v1/vector_stores/{store_id}/search \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main topic?", "max_num_results": 5}'

Performance Considerations

Chunk Size Optimization

Small chunks (400-600 tokens): Better precision, more results
Large chunks (800-1200 tokens): Better context, fewer results
Overlap (50%): Maintains context between chunks
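
The trade-off can be quantified with a short calculation. The sketch below assumes a sliding window that advances by chunk size minus overlap; `expected_chunk_count` is a hypothetical helper, not part of any Llama Stack API.

```python
def expected_chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of windows needed to cover total_tokens with the given overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    remaining = total_tokens - chunk_size
    # One full first window, then ceil(remaining / step) further windows
    return 1 + (remaining + step - 1) // step


# Compare chunk sizes for a 10,000-token document at 50% overlap
for size in (500, 800, 1200):
    print(size, expected_chunk_count(10_000, size, size // 2))
```

Under these assumptions, a 10,000-token document produces 39 chunks at 500 tokens, 24 at 800 tokens, and 16 at 1,200 tokens. At 50% overlap, roughly twice the document's tokens are embedded and stored, which is the cost of preserving context across chunk boundaries.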

Storage Efficiency

FAISS: Fastest, but memory-limited
SQLite-vec: Good balance of performance and storage
Milvus: Scalable, production-ready
Remote providers: Managed, but network-dependent

Search Performance

Vector search: Fastest for semantic queries
Hybrid search: Best accuracy (SQLite-vec only)
Filtered search: Fast with metadata constraints
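
Hybrid search has to merge a vector ranking and a keyword ranking into one result list. Reciprocal rank fusion (RRF) is one common way to do that. The sketch below shows the technique in isolation; whether a given provider uses exactly this method or this constant is an assumption, and `reciprocal_rank_fusion` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1); scores are summed and documents are returned best first.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break score ties by document ID so the output is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # ranked by embedding similarity
keyword_hits = ["c", "a", "d"]  # ranked by keyword match
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists ("a" and "c") rise above those found by only one method, which is why hybrid search can improve accuracy over either ranking alone.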

Troubleshooting

Common Issues

File Processing Failures
- Check file format compatibility
- Verify file size limits
- Review error messages in file status

Search Performance
- Optimize chunk sizes for your use case
- Use filters to narrow search scope
- Monitor vector store metrics

Storage Issues
- Check available disk space
- Verify database permissions
- Monitor memory usage (for in-memory providers)

Monitoring

# Check file processing status (synchronous client, so calls are not awaited)
file_status = client.vector_stores.files.retrieve(
    vector_store_id=vector_store.id, file_id=file_info.id
)

if file_status.status == "failed":
    print(f"Error: {file_status.last_error.message}")

# Monitor vector store health
# Note: availability of vector_stores.health() may depend on your client version
health = client.vector_stores.health(vector_store_id=vector_store.id)
print(f"Status: {health.status}")
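
File processing is asynchronous on the server side, so a single status check may return before processing finishes. A polling helper along the following lines can wait for a terminal state. This is a sketch under assumptions: `wait_for_file` is a hypothetical helper, and the terminal status names other than "failed" are assumed from the OpenAI-compatible file status values.

```python
import time
from typing import Callable

# Assumed terminal states; only "failed" appears in the example above.
TERMINAL_STATUSES = ("completed", "failed", "cancelled")


def wait_for_file(fetch_status: Callable[[], str],
                  timeout_s: float = 60.0,
                  poll_interval_s: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_status() until it returns a terminal status or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"file still '{status}' after {timeout_s} seconds")
        sleep(poll_interval_s)
```

In practice, `fetch_status` could be a lambda that calls `client.vector_stores.files.retrieve(...)` and returns its `.status` attribute. Passing the status getter as a callable keeps the helper independent of any particular client.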

Best Practices

File Organization: Use descriptive names and organize by purpose
Chunking Strategy: Test different sizes for your specific use case
Metadata: Add relevant attributes for better filtering
Monitoring: Track processing status and search performance
Cleanup: Regularly remove unused files to manage storage

Future Enhancements

Planned improvements for file operations support:

Batch Processing: Process multiple files simultaneously
Advanced Chunking: More sophisticated chunking algorithms
Custom Embeddings: Support for custom embedding models
Real-time Updates: Live file processing and indexing
Multi-format Support: Enhanced file format support

Support and Resources

Documentation: File Operations and Vector Store Integration
API Reference: Files API
Provider Docs: Vector Store Providers
Examples: Getting Started
Community: GitHub Discussions
