Skip to content

Latest commit

 

History

History
1520 lines (1353 loc) · 62.2 KB

File metadata and controls

1520 lines (1353 loc) · 62.2 KB

SingularityLLM Tasks

Recent Major Achievements ✨

Code Quality Milestone (December 2024)

  • 🎯 Credo Strict Mode Enabled: 91 source files analyzed with 0 issues
  • 🧹 Clean Codebase: Fixed all nesting issues (14 functions), complexity issues (6 functions), and TODO comments (5 items)
  • 📈 Enhanced Type System: Added metadata support to LLMResponse and StreamChunk for timing and context data
  • 🔧 Improved Functionality: Better token usage tracking, cost filtering, and context statistics

This represents a significant maturity milestone for the SingularityLLM codebase, ensuring high code quality standards and maintainability for future development.

Completed

Core Infrastructure

  • Unified adapter interface for multiple providers
  • Streaming support with SSE parsing
  • Model listing and management
  • Standardized response format (via SingularityLLM.Types)
  • Configuration injection pattern
  • Comprehensive error handling (via SingularityLLM.Error)
  • Application supervisor for lifecycle management

Provider Adapters

  • Anthropic adapter (Claude 3, Claude 4 models)
  • Local adapter via Bumblebee/Nx
  • OpenAI adapter (GPT-4, GPT-3.5)
  • Ollama adapter (local model support)
  • AWS Bedrock adapter (complete - supports Anthropic, Amazon Titan, Meta Llama, Cohere, AI21, Mistral with full credential chain, streaming, provider-specific formatting)
  • Google Gemini adapter (basic implementation - Pro, Ultra variants)
    • Full API implementation in progress (see Gemini API Implementation section)
  • OpenRouter adapter (300+ models from multiple providers)

Features

  • Integrated cost tracking and calculation
  • Token estimation functionality
  • Context window management
    • Automatic message truncation
    • Multiple truncation strategies (sliding_window, smart)
    • Model-specific context window sizes
    • Context validation and statistics
  • Session management (via SingularityLLM.Session)
    • Message history
    • Token usage tracking
    • JSON persistence
    • Metadata handling
  • Instructor support for structured outputs
    • Ecto schema integration
    • Simple type specs
    • Validation and retry logic
    • JSON extraction from markdown

Local Model Support

  • Model loading/unloading (via SingularityLLM.Local.ModelLoader)
  • EXLA/EMLX configuration (via SingularityLLM.Local.EXLAConfig)
  • Token counting with model tokenizers (via SingularityLLM.Local.TokenCounter)
  • Hardware acceleration detection (Metal, CUDA, ROCm)
  • Optimized inference settings
  • Mixed precision support

Configuration System

  • ConfigProvider behaviour for dependency injection
  • Default provider using Application config
  • Static configuration provider
  • Environment-based configuration
  • External YAML configuration system for model metadata
    • Model pricing, context windows, and capabilities in config/models/*.yml
    • Runtime configuration loading with ETS caching
    • SingularityLLM.ModelConfig module for centralized access
    • Separation of model data from code for easier maintenance
    • Model sync script from LiteLLM
      • Python script to fetch model data from LiteLLM
      • Automatic conversion from JSON to YAML format
      • Synced 1048 models with pricing and capabilities

In Progress

(Currently no tasks in progress)

Recently Completed

Provider Adapter Implementations ✅

  • Mistral AI Adapter

    • OpenAI-compatible API implementation
    • Chat, streaming, and embeddings support
    • Function calling with tools format
    • Model listing from API and config fallback
    • Parameter validation with Mistral-specific restrictions
    • Safe prompt parameter support
    • Comprehensive test suite (14 unit tests, integration tests)
  • Perplexity Adapter

    • Search-augmented language model support
    • OpenAI-compatible base with Perplexity extensions
    • Search modes: news, academic, general
    • Reasoning effort levels: low, medium, high
    • URL return and recency filters
    • Search domain inclusion/exclusion
    • Comprehensive test suite (33 tests)
  • Bumblebee Adapter (Renamed from Local)

    • Complete refactoring from Local to Bumblebee naming
    • Split tests into unit and integration for consistency
    • Updated all references throughout codebase
    • BREAKING: Removed :local alias completely (use :bumblebee instead)
    • Added model configuration in config/models/bumblebee.yml
    • Fixed ModelLoader references

Ollama Adapter Full API Implementation ✅

  • Fixed critical bugs
    • Embeddings endpoint corrected to /api/embed
    • Embeddings request format using input parameter
    • Batch embeddings support
  • Added all missing endpoints
    • /api/generate for non-chat completions with streaming
    • /api/show for model information
    • /api/copy, /api/delete for model management
    • /api/pull, /api/push for model distribution
    • /api/ps for running models, /api/version for version info
  • Added comprehensive parameter support
    • options object with all model-specific settings
    • GPU settings, memory settings, sampling parameters
    • context parameter for stateful conversations
    • keep_alive for model memory management
  • Multimodal support with proper image handling
  • Enhanced response format with timing metadata
  • Structured output support with format parameter

Enhanced Streaming Error Recovery ✅

  • Core streaming recovery infrastructure (via SingularityLLM.StreamRecovery)
    • Save partial responses during streaming
      • Store response chunks with timestamps
      • Track token count of partial response
      • Save request context (messages, model, parameters)
    • Detect interruptions
      • Network errors vs timeouts vs user cancellation
      • Distinguish recoverable vs non-recoverable errors
      • Handle streaming errors gracefully
    • Resume mechanisms
      • resume_stream/2 function to continue from saved state
      • Adjust token count for already-received content
      • Support different resumption strategies:
        • :exact - Continue from exact cutoff
        • :paragraph - Regenerate last paragraph for coherence
        • :summarize - Summarize received content and continue
    • Storage backend
      • In-memory storage for current session
      • Automatic cleanup of old partial responses
      • Handle multiple interrupted responses per session
    • Integration with existing streaming
      • Modified stream_chat/3 to support recovery
      • Added recovery options to streaming config
      • Recovery ID tracking for resumable streams

Request Retry Logic with Exponential Backoff ✅

  • Core retry infrastructure (via SingularityLLM.Retry)
    • Exponential backoff with configurable parameters
    • Jitter support to prevent thundering herd
    • Provider-specific retry policies
    • Configurable retry conditions
    • Circuit breaker pattern (structure defined)
  • Provider-specific implementations
    • OpenAI retry with Retry-After header support
    • Anthropic retry for 529 (overloaded) errors
    • Bedrock retry for AWS throttling exceptions
  • Integration with main API
    • Automatic retry for chat/3 (opt-out available)
    • Configurable retry options per request
    • Logging of retry attempts and outcomes

Function Calling Support ✅

  • Unified function calling interface (via SingularityLLM.FunctionCalling)
    • Provider-agnostic function definitions
    • Automatic format conversion for each provider
    • Function call parsing from responses
    • Parameter validation against schemas
    • Safe function execution with error handling
  • Provider implementations
    • OpenAI function calling format
    • Anthropic tools API format
    • Bedrock tools format
    • Gemini function calling format
  • Integration with main API
    • Functions option in chat/3
    • Automatic provider format conversion
    • Public API for parsing and execution
  • Example implementation (examples/function_calling_example.exs)

Mock Adapter for Testing ✅

  • Full mock adapter implementation (SingularityLLM.Adapters.Mock)
    • Static response configuration
    • Dynamic response handlers
    • Error simulation
    • Request capture and analysis
    • Streaming support
    • Function calling support
  • Testing utilities
    • set_response/1 for static responses
    • set_response_handler/1 for dynamic responses
    • set_error/1 for error simulation
    • get_requests/0 for request analysis
    • reset/0 for test cleanup
  • Integration with retry and recovery
    • Works with retry logic
    • Compatible with stream recovery
  • Documentation and examples
    • Comprehensive test file (ex_llm_mock_test.exs)
    • Testing guide (examples/testing_with_mock.exs)

Model Capability Discovery ✅

  • Comprehensive capability tracking (via SingularityLLM.ModelCapabilities)
    • Feature support detection for all models
    • Context window and output token limits
    • Provider-specific capability details
    • Release and deprecation date tracking
  • Model database
    • OpenAI models (GPT-4, GPT-3.5 variants)
    • Anthropic models (Claude 3/3.5 family)
    • Google Gemini models
    • Local models via Bumblebee
    • Mock model for testing
  • Discovery features
    • Query individual model capabilities
    • Find models by required features
    • Compare models side-by-side
    • Get recommendations based on requirements
    • Group models by capability
  • Public API integration
    • get_model_info/2
    • model_supports?/3
    • find_models_with_features/1
    • compare_models/1
    • recommend_models/1
    • models_by_capability/1
    • list_model_features/0
  • Documentation and testing
    • Comprehensive example (model_capabilities_example.exs)
    • Full test coverage (model_capabilities_test.exs)

Response Caching with TTL ✅

  • Core caching infrastructure (via SingularityLLM.Cache)
    • TTL-based cache expiration
    • Configurable storage backends via behaviour
    • ETS storage backend implementation
    • Cache key generation based on request parameters
    • Selective caching (skip for streaming, functions, etc.)
    • Cache statistics tracking
  • Integration with main API
    • Automatic caching in chat/3 with cache option
    • with_cache/3 wrapper for cache-aware execution
    • Configurable TTL per request
    • Global cache enable/disable
  • Cache management
    • Clear cache functionality
    • Delete specific entries
    • Automatic cleanup of expired entries
    • Cache hit/miss statistics
  • Documentation and testing
    • Comprehensive example (caching_example.exs)
    • Full test coverage (cache_test.exs)
    • Cost savings calculations in examples

Code Quality & Maintainability ✅

  • Credo Strict Mode Implementation
    • Fixed all nesting issues (14 functions across multiple modules)
      • Reduced nesting from 5+ levels to max 3 levels
      • Strategic function extraction and pattern matching
      • Improved readability and maintainability
    • Resolved all cyclomatic complexity issues (6 high-complexity functions)
      • SingularityLLM.Instructor.do_structured_chat (complexity 34 → reduced)
      • SingularityLLM.Instructor.get_provider_config (complexity 19 → reduced)
      • SingularityLLM.Adapters.Bedrock.parse_response (complexity 15 → reduced)
      • SingularityLLM.Adapters.Mock.normalize_response (complexity 14 → reduced)
      • SingularityLLM.Adapters.Shared.ModelUtils.generate_description (complexity 13 → reduced)
      • SingularityLLM.Adapters.Mock.embeddings (complexity 13 → reduced)
    • Fixed all TODO comments in codebase (5 items)
      • Enhanced Types module with metadata fields for LLMResponse and StreamChunk
      • Improved context_stats function with character counting and statistics
      • Implemented token usage extraction in Local adapter with estimates
      • Added cost filtering to model recommendations in ModelCapabilities
      • Fixed syntax errors and compilation issues
    • Enabled Credo strict mode successfully
      • 91 source files analyzed with 0 issues found
      • Comprehensive quality checks enabled
      • Automated code quality enforcement

Embeddings API ✅

  • Core embeddings infrastructure
    • New types: EmbeddingResponse and EmbeddingModel
    • Adapter behaviour extensions for embeddings
    • Unified embeddings interface in main module
  • Provider implementations
    • OpenAI embeddings adapter (text-embedding-3-small/large, ada-002)
    • Mock adapter embeddings support
    • Cost tracking for embedding models
  • Utility functions
    • cosine_similarity/2 for comparing embeddings
    • find_similar/3 for semantic search
    • Batch embedding support
  • Integration features
    • Automatic caching support for embeddings
    • list_embedding_models/1 for discovery
    • Dimension configuration support
  • Documentation and examples
    • Comprehensive example (embeddings_example.exs)
    • Semantic search demonstration
    • Clustering and similarity examples
    • Cost comparison across models

Vision/Multimodal Support ✅

  • Core vision infrastructure (via SingularityLLM.Vision)
    • Extended message types to support image content
    • Image format validation and detection
    • Base64 encoding/decoding utilities
    • Provider-specific formatting
  • Image handling
    • Load images from local files
    • Support for image URLs
    • Multiple image formats (JPEG, PNG, GIF, WebP)
    • Image size validation
  • Provider implementations
    • Anthropic vision support (base64 format)
    • OpenAI vision support (URL and base64)
    • Vision capability detection per model
  • API functions
    • vision_message/3 for easy message creation
    • load_image/2 for file loading
    • supports_vision?/2 for capability checking
    • extract_text_from_image/3 for OCR tasks
    • analyze_images/4 for image analysis
  • Integration features
    • Automatic provider formatting in chat/3
    • Vision content detection
    • Detail level configuration
  • Documentation and examples
    • Comprehensive example (vision_example.exs)
    • Multiple use cases demonstrated
    • Error handling examples

Todo

Priority Overview

Priority 0 - Immediate (Next 2 weeks)

  • Implement comprehensive Gemini API support (TDD approach)
  • Code refactoring for shared behaviors (reduce duplication by ~40%)
  • Debug logging levels
  • Complete any remaining core implementations

Priority 1 - Short Term (Next month)

  • High-demand provider adapters (Mistral AI, Together AI, Cohere, Perplexity)

Priority 2 - Medium Term (Next quarter)

  • Advanced router with cost-based routing and fallbacks
  • Batch processing API
  • Extensible callback system

Priority 3+ - Long Term

  • Additional providers based on demand
  • Advanced features and optimizations

Example App Development (Priority 0)

  • Create comprehensive example_app that demonstrates all library features
  • Migrate existing examples into the unified app:
    • Advanced features (retries, context management, etc.)
    • Caching functionality
    • Embeddings
    • Function calling
    • Local model usage
    • Model capabilities exploration
    • Structured outputs with Instructor
    • Testing with mock adapter
    • Vision/multimodal features
  • Add configuration system for provider selection
  • Use Ollama with Qwen3 8B (IQ4_XS) as default (fast local model)
  • Create interactive CLI menu for feature selection
  • Add comprehensive error handling and user feedback
  • Document setup and usage instructions
  • Remove deprecated individual example files

Gemini API Implementation (Priority 0) - TDD Approach

Phase 1: Core Foundation

[x] 1. Models API (GEMINI-API-01-MODELS.md) ✅
  • Create test/singularity_llm/gemini/models_test.exs
    • Test listing available models
    • Test getting model details
    • Test model capabilities and limits
    • Test error handling for invalid models
  • Implement lib/singularity_llm/gemini/models.ex
    • list_models/1 - List available models
    • get_model/2 - Get specific model details
    • Model struct with all properties
    • Error handling
  • Run tests and ensure they pass
  • Update model registry with Gemini models
[x] 2. Content Generation API (GEMINI-API-02-GENERATING-CONTENT.md) ✅
  • Create test/singularity_llm/gemini/content_test.exs
    • Test basic text generation
    • Test streaming responses
    • Test with system instructions
    • Test with generation config (temperature, top_p, etc.)
    • Test with safety settings
    • Test multimodal inputs (text + images)
    • Test structured output (JSON mode)
    • Test function calling
    • Test error scenarios
  • Implement lib/singularity_llm/gemini/content.ex
    • generate_content/3 - Non-streaming generation
    • stream_generate_content/3 - Streaming generation
    • Request/response structs
    • Generation config handling
    • Safety settings
    • Tool/function definitions
  • Integration with main SingularityLLM adapter pattern
  • Run tests and ensure they pass (validation tests pass, API tests require valid key)
[x] 3. Token Counting API (GEMINI-API-04-TOKENS.md) ✅
  • Create test/singularity_llm/adapters/gemini/tokens_test.exs
    • Test counting tokens for text
    • Test counting tokens for multimodal content
    • Test with different models
    • Test error handling
  • Implement lib/singularity_llm/gemini/tokens.ex
    • count_tokens/3 - Count tokens for content
    • Token count response struct
    • Integration with content generation
  • Run tests and ensure they pass

Phase 2: Advanced Features

[x] 4. Files API (GEMINI-API-05-FILES.md) ✅
  • Create test/singularity_llm/adapters/gemini/files_test.exs
    • Test file upload
    • Test file listing
    • Test file deletion
    • Test file metadata retrieval
    • Test file state transitions
    • Test error handling
  • Implement lib/singularity_llm/gemini/files.ex
    • upload_file/3 - Upload media files (resumable upload)
    • list_files/1 - List uploaded files
    • get_file/2 - Get file metadata
    • delete_file/2 - Delete a file
    • wait_for_file/3 - Wait for file processing
    • File struct and state management
  • Run tests and ensure they pass
[x] 5. Context Caching API (GEMINI-API-06-CACHING.md) ✅
  • Create test/singularity_llm/gemini/caching_test.exs
    • Test creating cached content
    • Test listing cached content
    • Test updating cached content
    • Test deleting cached content
    • Test using cached content in generation
    • Test TTL and expiration
  • Implement lib/singularity_llm/gemini/caching.ex
    • create_cached_content/2 - Create cache entry
    • list_cached_contents/1 - List cache entries
    • get_cached_content/2 - Get cache details
    • update_cached_content/3 - Update cache
    • delete_cached_content/2 - Delete cache
    • Integration with content generation
  • Run tests and ensure they pass
[x] 6. Embeddings API (GEMINI-API-07-EMBEDDING.md) ✅
  • Create test/singularity_llm/gemini/embeddings_test.exs
    • Test text embeddings
    • Test batch embeddings
    • Test different embedding models
    • Test content types (query vs document)
    • Test error handling
  • Implement lib/singularity_llm/gemini/embeddings.ex
    • embed_content/3 - Generate embeddings
    • Batch embedding support
    • Task type configuration
    • Integration with main embeddings interface
  • Run tests and ensure they pass

Phase 3: Live API

[x] 7. Live API (GEMINI-API-03-LIVE-API.md) ✅
  • Add WebSocket client library (Gun)
  • Create test/singularity_llm/adapters/gemini/live_test.exs
    • Test WebSocket connection (URL building and headers)
    • Test message building (setup, client content, realtime input, tool response)
    • Test message parsing (server content, tool calls, transcription, go away)
    • Test validation (setup config, realtime input, generation config)
    • Test struct definitions (all message types)
  • Implement lib/singularity_llm/gemini/live.ex
    • WebSocket client implementation using Gun
    • GenServer-based session management
    • Audio/video/text streaming support
    • Event handling and message parsing
    • Tool execution interface
    • Connection lifecycle management
    • Comprehensive validation and error handling
  • Run tests and ensure they pass (23 tests, 100% pass rate)

Phase 4: Fine-tuning

[x] 8. Fine-tuning API (GEMINI-API-08-TUNING_TUNING.md) ✅
  • Create test/singularity_llm/gemini/tuning_test.exs
    • Test creating tuned models
    • Test listing tuned models
    • Test monitoring tuning jobs
    • Test using tuned models
    • Test hyperparameter configuration
  • Implement lib/singularity_llm/gemini/tuning.ex
    • create_tuned_model/2 - Start tuning job
    • list_tuned_models/1 - List tuned models
    • get_tuned_model/2 - Get tuning details
    • delete_tuned_model/2 - Delete tuned model
    • generate_content/3 - Generate using tuned model
    • stream_generate_content/3 - Stream using tuned model
    • All struct definitions (TunedModel, TuningTask, etc.)
  • Run tests and ensure they pass (unit tests pass, integration tests require valid API key)
[x] 9. Tuning Permissions (GEMINI-API-09-TUNING_PERMISSIONS.md) ✅
  • Create test/singularity_llm/gemini/permissions_test.exs
    • Test creating permissions
    • Test listing permissions
    • Test updating permissions
    • Test deleting permissions
    • Test transfer ownership
  • Implement lib/singularity_llm/gemini/permissions.ex
    • Permission CRUD operations
    • Role management (READER, WRITER, OWNER)
    • Grantee types (USER, GROUP, EVERYONE)
    • Transfer ownership operation
  • Run tests and ensure they pass (unit tests pass, integration tests require OAuth2)
  • Important Note: Permissions API requires OAuth2 authentication, not API keys!

Phase 5: Semantic Retrieval

[x] 10. Question Answering (GEMINI-API-10-SEMANTIC-RETRIEVAL_QUESTION-ANSWERING.md) ✅
  • Create test/singularity_llm/gemini/qa_test.exs
    • Test query API with inline passages
    • Test query API with semantic retriever
    • Test answer generation with different styles
    • Test with different answer styles (abstractive, extractive, verbose)
    • Test temperature control and safety settings
    • Test response parsing and error handling
  • Implement lib/singularity_llm/gemini/qa.ex
    • generate_answer/4 - Semantic search and QA
    • Answer generation config with temperature and safety
    • Grounding with inline passages and semantic retriever
    • Input validation and structured response parsing
    • Support for both API key and OAuth2 authentication
  • Run tests and ensure they pass (unit tests pass, integration tests require valid API key/corpus)
[x] 11. Corpus Management (GEMINI-API-11-SEMANTIC-RETRIEVAL_CORPUS.md) ✅
  • Create test/singularity_llm/gemini/corpus_test.exs
    • Test creating corpora with auto-generated and custom names
    • Test listing corpora with pagination support
    • Test updating corpora (display name changes)
    • Test deleting corpora with force option
    • Test querying corpora with metadata filters
    • Test input validation and error handling
    • Test response parsing for all operations
  • Implement lib/singularity_llm/gemini/corpus.ex
    • Complete CRUD operations (create, list, get, update, delete)
    • Semantic search with query_corpus function
    • Metadata filter system with conditions and operators
    • Input validation for all parameters
    • OAuth2 authentication support (required for corpus operations)
    • Pagination support for listing
    • Structured response parsing
  • Run tests and ensure they pass (unit tests pass, integration tests require OAuth2 token)
[x] 12. Document Management (GEMINI-API-13-SEMANTIC-RETRIEVAL_DOCUMENT.md) ✅
  • Create test/singularity_llm/adapters/gemini/document_test.exs
    • Test creating documents with metadata
    • Test listing documents with pagination
    • Test updating documents with field masks
    • Test deleting documents with force option
    • Test querying documents with semantic search
    • Test custom metadata handling (string, numeric, string list)
    • Test validation and error handling
  • Implement lib/singularity_llm/gemini/document.ex
    • Complete CRUD operations (create, list, get, update, delete)
    • Semantic search with query_document function
    • Custom metadata system with all value types
    • Input validation for all parameters
    • Authentication support (API key and OAuth2)
    • Pagination support for listing
    • Comprehensive struct definitions
  • Run tests and ensure they pass (20 unit tests, 100% pass rate)
[x] 13. Chunk Management (GEMINI-API-12-SEMANTIC-RETRIEVAL_CHUNK.md) ✅
  • Create test/singularity_llm/adapters/gemini/chunk_test.exs
    • Test creating chunks with data and metadata
    • Test listing chunks with pagination
    • Test updating chunks with field masks
    • Test deleting chunks
    • Test batch operations (create, update, delete)
    • Test validation and error handling for all operations
    • Test struct definitions and parsing
  • Implement lib/singularity_llm/gemini/chunk.ex
    • Complete CRUD operations (create, list, get, update, delete)
    • Batch operations (batch_create, batch_update, batch_delete)
    • Custom metadata system with all value types
    • Input validation for all parameters
    • Authentication support (API key and OAuth2)
    • Pagination support for listing
    • Comprehensive struct definitions (Chunk, ChunkData, CustomMetadata, etc.)
  • Run tests and ensure they pass (22 unit tests, 100% pass rate)
[x] 14. Retrieval Permissions (GEMINI-API-14-SEMANTIC-RETRIEVAL_PERMISSIONS.md) ✅
  • Create test/singularity_llm/adapters/gemini/retrieval_permissions_test.exs
    • Test corpus permissions (create, list, get, update, delete)
    • Test permission validation for corpus operations
    • Test role hierarchy (READER, WRITER, OWNER)
    • Test grantee types (USER, GROUP, EVERYONE)
    • Test authentication methods (API key and OAuth2)
    • Test struct definitions and JSON parsing
  • Extend existing lib/singularity_llm/gemini/permissions.ex
    • Corpus permissions already supported (corpora/{corpus} parent format)
    • Complete CRUD operations for corpus permissions
    • Input validation and error handling
    • Support for all grantee types and roles
  • Run tests and ensure they pass (15 unit tests, 9 passing non-integration tests)

Phase 6: Integration

[x] 15. Complete Integration (GEMINI-API-15-ALL-METHODS.md) ✅
  • Create test/singularity_llm/adapters/gemini/integration_test.exs
    • Test end-to-end workflows with adapter
    • Test cross-feature interactions and API modules
    • Test error propagation and handling
    • Test performance characteristics (marked with @tag :performance)
  • Enhance lib/singularity_llm/adapters/gemini.ex
    • Main adapter implementation (chat, streaming, embeddings)
    • Integration with SingularityLLM interfaces (unified API)
    • Feature detection and capabilities via ModelCapabilities
    • Error handling and configuration validation
  • SingularityLLM module already supports Gemini provider
  • All individual API modules tested and working

Missing Core Implementations (Priority 0)

  • Session persistence (save_to_file/load_from_file in SingularityLLM.Session)
  • Function calling argument parsing (parse_arguments in SingularityLLM.FunctionCalling)
  • Model info retrieval (get_model_info in SingularityLLM.ModelCapabilities)
  • Provider capability tracking system
    • Create SingularityLLM.ProviderCapabilities module
    • Track provider-level features:
      • Available endpoints (chat, embeddings, images, audio, etc.)
      • Authentication methods (api_key, oauth, aws_signature, etc.)
      • Streaming support at provider level
      • Cost tracking availability
      • Dynamic model listing support
      • Batch operations support
      • File upload capabilities
      • Rate limiting information
      • Provider metadata (description, docs, status URLs)
    • Provider capability discovery API
    • Integration with ModelCapabilities
    • Capability versioning for API versions (future enhancement)

Core Features - Low Priority Items

  • Context statistics implementation
    • Implement context_stats/1 function in SingularityLLM module
    • Calculate token distribution across messages
    • Provide truncation impact analysis
    • Return statistics about context usage
  • Token usage extraction for local models
    • Extract token usage from Bumblebee/Local adapter responses
    • Add token counting support to Local adapter
    • Integrate with existing usage tracking
  • Cost filtering for model recommendations
    • Implement cost-based filtering in ModelCapabilities.recommend_models/1
    • Add max_cost option to recommendation queries
    • Filter models based on pricing data when available

Ollama Adapter - Remaining Low Priority Items

  • /api/blobs/:digest endpoints for blob management
    • GET /api/blobs/:digest - Check if a blob exists
    • HEAD /api/blobs/:digest - Check blob existence (headers only)
    • POST /api/blobs/:digest - Create a blob
    • Used internally by Ollama for model layer management
  • Parse and expose created_at timestamps in responses
    • Add metadata field to LLMResponse and StreamChunk types
    • Include timing information (total_duration, load_duration, etc.)
    • Preserve model context for stateful conversations

Code Refactoring - Shared Behaviors & Modules (Priority 0)

  • Extract streaming into StreamingCoordinator module
    • Standardize Task/Stream.resource pattern
    • Common SSE parsing and buffering
    • Provider-agnostic chunk handling
    • Error recovery integration
  • Create RequestBuilder shared module
    • Common request body construction
    • Optional parameter handling
    • Provider-specific extensions
  • Implement ModelFetcher behavior
    • Standardize model API fetching
    • Common parse/filter/transform pipeline
    • Integration with ModelLoader
  • Extract VisionFormatter module
    • Provider-specific image formatting
    • Content type detection
    • Base64 encoding utilities
  • Enhance existing shared modules
    • Extend ResponseBuilder for more formats
    • Add provider-specific headers to HTTPClient
    • Unify error response parsing

Advanced Router & Infrastructure (Priority 2)

  • OpenAI-Compatible base adapter for shared implementation
  • Provider detection pattern (provider/model-name syntax)
  • Advanced router with strategies
    • Cost-based routing
    • Automatic fallback chains
    • Model group aliases
    • Least-latency routing
    • Usage-based routing
  • Batch processing API
  • Health checks and circuit breakers

New Provider Adapters (Priority 1)

High Priority Providers

  • Groq adapter (fast inference)
  • XAI adapter (Grok models)
  • Mistral AI adapter (European models)
  • Together AI adapter (cost-effective)
  • Cohere adapter (enterprise, rerank API)
  • Perplexity adapter (search-augmented)

Medium Priority Providers

  • Replicate adapter (marketplace)
  • Databricks adapter
  • Vertex AI adapter (Google Cloud)
  • Azure AI adapter (beyond OpenAI)
  • Fireworks AI adapter
  • DeepInfra adapter

Lower Priority Providers

  • Watsonx adapter (IBM)
  • Sagemaker adapter (AWS)
  • Anyscale adapter
  • vLLM adapter
  • Hugging Face Inference API adapter
  • Baseten adapter
  • DeepSeek adapter

Provider Feature Enhancements (Priority 3)

  • Anthropic cache control headers
  • Vertex AI context caching
  • Bedrock Converse API support
  • Provider-specific error mapping
  • Provider capability detection

Observability & Monitoring (Priority 4)

  • Extensible callback system
  • Telemetry integration for metrics
  • Custom metrics collection
  • Request/response logging with redaction

OpenAI API Enhancements (Priority 1)

Core Chat Completions Missing Features

  • Modern Request Parameters

    • max_completion_tokens (replaces deprecated max_tokens)
    • n parameter for multiple completions (1-128)
    • top_p nucleus sampling parameter
    • frequency_penalty and presence_penalty (-2 to 2)
    • seed parameter for deterministic sampling
    • stop sequences (string or array)
    • service_tier for rate limiting control
  • Response Format & Structured Outputs

    • JSON mode: response_format: {"type": "json_object"}
    • JSON Schema structured outputs with validation
    • Refusal handling in responses
    • logprobs token probabilities in responses
  • Modern Tool/Function Calling

    • Migrate from deprecated functions to modern tools API
    • tool_choice parameter for controlling tool usage
    • Parallel tool calls support
    • Tool calling in streaming responses
  • Advanced Message Content

    • Multiple content parts per message (text + images + audio)
    • File content references
    • Audio content in messages
  • New Model Features

    • Audio output with voice selection
    • Web search integration with web_search_options
    • Reasoning effort control for o1/o3 models
    • Developer role for o1+ models (replaces system for these models)
    • Predicted outputs for faster regeneration
  • Enhanced Usage Tracking

    • Cached tokens, reasoning tokens, audio tokens in usage
    • More detailed cost breakdown

Additional OpenAI APIs (Priority 3)

  • Assistants API (Beta)
    • Create/list/modify assistants
    • Thread management
    • Run management with tool integration
  • Files API
    • File upload for assistants and fine-tuning
    • File management and retrieval
  • Image Generation (DALL-E)
    • Text-to-image generation
    • Image variations and edits
  • Audio API
    • Speech-to-text transcription
    • Text-to-speech generation
    • Audio translation
  • Moderation API
    • Content safety classification
    • Multi-category moderation scores
  • Batch API
    • Async batch processing
    • Cost-effective bulk operations
  • Fine-tuning API
    • Custom model training
    • Job management and monitoring

Additional APIs (Priority 6)

  • Files API for uploads
  • Fine-tuning management API
  • Assistants API
  • Rerank API
  • Audio transcription API
  • Text-to-speech API
  • Image generation API
  • Moderation API

Security & Compliance (Priority 7)

  • Guardrails system
    • PII masking
    • Prompt injection detection
    • Content moderation
    • Secret detection
    • Custom guardrail plugins
  • Request sanitization
  • Response validation

Developer Experience (Priority 0)

  • Debug logging levels
  • Enhanced mock system with patterns
  • Provider comparison tools
  • Migration guides from other libraries

Features (Existing)

  • Fine-tuning management

Advanced Context Management

  • Semantic chunking for better truncation
  • Context compression techniques
  • Dynamic context window adjustment
  • Token budget allocation strategies

Cost & Usage

  • Usage analytics and reporting
  • Cost optimization recommendations
  • Budget alerts and limits
  • Provider cost comparison
  • Token usage predictions

Testing & Quality

  • Mock adapters for testing
  • Integration test suite for each provider
  • Performance benchmarks
  • Load testing for concurrent requests
  • Property-based tests for context management

Documentation

  • Comprehensive adapter implementation guide
  • Provider-specific configuration examples
  • Migration guide from other LLM libraries
  • Best practices for context management
  • Cost optimization strategies

Automatic Test Response Caching Implementation Plan

✅ IMPLEMENTATION COMPLETE!

The automatic test response caching system has been successfully implemented with all core features:

Completed Features:

  • Timestamp-based caching - No version conflicts, natural chronological ordering
  • Automatic interception - Zero configuration required for integration tests
  • Smart cache selection - Multiple fallback strategies (latest_success, latest_any, best_match)
  • TTL management - Configurable expiration with per-test-type overrides
  • Content deduplication - Symlinks for identical responses save disk space
  • Comprehensive monitoring - Hit rates, cost savings, performance metrics
  • Test helpers - Easy cache management functions for tests
  • Mix tasks - Command-line tools for cache operations
  • Full documentation - Usage guide, configuration, best practices
  • Cache metadata tracking - Responses include from_cache metadata flag

Usage:

# Automatic - just tag your tests!
@moduletag :integration  # That's it! Caching is automatic

# Check statistics
mix singularity_llm.cache.stats

# Clear cache
mix singularity_llm.cache.clear

Overview

This document outlines the implementation plan for automatic test response caching in SingularityLLM. The goal is to automatically save every real API response during integration tests for replay in future test runs, reducing API costs and improving test reliability.

Current State Analysis

Existing Caching Infrastructure

SingularityLLM already has a sophisticated caching system with the following components:

  1. SingularityLLM.Cache - Runtime ETS-based caching with optional disk persistence
  2. SingularityLLM.ResponseCache - Disk-based response collection for Mock adapter
  3. SingularityLLM.CachingInterceptor - Higher-level response collection for testing
  4. Mock Adapter Integration - Ability to replay cached responses

Current Limitations for Automatic Test Caching

  • Manual activation required (environment variables/config)
  • No automatic test environment detection
  • Limited integration test scenario organization
  • No cache versioning for API response format changes
  • No selective caching for specific test patterns

Implementation Tasks

Phase 1: Foundation and Configuration (Priority: High)

Task 1.1: Create Test Cache Configuration System

  • File: lib/singularity_llm/test_cache_config.ex
  • Purpose: Centralized configuration for test response caching
  • Features:
    • Automatic detection of test environment (Mix.env() == :test)
    • Integration test detection (:integration tag presence)
    • OAuth2 test detection (:oauth2 tag presence)
    • Configuration hierarchy: environment variables > test config > defaults
  • Configuration Options:
    config :singularity_llm, :test_cache,
      enabled: true,                    # Enable automatic test caching
      auto_detect: true,               # Auto-enable in test environment
      cache_dir: "test/cache",         # Test cache directory
      organization: :by_provider,      # :by_provider, :by_test_module, :by_tag
      cache_integration_tests: true,   # Cache integration test responses
      cache_oauth2_tests: true,        # Cache OAuth2 test responses
      replay_by_default: true,         # Use cached responses by default
      save_on_miss: true,             # Save new responses when cache miss
      ttl: :timer.days(7),            # Cache TTL (7 days default, :infinity to never expire)
      
      # Timestamp-based caching
      timestamp_format: :iso8601,           # Filename timestamp format
      fallback_strategy: :latest_success,   # :latest_success, :latest_any, :best_match
      
      # Retention policy
      max_entries_per_cache: 10,            # Keep max 10 timestamped entries per cache key
      cleanup_older_than: :timer.days(30),  # Delete entries older than 30 days
      compress_older_than: :timer.days(7),  # Compress entries older than 7 days
      
      # Content optimization
      deduplicate_content: true,            # Use symlinks for identical content
      content_hash_algorithm: :sha256       # Hash algorithm for deduplication

Task 1.2: Enhance Test Environment Detection

  • File: lib/singularity_llm/test_cache_detector.ex
  • Purpose: Intelligent detection of test scenarios requiring caching
  • Features:
    • Detect integration tests by examining ExUnit tags
    • Detect OAuth2 tests by examining ExUnit tags and test module names
    • Runtime detection of live API usage vs mocked responses
    • Process-level state tracking for test caching mode
  • Functions:
    def integration_test_running?() :: boolean()
    def oauth2_test_running?() :: boolean()
    def should_cache_responses?() :: boolean()
    def get_current_test_context() :: %{module: atom(), tags: [atom()], name: string()}

Task 1.3: Create Test Cache Storage Backend

  • File: lib/singularity_llm/cache/storage/test_cache.ex
  • Purpose: Specialized storage backend for timestamp-based test response caching
  • Features:
    • Hierarchical organization by provider/test module/scenario
    • Timestamp-based file naming for natural chronological ordering
    • Rich metadata index with content deduplication
    • Fuzzy matching for similar requests across timestamps
    • TTL-based cache expiration and cleanup
    • Smart fallback strategies (latest success, latest any, best match)
  • Storage Structure:
    test/cache/
    ├── integration/                 # Integration tests
    │   ├── anthropic/
    │   │   ├── chat_basic/
    │   │   │   ├── 2024-01-15T10-30-45Z.json    # Timestamped responses
    │   │   │   ├── 2024-01-20T14-22-10Z.json
    │   │   │   ├── 2024-01-22T09-15-33Z.json
    │   │   │   └── index.json                   # Cache index and metadata
    │   │   └── chat_streaming/
    │   ├── openai/
    │   └── gemini/
    └── oauth2/                      # OAuth2 tests
        ├── gemini/
        │   ├── corpus_crud/
        │   │   ├── 2024-01-18T16-45-12Z.json
        │   │   ├── 2024-01-21T11-30-25Z.json
        │   │   └── index.json
        │   └── document_operations/
    

Task 1.4: Implement TTL and Cache Selection System

  • File: lib/singularity_llm/test_cache_ttl.ex
  • Purpose: Handle cache selection and TTL logic for timestamp-based caching
  • Features:
    • Check cache age against configurable TTL across all timestamps
    • Smart selection of best cache entry based on fallback strategy
    • Configurable TTL per test type (integration vs OAuth2)
    • Force refresh options for specific test scenarios
  • Functions:
    def select_cache_entry(cache_dir, ttl, strategy) :: {:ok, timestamp} | {:expired, latest} | :none
    def cache_expired?(timestamp, ttl) :: boolean()
    def get_latest_valid_entry(cache_dir, ttl) :: {:ok, timestamp} | :none
    def get_latest_successful_entry(cache_dir, ttl) :: {:ok, timestamp} | :none
    def force_refresh_for_test?(test_context) :: boolean()
    def calculate_ttl(test_tags, provider) :: non_neg_integer() | :infinity

Task 1.5: Implement Timestamp Management and Cleanup System

  • File: lib/singularity_llm/test_cache_timestamp.ex
  • Purpose: Manage timestamped cache entries and cleanup policies
  • Features:
    • Generate consistent timestamp-based filenames
    • List and sort available timestamps for cache keys
    • Implement retention policies (max entries, max age)
    • Content deduplication using file hashes
    • Automatic cleanup of old timestamps
  • Functions:
    def generate_timestamp_filename() :: String.t()
    def parse_timestamp_from_filename(filename) :: {:ok, DateTime.t()} | :error
    def list_cache_timestamps(cache_dir) :: [DateTime.t()]
    def cleanup_old_entries(cache_dir, max_entries, max_age) :: cleanup_report()
    def deduplicate_content(cache_dir) :: dedup_report()
    def get_content_hash(file_path) :: String.t()

Task 1.6: Enhanced Cache Index with Timestamp Tracking

  • File: lib/singularity_llm/test_cache_index.ex
  • Purpose: Maintain index of timestamped cache entries with metadata
  • Index Structure:
    %CacheIndex{
      # Cache key identification
      cache_key: "anthropic/chat_basic",
      test_context: %{module: "AnthropicIntegrationTest", tags: [:integration]},
      
      # TTL configuration
      ttl: :timer.days(7),
      fallback_strategy: :latest_success,
      
      # Timestamp entries (sorted newest first)
      entries: [
        %{
          timestamp: ~U[2024-01-22 09:15:33Z],
          filename: "2024-01-22T09-15-33Z.json",
          status: :success,           # :success, :error, :timeout
          size: 1024,
          content_hash: "abc123def",  # For deduplication
          response_time_ms: 1250,
          api_version: "2023-06-01",
          cost: %{input: 0.001, output: 0.003, total: 0.004}
        },
        %{
          timestamp: ~U[2024-01-20 14:22:10Z],
          filename: "2024-01-20T14-22-10Z.json", 
          status: :success,
          size: 998,
          content_hash: "abc123def",  # Same hash = duplicate content
          response_time_ms: 980,
          api_version: "2023-06-01",
          cost: %{input: 0.001, output: 0.002, total: 0.003}
        }
      ],
      
      # Usage statistics
      total_requests: 45,
      cache_hits: 43,
      last_accessed: ~U[2024-01-22 12:00:00Z],
      access_count: 45,
      
      # Cleanup tracking
      last_cleanup: ~U[2024-01-20 00:00:00Z],
      cleanup_before: ~U[2024-01-01 00:00:00Z]  # Delete entries before this date
    }

Phase 2: Automatic Response Capture (Priority: High)

Task 2.1: Create Test Response Interceptor

  • File: lib/singularity_llm/test_response_interceptor.ex
  • Purpose: Automatically intercept and cache responses during tests
  • Features:
    • Hook into HTTPClient request/response cycle
    • Automatic cache key generation based on test context
    • Rich metadata capture (timing, test info, provider details)
    • Streaming response reassembly and caching
  • Integration Points:
    • SingularityLLM.Adapters.Shared.HTTPClient
    • SingularityLLM.Cache.with_cache/3
    • ExUnit test lifecycle hooks

Task 2.2: Enhance HTTPClient with Timestamp-Based Test Caching

  • File: lib/singularity_llm/adapters/shared/http_client.ex
  • Purpose: Add timestamp-based test caching support to HTTP client
  • Changes:
    • Add test cache check before making real HTTP requests
    • Select best cache entry based on TTL and fallback strategy
    • Save new responses with timestamp-based filenames
    • Capture and save responses when test caching is enabled
    • Maintain original error handling and retry logic
    • Support for both streaming and non-streaming responses
    • Fallback to older timestamps when fresh requests fail
  • New Functions:
    defp maybe_use_test_cache(url, body, headers, opts)
    defp select_best_cache_entry(cache_dir, ttl, strategy)
    defp save_timestamped_response(request_data, response_data, metadata)
    defp build_test_cache_key(url, body, test_context)
    defp fallback_to_older_timestamp(cache_dir, error)
    defp update_cache_index(cache_dir, new_entry)

Task 2.3: Implement Response Metadata Capture

  • File: lib/singularity_llm/test_response_metadata.ex
  • Purpose: Capture comprehensive metadata for cached responses
  • Metadata Fields:
    %ResponseMetadata{
      # Request Information
      provider: "anthropic",
      endpoint: "/v1/messages",
      method: "POST",
      request_body: %{...},
      request_headers: [...],
      
      # Response Information
      response_body: %{...},
      response_headers: [...],
      status_code: 200,
      response_time_ms: 1245,
      
      # Test Context
      test_module: "SingularityLLM.AnthropicIntegrationTest",
      test_name: "basic chat completion",
      test_tags: [:integration, :anthropic],
      test_pid: "#PID<0.123.45>",
      
      # Caching Information
      cached_at: ~U[2024-01-01 00:00:00Z],
      cache_version: "1.0",
      api_version: "2023-06-01",
      
      # Usage Tracking
      usage: %{input_tokens: 10, output_tokens: 25, total_tokens: 35},
      cost: %{input: 0.0001, output: 0.0005, total: 0.0006}
    }

Phase 3: Intelligent Cache Replay (Priority: High)

Task 3.1: Create Test Cache Matcher

  • File: lib/singularity_llm/test_cache_matcher.ex
  • Purpose: Intelligent matching of requests to cached responses
  • Features:
    • Exact match for identical requests
    • Fuzzy matching for similar requests (configurable tolerance)
    • Content-based matching for different formatting
    • Test context-aware matching
  • Matching Strategies:
    def exact_match(request, cached_requests)
    def fuzzy_match(request, cached_requests, tolerance \\ 0.9)
    def semantic_match(request, cached_requests)
    def context_match(request, cached_requests, test_context)

Task 3.2: Implement Cache-First Request Strategy with Timestamps

  • File: lib/singularity_llm/test_cache_strategy.ex
  • Purpose: Implement cache-first strategy for test requests with timestamp selection
  • Strategy Flow:
    1. Check if test caching is enabled
    2. Generate cache key from request and test context
    3. Load cache index for the cache key
    4. Select best timestamp entry based on strategy:
      • :latest_success: Most recent successful response within TTL
      • :latest_any: Most recent response (success or error) within TTL
      • :best_match: Best matching response considering content similarity
    5. If valid timestamp found: return cached response
    6. If no valid cache or expired: make real request and save with new timestamp
    7. If real request fails: fallback to older timestamps if available
  • Fallback Handling:
    • Graceful degradation when cache is corrupted
    • Fallback to older timestamps when refresh fails
    • Configurable cache miss behavior (fail vs. make real request)
    • Cache warming during test setup
    • Automatic cleanup based on age and count limits

Task 3.3: Add Cache Statistics and Monitoring

  • File: lib/singularity_llm/test_cache_stats.ex
  • Purpose: Track cache performance and cost savings with timestamp-based metrics
  • Features:
    • Cache hit/miss ratios per test suite
    • TTL-based refresh statistics
    • Timestamp fallback usage tracking
    • Cost savings calculations
    • Response time comparisons (cached vs. real)
    • Test suite completion time improvements
    • Storage overhead monitoring with deduplication stats
  • Reporting:
    def print_cache_summary()
    # Output:
    # Test Cache Summary:
    # ==================
    # Total Requests: 150
    # Cache Hits: 130 (86.7%)
    # Cache Misses: 8 (5.3%)
    # TTL Refreshes: 12 (8.0%)
    # Fallback to Older Timestamp: 2 (1.3%)
    # Cost Savings: $2.45
    # Time Savings: 45.2 seconds
    # Storage Used: 15.3 MB (unique: 8.1 MB, duplicates: 7.2 MB)
    # Deduplication Ratio: 47% space saved
    # Total Timestamps: 234
    # Oldest Cache Entry: 3 days ago
    # Average Cache Age: 1.2 days

Phase 4: Test Integration and Configuration (Priority: Medium)

Task 4.1: Update Test Helper Functions

  • File: test/support/test_helpers.ex
  • Purpose: Add test cache helpers and utilities with timestamp-based operations
  • New Functions:
    def with_test_cache(opts \\ [], func)
    def clear_test_cache(scope \\ :all)
    def warm_test_cache(test_module)
    def verify_cache_integrity()
    def force_cache_miss(pattern)
    def force_cache_refresh(pattern)
    def set_test_ttl(test_pattern, ttl)
    def list_cache_timestamps(cache_pattern)
    def restore_cache_timestamp(cache_pattern, timestamp)
    def cleanup_old_timestamps(max_age \\ :timer.days(30))
    def deduplicate_cache_content(cache_pattern \\ :all)
    def get_cache_stats(test_module \\ :all)
    def set_fallback_strategy(test_pattern, strategy)

Task 4.2: Enhance Integration Test Setup

  • Files: All integration test files
  • Purpose: Add automatic test caching to integration tests
  • Changes:
    • Add setup hooks for test cache initialization
    • Configure cache warming for known test scenarios
    • Add cache verification in test teardown
    • Implement cache-aware test ordering

Task 4.3: Update OAuth2 Test Configuration

  • Files: test/singularity_llm/adapters/gemini/*oauth2*_test.exs
  • Purpose: Special handling for OAuth2 test caching
  • Features:
    • OAuth2 token anonymization in cache
    • Request signature generation excluding sensitive data
    • Automatic cache invalidation on token refresh
    • Special handling for time-sensitive operations

Phase 5: TTL Management and Timestamp Cleanup (Priority: Medium)

Task 5.1: Implement Automatic Cache Cleanup Scheduler

  • File: lib/singularity_llm/test_cache_scheduler.ex
  • Purpose: Background process for managing cache TTL and timestamp cleanup
  • Features:
    • Periodic scanning for expired cache entries
    • Proactive refresh of critical cache entries before expiration
    • Automatic timestamp cleanup based on age and count limits
    • Content deduplication across timestamps
    • Configurable cleanup strategies (eager, lazy, manual)
  • Functions:
    def start_scheduler(opts \\ []) :: {:ok, pid()} | {:error, reason}
    def schedule_refresh(cache_pattern, delay) :: :ok
    def run_cleanup_cycle() :: cleanup_report()
    def run_deduplication_cycle() :: dedup_report()
    def refresh_critical_caches() :: refresh_report()

Task 5.2: Enhanced API Version Detection and Compatibility

  • File: lib/singularity_llm/test_cache_api_versioning.ex
  • Purpose: Handle API version changes and timestamp-based fallback strategies
  • Features:
    • API version detection and compatibility checking
    • Timestamp-based fallback when API versions differ
    • Automatic cache refresh when breaking API changes detected
    • Smart selection of compatible timestamps
    • API evolution tracking across timestamps

Task 5.3: Add Cache Compression and Optimization

  • File: lib/singularity_llm/test_cache_optimizer.ex
  • Purpose: Optimize cache storage and performance
  • Features:
    • Response compression for large payloads
    • Cache deduplication for identical responses
    • Periodic cache cleanup and optimization
    • Cache size monitoring and management

Task 5.4: Create Enhanced Cache Management CLI

  • File: lib/mix/tasks/singularity_llm.cache.ex
  • Purpose: Command-line tools for cache management with TTL and timestamps
  • Commands:
    # Basic cache management
    mix singularity_llm.cache.clear                    # Clear all test cache
    mix singularity_llm.cache.stats                    # Show cache statistics with TTL info
    mix singularity_llm.cache.verify                   # Verify cache integrity
    mix singularity_llm.cache.warm --suite oauth2      # Warm cache for test suite
    
    # TTL and refresh management
    mix singularity_llm.cache.refresh --expired        # Refresh all expired cache entries
    mix singularity_llm.cache.refresh --pattern "openai/*"  # Refresh specific pattern
    mix singularity_llm.cache.set-ttl --pattern "oauth2/*" --ttl "1d"  # Set TTL for pattern
    mix singularity_llm.cache.check-expiry             # Show cache entries near expiration
    
    # Timestamp management
    mix singularity_llm.cache.timestamps --list --pattern "anthropic/*"  # List timestamps for pattern
    mix singularity_llm.cache.timestamps --cleanup     # Clean up old timestamps
    mix singularity_llm.cache.timestamps --restore "2024-01-15T10:30:45Z"  # Restore specific timestamp
    mix singularity_llm.cache.deduplicate              # Remove duplicate content across timestamps
    
    # Import/Export with timestamps
    mix singularity_llm.cache.export --format json --include-timestamps  # Export with all timestamps
    mix singularity_llm.cache.import --file cache.json --preserve-timestamps  # Import preserving timestamps
    mix singularity_llm.cache.compress --older-than "7d"  # Compress old timestamps

Phase 6: Documentation and Testing (Priority: Medium)

Task 6.1: Create Test Caching Documentation

  • File: docs/test_caching.md
  • Content:
    • How automatic test caching works
    • Configuration options and best practices
    • Troubleshooting common issues
    • Cost savings and performance benefits
    • Integration with CI/CD pipelines

Task 6.2: Add Test Coverage for Caching System

  • Files: test/singularity_llm/test_cache_*_test.exs
  • Purpose: Comprehensive testing of caching functionality
  • Test Categories:
    • Unit tests for cache components
    • Integration tests for end-to-end caching
    • Performance tests for cache overhead
    • Edge case handling tests

Task 6.3: Update Configuration Documentation

  • Files: README.md, config/config.exs
  • Purpose: Document new test caching configuration options
  • Content:
    • Environment variable documentation
    • Configuration examples for different scenarios
    • Migration guide from manual to automatic caching

Implementation Timeline

Week 1-2: Foundation (Phase 1)

  • Complete Tasks 1.1-1.6
  • Basic test environment detection and configuration
  • TTL system and backup/versioning infrastructure

Week 3-4: Automatic Capture (Phase 2)

  • Complete Tasks 2.1-2.3
  • Core response interception and storage functionality
  • TTL-aware cache checking and refresh logic

Week 5-6: Cache Replay (Phase 3)

  • Complete Tasks 3.1-3.3
  • Intelligent cache matching and replay system with TTL support
  • Version fallback mechanisms

Week 7-8: Integration (Phase 4)

  • Complete Tasks 4.1-4.3
  • Full integration with existing test suites
  • TTL and versioning helper functions

Week 9-10: TTL Management and Cleanup (Phase 5)

  • Complete Tasks 5.1-5.4
  • Automated refresh scheduling and version cleanup
  • Enhanced CLI tools for cache management

Week 11+: Documentation and Testing (Phase 6)

  • Complete Tasks 6.1-6.3
  • Comprehensive documentation and test coverage
  • Performance optimization and final polish

Success Criteria

Primary Goals

  • Integration tests automatically cache responses by default
  • OAuth2 tests work seamlessly with cached responses
  • Cost reduction of >90% for repeated test runs
  • Zero configuration required for basic usage
  • Backward compatibility with existing test infrastructure

Performance Targets

  • Cache hit ratio >95% for repeated test runs
  • Test suite runtime improvement >50% with cache
  • Cache storage overhead <100MB for full test suite
  • Cache lookup time <10ms per request

Quality Assurance

  • All existing tests pass with caching enabled
  • Cache integrity verified with checksum validation
  • Graceful fallback when cache is unavailable
  • Clear error messages for cache-related issues

Risk Mitigation

Technical Risks

  • Cache corruption: Implement checksum validation and automatic cache repair
  • Test flakiness: Ensure cached responses maintain original timing and error patterns
  • Storage requirements: Implement compression and cleanup strategies
  • Integration complexity: Maintain clear separation between caching and core functionality

Process Risks

  • Breaking changes: Comprehensive test coverage and gradual rollout
  • Performance regression: Benchmark cache overhead and optimize hot paths
  • Maintenance burden: Clear documentation and automated cache management

Configuration Examples

Development Environment

# config/test.exs
config :singularity_llm, :test_cache,
  enabled: true,
  auto_detect: true,
  cache_dir: "test/cache",
  replay_by_default: true,
  save_on_miss: true,
  ttl: :timer.days(7),              # Refresh cache weekly
  fallback_strategy: :latest_success,
  max_entries_per_cache: 5,
  deduplicate_content: true

CI Environment

# .github/workflows/test.yml
env:
  EX_LLM_TEST_CACHE_ENABLED: "true"
  EX_LLM_TEST_CACHE_DIR: "/tmp/ex_llm_cache"
  EX_LLM_TEST_CACHE_REPLAY_ONLY: "true"  # Don't make real requests in CI
  EX_LLM_TEST_CACHE_TTL: "0"             # Use any cached response in CI
  EX_LLM_TEST_CACHE_FALLBACK_STRATEGY: "latest_any"  # Use any timestamp if needed

Local Development

# Force cache miss for specific tests
export EX_LLM_TEST_CACHE_FORCE_MISS="AnthropicIntegrationTest"

# Force cache refresh for specific tests (ignores TTL)
export EX_LLM_TEST_CACHE_FORCE_REFRESH="OAuth2Test"

# Set custom TTL for development
export EX_LLM_TEST_CACHE_TTL="3600"  # 1 hour TTL

# Set fallback strategy
export EX_LLM_TEST_CACHE_FALLBACK_STRATEGY="latest_success"  # or latest_any, best_match

# Disable caching for debugging
export EX_LLM_TEST_CACHE_ENABLED="false"

# Use specific timestamp for testing
export EX_LLM_TEST_CACHE_USE_TIMESTAMP="2024-01-15T10:30:45Z"

# Control cleanup behavior
export EX_LLM_TEST_CACHE_MAX_ENTRIES="10"
export EX_LLM_TEST_CACHE_CLEANUP_OLDER_THAN="30d"

This comprehensive plan builds upon SingularityLLM's existing caching infrastructure to provide seamless, automatic test response caching that will significantly reduce API costs and improve test reliability.


Notes

  • The library aims to be the go-to solution for LLM integration in Elixir
  • Focus remains on being a unified, reliable LLM client library
  • All features should work consistently across providers where possible
  • Provider-specific features should be clearly documented
  • Performance and cost efficiency are key priorities
  • Features that belong at the application layer have been moved to docs/DROPPED.md