- 🎯 Credo Strict Mode Enabled: 91 source files analyzed with 0 issues
- 🧹 Clean Codebase: Fixed all nesting issues (14 functions), complexity issues (6 functions), and TODO comments (5 items)
- 📈 Enhanced Type System: Added metadata support to LLMResponse and StreamChunk for timing and context data
- 🔧 Improved Functionality: Better token usage tracking, cost filtering, and context statistics
This represents a significant maturity milestone for the SingularityLLM codebase, ensuring high code quality standards and maintainability for future development.
- Unified adapter interface for multiple providers
- Streaming support with SSE parsing
- Model listing and management
- Standardized response format (via SingularityLLM.Types)
- Configuration injection pattern
- Comprehensive error handling (via SingularityLLM.Error)
- Application supervisor for lifecycle management
- Anthropic adapter (Claude 3, Claude 4 models)
- Local adapter via Bumblebee/Nx
- OpenAI adapter (GPT-4, GPT-3.5)
- Ollama adapter (local model support)
- AWS Bedrock adapter (complete - supports Anthropic, Amazon Titan, Meta Llama, Cohere, AI21, Mistral with full credential chain, streaming, provider-specific formatting)
- Google Gemini adapter (basic implementation - Pro, Ultra variants)
- Full API implementation in progress (see Gemini API Implementation section)
- OpenRouter adapter (300+ models from multiple providers)
- Integrated cost tracking and calculation
- Token estimation functionality
- Context window management
- Automatic message truncation
- Multiple truncation strategies (sliding_window, smart)
- Model-specific context window sizes
- Context validation and statistics
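A small sketch of a sliding_window strategy makes the context-window items above concrete. It is illustrative only. The module name TruncationSketch, the %{role: ..., content: ...} message shape, and the rough four-characters-per-token estimate are all assumptions. None of them is SingularityLLM's tokenizer or truncation code.

```elixir
defmodule TruncationSketch do
  # Hypothetical sketch of a sliding_window strategy; not SingularityLLM's code.
  # Messages are assumed to be maps with :role and :content keys.

  # Rough heuristic of ~4 characters per token, rounded up (an assumption;
  # real counts depend on the model's tokenizer).
  def estimate_tokens(%{content: content}), do: div(String.length(content) + 3, 4)

  # Always keep system messages, then keep the most recent contiguous run of
  # other messages that fits in the remaining token budget.
  def sliding_window(messages, max_tokens) do
    {system, rest} = Enum.split_with(messages, &(&1.role == "system"))
    budget = max_tokens - (system |> Enum.map(&estimate_tokens/1) |> Enum.sum())

    {kept, _left} =
      rest
      |> Enum.reverse()
      |> Enum.reduce_while({[], budget}, fn msg, {acc, left} ->
        cost = estimate_tokens(msg)

        if cost <= left do
          # Prepending while walking newest-to-oldest restores chronological order.
          {:cont, {[msg | acc], left - cost}}
        else
          {:halt, {acc, left}}
        end
      end)

    system ++ kept
  end
end
```

Because reduce_while stops at the first message that does not fit, the result is always a contiguous window of the most recent turns plus any system messages.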
- Session management (via SingularityLLM.Session)
- Message history
- Token usage tracking
- JSON persistence
- Metadata handling
- Instructor support for structured outputs
- Ecto schema integration
- Simple type specs
- Validation and retry logic
- JSON extraction from markdown
- Model loading/unloading (via SingularityLLM.Local.ModelLoader)
- EXLA/EMLX configuration (via SingularityLLM.Local.EXLAConfig)
- Token counting with model tokenizers (via SingularityLLM.Local.TokenCounter)
- Hardware acceleration detection (Metal, CUDA, ROCm)
- Optimized inference settings
- Mixed precision support
- ConfigProvider behaviour for dependency injection
- Default provider using Application config
- Static configuration provider
- Environment-based configuration
- External YAML configuration system for model metadata
- Model pricing, context windows, and capabilities in config/models/*.yml
- Runtime configuration loading with ETS caching
- SingularityLLM.ModelConfig module for centralized access
- Separation of model data from code for easier maintenance
- Model sync script from LiteLLM
- Python script to fetch model data from LiteLLM
- Automatic conversion from JSON to YAML format
- Synced 1048 models with pricing and capabilities
(Currently no tasks in progress)
- Mistral AI Adapter
- OpenAI-compatible API implementation
- Chat, streaming, and embeddings support
- Function calling with tools format
- Model listing from API and config fallback
- Parameter validation with Mistral-specific restrictions
- Safe prompt parameter support
- Comprehensive test suite (14 unit tests, integration tests)
- Perplexity Adapter
- Search-augmented language model support
- OpenAI-compatible base with Perplexity extensions
- Search modes: news, academic, general
- Reasoning effort levels: low, medium, high
- URL return and recency filters
- Search domain inclusion/exclusion
- Comprehensive test suite (33 tests)
- Bumblebee Adapter (Renamed from Local)
- Complete refactoring from Local to Bumblebee naming
- Split tests into unit and integration for consistency
- Updated all references throughout codebase
- BREAKING: Removed :local alias completely (use :bumblebee instead)
- Added model configuration in config/models/bumblebee.yml
- Fixed ModelLoader references
- Ollama Adapter
- Fixed critical bugs
- Embeddings endpoint corrected to /api/embed
- Embeddings request format using the input parameter
- Batch embeddings support
- Added all missing endpoints
- /api/generate for non-chat completions with streaming
- /api/show for model information
- /api/copy, /api/delete for model management
- /api/pull, /api/push for model distribution
- /api/ps for running models, /api/version for version info
- Added comprehensive parameter support
- options object with all model-specific settings (GPU settings, memory settings, sampling parameters)
- context parameter for stateful conversations
- keep_alive for model memory management
- Multimodal support with proper image handling
- Enhanced response format with timing metadata
- Structured output support with format parameter
- Core streaming recovery infrastructure (via SingularityLLM.StreamRecovery)
- Save partial responses during streaming
- Store response chunks with timestamps
- Track token count of partial response
- Save request context (messages, model, parameters)
- Detect interruptions
- Network errors vs timeouts vs user cancellation
- Distinguish recoverable vs non-recoverable errors
- Handle streaming errors gracefully
- Resume mechanisms
- resume_stream/2 function to continue from saved state
- Adjust token count for already-received content
- Support different resumption strategies:
- :exact - Continue from exact cutoff
- :paragraph - Regenerate last paragraph for coherence
- :summarize - Summarize received content and continue
- Storage backend
- In-memory storage for current session
- Automatic cleanup of old partial responses
- Handle multiple interrupted responses per session
- Integration with existing streaming
- Modified stream_chat/3 to support recovery
- Added recovery options to streaming config
- Recovery ID tracking for resumable streams
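The difference between the :exact and :paragraph resumption strategies can be shown with a pure helper that decides how much of a partial response to keep before resuming. This is a hypothetical sketch: ResumeSketch and kept_text/2 are invented names, and paragraphs are assumed to be separated by blank lines. :summarize is left out because it needs a model call.

```elixir
defmodule ResumeSketch do
  # Hypothetical helper, not part of SingularityLLM.StreamRecovery.
  # Returns the portion of a partial response to keep before resuming.

  # :exact keeps everything received so far.
  def kept_text(partial, :exact), do: partial

  # :paragraph drops the last (possibly cut-off) paragraph so the model can
  # regenerate it coherently. Paragraphs are assumed to be separated by "\n\n".
  def kept_text(partial, :paragraph) do
    case String.split(partial, "\n\n") do
      [_only_one] -> ""
      parts -> parts |> Enum.drop(-1) |> Enum.join("\n\n")
    end
  end
end
```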
- Core retry infrastructure (via SingularityLLM.Retry)
- Exponential backoff with configurable parameters
- Jitter support to prevent thundering herd
- Provider-specific retry policies
- Configurable retry conditions
- Circuit breaker pattern (structure defined)
- Provider-specific implementations
- OpenAI retry with Retry-After header support
- Anthropic retry for 529 (overloaded) errors
- Bedrock retry for AWS throttling exceptions
- Integration with main API
- Automatic retry for chat/3 (opt-out available)
- Configurable retry options per request
- Logging of retry attempts and outcomes
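A common way to compute exponential backoff with jitter is to take base_delay * 2^attempt, cap it at a maximum, and draw the actual wait at random below that cap. This is the "full jitter" variant, and the sketch below assumes it. BackoffSketch, its option names, and its defaults are illustrative. They are not SingularityLLM.Retry's API.

```elixir
defmodule BackoffSketch do
  # Hypothetical sketch; option names and default values are assumptions,
  # not SingularityLLM.Retry's configuration.

  # Upper bound for a 0-based attempt: base_delay * 2^attempt, capped at max_delay.
  def ceiling(attempt, base_delay \\ 1_000, max_delay \\ 60_000) do
    min(base_delay * Integer.pow(2, attempt), max_delay)
  end

  # Full jitter: a uniformly random delay in 0..ceiling, so clients that failed
  # at the same moment do not all retry in lockstep (thundering herd).
  def delay(attempt, base_delay \\ 1_000, max_delay \\ 60_000) do
    :rand.uniform(ceiling(attempt, base_delay, max_delay) + 1) - 1
  end

  # Calls fun until it returns something other than {:error, _}
  # or max_attempts calls have been made.
  def with_retry(fun, opts \\ []) do
    max_attempts = Keyword.get(opts, :max_attempts, 3)
    base_delay = Keyword.get(opts, :base_delay, 1_000)
    do_retry(fun, 0, max_attempts, base_delay)
  end

  defp do_retry(fun, attempt, max_attempts, base_delay) do
    case fun.() do
      {:error, _} = error when attempt + 1 >= max_attempts ->
        error

      {:error, _} ->
        Process.sleep(delay(attempt, base_delay))
        do_retry(fun, attempt + 1, max_attempts, base_delay)

      result ->
        result
    end
  end
end
```

A Retry-After value returned by the provider, as in the OpenAI policy above, would normally override the computed delay. The sketch does not handle that.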
- Unified function calling interface (via SingularityLLM.FunctionCalling)
- Provider-agnostic function definitions
- Automatic format conversion for each provider
- Function call parsing from responses
- Parameter validation against schemas
- Safe function execution with error handling
- Provider implementations
- OpenAI function calling format
- Anthropic tools API format
- Bedrock tools format
- Gemini function calling format
- Integration with main API
- Functions option in chat/3
- Automatic provider format conversion
- Public API for parsing and execution
- Example implementation (examples/function_calling_example.exs)
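Automatic format conversion can be pictured as mapping one provider-agnostic definition onto each provider's tool schema. In this sketch the input is assumed to be a map with :name, :description and :parameters, where :parameters is a JSON Schema map. The output shapes follow two public formats:

- OpenAI Chat Completions tools: type "function" wrapping name, description and parameters.
- Anthropic Messages API tools: name, description and input_schema.

ToolFormatSketch is a hypothetical module, not SingularityLLM.FunctionCalling.

```elixir
defmodule ToolFormatSketch do
  # Hypothetical converter from a provider-agnostic function definition to
  # provider-specific tool payloads (string keys, ready for JSON encoding).

  # OpenAI Chat Completions "tools" entry.
  def to_provider(%{name: name, description: desc, parameters: schema}, :openai) do
    %{
      "type" => "function",
      "function" => %{"name" => name, "description" => desc, "parameters" => schema}
    }
  end

  # Anthropic Messages API "tools" entry: the schema goes under input_schema.
  def to_provider(%{name: name, description: desc, parameters: schema}, :anthropic) do
    %{"name" => name, "description" => desc, "input_schema" => schema}
  end
end
```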
- Full mock adapter implementation (SingularityLLM.Adapters.Mock)
- Static response configuration
- Dynamic response handlers
- Error simulation
- Request capture and analysis
- Streaming support
- Function calling support
- Testing utilities
- set_response/1 for static responses
- set_response_handler/1 for dynamic responses
- set_error/1 for error simulation
- get_requests/0 for request analysis
- reset/0 for test cleanup
- Integration with retry and recovery
- Works with retry logic
- Compatible with stream recovery
- Documentation and examples
- Comprehensive test file (ex_llm_mock_test.exs)
- Testing guide (examples/testing_with_mock.exs)
- Comprehensive capability tracking (via SingularityLLM.ModelCapabilities)
- Feature support detection for all models
- Context window and output token limits
- Provider-specific capability details
- Release and deprecation date tracking
- Model database
- OpenAI models (GPT-4, GPT-3.5 variants)
- Anthropic models (Claude 3/3.5 family)
- Google Gemini models
- Local models via Bumblebee
- Mock model for testing
- Discovery features
- Query individual model capabilities
- Find models by required features
- Compare models side-by-side
- Get recommendations based on requirements
- Group models by capability
- Public API integration
- get_model_info/2
- model_supports?/3
- find_models_with_features/1
- compare_models/1
- recommend_models/1
- models_by_capability/1
- list_model_features/0
- Documentation and testing
- Comprehensive example (model_capabilities_example.exs)
- Full test coverage (model_capabilities_test.exs)
- Core caching infrastructure (via SingularityLLM.Cache)
- TTL-based cache expiration
- Configurable storage backends via behaviour
- ETS storage backend implementation
- Cache key generation based on request parameters
- Selective caching (skip for streaming, functions, etc.)
- Cache statistics tracking
- Integration with main API
- Automatic caching in chat/3 with cache option
- with_cache/3 wrapper for cache-aware execution
- Configurable TTL per request
- Global cache enable/disable
- Cache management
- Clear cache functionality
- Delete specific entries
- Automatic cleanup of expired entries
- Cache hit/miss statistics
- Documentation and testing
- Comprehensive example (caching_example.exs)
- Full test coverage (cache_test.exs)
- Cost savings calculations in examples
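Cache key generation from request parameters can be sketched as hashing a normalized term built from the fields that affect the response. Which options are hashed, the skip rules, and the CacheKeySketch name are assumptions made for illustration. :erlang.term_to_binary/2 with the :deterministic option (OTP 24+) and :crypto.hash/2 are standard OTP functions.

```elixir
defmodule CacheKeySketch do
  # Hypothetical sketch of deterministic cache-key generation.
  # Only options assumed to change the model's output are hashed.
  @keyed_options [:temperature, :max_tokens, :top_p]

  # Streaming and function-calling requests are not cached (mirrors the
  # "selective caching" item above; the exact rules here are assumptions).
  def cacheable?(opts) do
    not Keyword.get(opts, :stream, false) and not Keyword.has_key?(opts, :functions)
  end

  def key(provider, messages, opts) do
    # Sorting makes the key independent of keyword order.
    relevant = opts |> Keyword.take(@keyed_options) |> Enum.sort()

    {provider, Keyword.get(opts, :model), messages, relevant}
    |> :erlang.term_to_binary([:deterministic])
    |> then(&:crypto.hash(:sha256, &1))
    |> Base.encode16(case: :lower)
  end
end
```

Because the relevant options are sorted, two otherwise identical requests map to the same entry regardless of the order in which their options were written.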
- Credo Strict Mode Implementation
- Fixed all nesting issues (14 functions across multiple modules)
- Reduced nesting from 5+ levels to max 3 levels
- Strategic function extraction and pattern matching
- Improved readability and maintainability
- Resolved all cyclomatic complexity issues (6 high-complexity functions)
- SingularityLLM.Instructor.do_structured_chat (complexity 34 → reduced)
- SingularityLLM.Instructor.get_provider_config (complexity 19 → reduced)
- SingularityLLM.Adapters.Bedrock.parse_response (complexity 15 → reduced)
- SingularityLLM.Adapters.Mock.normalize_response (complexity 14 → reduced)
- SingularityLLM.Adapters.Shared.ModelUtils.generate_description (complexity 13 → reduced)
- SingularityLLM.Adapters.Mock.embeddings (complexity 13 → reduced)
- Fixed all TODO comments in codebase (5 items)
- Enhanced Types module with metadata fields for LLMResponse and StreamChunk
- Improved context_stats function with character counting and statistics
- Implemented token usage extraction in Local adapter with estimates
- Added cost filtering to model recommendations in ModelCapabilities
- Fixed syntax errors and compilation issues
- Enabled Credo strict mode successfully
- 91 source files analyzed with 0 issues found
- Comprehensive quality checks enabled
- Automated code quality enforcement
- Core embeddings infrastructure
- New types: EmbeddingResponse and EmbeddingModel
- Adapter behaviour extensions for embeddings
- Unified embeddings interface in main module
- Provider implementations
- OpenAI embeddings adapter (text-embedding-3-small/large, ada-002)
- Mock adapter embeddings support
- Cost tracking for embedding models
- Utility functions
- cosine_similarity/2 for comparing embeddings
- find_similar/3 for semantic search
- Batch embedding support
- Integration features
- Automatic caching support for embeddings
- list_embedding_models/1 for discovery
- Dimension configuration support
- Documentation and examples
- Comprehensive example (embeddings_example.exs)
- Semantic search demonstration
- Clustering and similarity examples
- Cost comparison across models
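cosine_similarity/2 and find_similar/3 rest on the standard formula cos(a, b) = (a · b) / (|a| |b|). The sketch below is hypothetical. SimilaritySketch, and candidates passed as {item, embedding} tuples, are assumptions. The library's own functions may take different arguments.

```elixir
defmodule SimilaritySketch do
  # Hypothetical sketch; SingularityLLM's cosine_similarity/2 and
  # find_similar/3 may differ in signature and options.

  # cos(a, b) = (a . b) / (|a| * |b|); raises ArithmeticError for a zero vector.
  def cosine_similarity(a, b) when length(a) == length(b) do
    dot = Enum.zip_with(a, b, &(&1 * &2)) |> Enum.sum()
    norm = fn v -> v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt() end
    dot / (norm.(a) * norm.(b))
  end

  # Rank {item, embedding} candidates by similarity to the query, best first.
  def find_similar(query, candidates, top_k \\ 3) do
    candidates
    |> Enum.map(fn {item, emb} -> {item, cosine_similarity(query, emb)} end)
    |> Enum.sort_by(fn {_item, score} -> score end, :desc)
    |> Enum.take(top_k)
  end
end
```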
- Core vision infrastructure (via SingularityLLM.Vision)
- Extended message types to support image content
- Image format validation and detection
- Base64 encoding/decoding utilities
- Provider-specific formatting
- Image handling
- Load images from local files
- Support for image URLs
- Multiple image formats (JPEG, PNG, GIF, WebP)
- Image size validation
- Provider implementations
- Anthropic vision support (base64 format)
- OpenAI vision support (URL and base64)
- Vision capability detection per model
- API functions
- vision_message/3 for easy message creation
- load_image/2 for file loading
- supports_vision?/2 for capability checking
- extract_text_from_image/3 for OCR tasks
- analyze_images/4 for image analysis
- Integration features
- Automatic provider formatting in chat/3
- Vision content detection
- Detail level configuration
- Documentation and examples
- Comprehensive example (vision_example.exs)
- Multiple use cases demonstrated
- Error handling examples
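Image format validation and detection is usually done from the file signature ("magic bytes") rather than from the file extension. The minimal sketch below covers the four formats listed above. ImageFormatSketch is a hypothetical module. The data URL form shown is the one OpenAI accepts for inline base64 images, while Anthropic instead takes the media type and the base64 data as separate fields.

```elixir
defmodule ImageFormatSketch do
  # Hypothetical format detection from file signatures; SingularityLLM.Vision
  # may detect formats differently.
  def detect(<<0xFF, 0xD8, 0xFF, _::binary>>), do: {:ok, "image/jpeg"}
  def detect(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _::binary>>), do: {:ok, "image/png"}
  def detect(<<"GIF87a", _::binary>>), do: {:ok, "image/gif"}
  def detect(<<"GIF89a", _::binary>>), do: {:ok, "image/gif"}
  # WebP: "RIFF", a 4-byte little-endian size, then "WEBP".
  def detect(<<"RIFF", _size::binary-size(4), "WEBP", _::binary>>), do: {:ok, "image/webp"}
  def detect(_), do: {:error, :unsupported_format}

  # Build a base64 data URL such as "data:image/png;base64,...".
  def to_data_url(bytes) do
    with {:ok, mime} <- detect(bytes) do
      {:ok, "data:#{mime};base64," <> Base.encode64(bytes)}
    end
  end
end
```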
Priority 0 - Immediate (Next 2 weeks)
- Implement comprehensive Gemini API support (TDD approach)
- Code refactoring for shared behaviors (reduce duplication by ~40%)
- Debug logging levels
- Complete any remaining core implementations
Priority 1 - Short Term (Next month)
- High-demand provider adapters (Mistral AI, Together AI, Cohere, Perplexity)
Priority 2 - Medium Term (Next quarter)
- Advanced router with cost-based routing and fallbacks
- Batch processing API
- Extensible callback system
Priority 3+ - Long Term
- Additional providers based on demand
- Advanced features and optimizations
- Create comprehensive example_app that demonstrates all library features
- Migrate existing examples into the unified app:
- Advanced features (retries, context management, etc.)
- Caching functionality
- Embeddings
- Function calling
- Local model usage
- Model capabilities exploration
- Structured outputs with Instructor
- Testing with mock adapter
- Vision/multimodal features
- Add configuration system for provider selection
- Use Ollama with Qwen3 8B (IQ4_XS) as default (fast local model)
- Create interactive CLI menu for feature selection
- Add comprehensive error handling and user feedback
- Document setup and usage instructions
- Remove deprecated individual example files
- Create test/singularity_llm/gemini/models_test.exs
- Test listing available models
- Test getting model details
- Test model capabilities and limits
- Test error handling for invalid models
- Implement lib/singularity_llm/gemini/models.ex
- list_models/1 - List available models
- get_model/2 - Get specific model details
- Model struct with all properties
- Error handling
- Run tests and ensure they pass
- Update model registry with Gemini models
- Create test/singularity_llm/gemini/content_test.exs
- Test basic text generation
- Test streaming responses
- Test with system instructions
- Test with generation config (temperature, top_p, etc.)
- Test with safety settings
- Test multimodal inputs (text + images)
- Test structured output (JSON mode)
- Test function calling
- Test error scenarios
- Implement lib/singularity_llm/gemini/content.ex
- generate_content/3 - Non-streaming generation
- stream_generate_content/3 - Streaming generation
- Request/response structs
- Generation config handling
- Safety settings
- Tool/function definitions
- Integration with main SingularityLLM adapter pattern
- Run tests and ensure they pass (validation tests pass, API tests require valid key)
- Create test/singularity_llm/adapters/gemini/tokens_test.exs
- Test counting tokens for text
- Test counting tokens for multimodal content
- Test with different models
- Test error handling
- Implement lib/singularity_llm/gemini/tokens.ex
- count_tokens/3 - Count tokens for content
- Token count response struct
- Integration with content generation
- Run tests and ensure they pass
- Create test/singularity_llm/adapters/gemini/files_test.exs
- Test file upload
- Test file listing
- Test file deletion
- Test file metadata retrieval
- Test file state transitions
- Test error handling
- Implement lib/singularity_llm/gemini/files.ex
- upload_file/3 - Upload media files (resumable upload)
- list_files/1 - List uploaded files
- get_file/2 - Get file metadata
- delete_file/2 - Delete a file
- wait_for_file/3 - Wait for file processing
- File struct and state management
- Run tests and ensure they pass
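wait_for_file/3 amounts to polling file metadata until processing finishes. In the Gemini Files API a file's state is PROCESSING until it becomes ACTIVE (or FAILED). The sketch below is hypothetical. It receives the metadata lookup as a function instead of calling get_file/2, and the :interval_ms and :max_attempts options are invented.

```elixir
defmodule FilePollSketch do
  # Hypothetical polling loop. `fetch` stands in for a get_file/2 call and must
  # return {:ok, %{"state" => state}} or {:error, reason}.
  def wait_for_file(fetch, opts \\ []) do
    interval = Keyword.get(opts, :interval_ms, 1_000)
    attempts = Keyword.get(opts, :max_attempts, 30)
    poll(fetch, interval, attempts)
  end

  defp poll(_fetch, _interval, 0), do: {:error, :timeout}

  defp poll(fetch, interval, attempts_left) do
    case fetch.() do
      {:ok, %{"state" => "ACTIVE"} = file} ->
        {:ok, file}

      {:ok, %{"state" => "FAILED"}} ->
        {:error, :processing_failed}

      {:ok, _still_processing} ->
        Process.sleep(interval)
        poll(fetch, interval, attempts_left - 1)

      {:error, _} = error ->
        error
    end
  end
end
```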
- Create test/singularity_llm/gemini/caching_test.exs
- Test creating cached content
- Test listing cached content
- Test updating cached content
- Test deleting cached content
- Test using cached content in generation
- Test TTL and expiration
- Implement lib/singularity_llm/gemini/caching.ex
- create_cached_content/2 - Create cache entry
- list_cached_contents/1 - List cache entries
- get_cached_content/2 - Get cache details
- update_cached_content/3 - Update cache
- delete_cached_content/2 - Delete cache
- Integration with content generation
- Run tests and ensure they pass
- Create test/singularity_llm/gemini/embeddings_test.exs
- Test text embeddings
- Test batch embeddings
- Test different embedding models
- Test content types (query vs document)
- Test error handling
- Implement lib/singularity_llm/gemini/embeddings.ex
- embed_content/3 - Generate embeddings
- Batch embedding support
- Task type configuration
- Integration with main embeddings interface
- Run tests and ensure they pass
- Add WebSocket client library (Gun)
- Create test/singularity_llm/adapters/gemini/live_test.exs
- Test WebSocket connection (URL building and headers)
- Test message building (setup, client content, realtime input, tool response)
- Test message parsing (server content, tool calls, transcription, go away)
- Test validation (setup config, realtime input, generation config)
- Test struct definitions (all message types)
- Implement lib/singularity_llm/gemini/live.ex
- WebSocket client implementation using Gun
- GenServer-based session management
- Audio/video/text streaming support
- Event handling and message parsing
- Tool execution interface
- Connection lifecycle management
- Comprehensive validation and error handling
- Run tests and ensure they pass (23 tests, 100% pass rate)
- Create test/singularity_llm/gemini/tuning_test.exs
- Test creating tuned models
- Test listing tuned models
- Test monitoring tuning jobs
- Test using tuned models
- Test hyperparameter configuration
- Implement lib/singularity_llm/gemini/tuning.ex
- create_tuned_model/2 - Start tuning job
- list_tuned_models/1 - List tuned models
- get_tuned_model/2 - Get tuning details
- delete_tuned_model/2 - Delete tuned model
- generate_content/3 - Generate using tuned model
- stream_generate_content/3 - Stream using tuned model
- All struct definitions (TunedModel, TuningTask, etc.)
- Run tests and ensure they pass (unit tests pass, integration tests require valid API key)
- Create test/singularity_llm/gemini/permissions_test.exs
- Test creating permissions
- Test listing permissions
- Test updating permissions
- Test deleting permissions
- Test transfer ownership
- Implement lib/singularity_llm/gemini/permissions.ex
- Permission CRUD operations
- Role management (READER, WRITER, OWNER)
- Grantee types (USER, GROUP, EVERYONE)
- Transfer ownership operation
- Run tests and ensure they pass (unit tests pass, integration tests require OAuth2)
- Important Note: Permissions API requires OAuth2 authentication, not API keys!
- Create test/singularity_llm/gemini/qa_test.exs
- Test query API with inline passages
- Test query API with semantic retriever
- Test answer generation with different styles
- Test with different answer styles (abstractive, extractive, verbose)
- Test temperature control and safety settings
- Test response parsing and error handling
- Implement lib/singularity_llm/gemini/qa.ex
- generate_answer/4 - Semantic search and QA
- Answer generation config with temperature and safety
- Grounding with inline passages and semantic retriever
- Input validation and structured response parsing
- Support for both API key and OAuth2 authentication
- Run tests and ensure they pass (unit tests pass, integration tests require valid API key/corpus)
- Create test/singularity_llm/gemini/corpus_test.exs
- Test creating corpora with auto-generated and custom names
- Test listing corpora with pagination support
- Test updating corpora (display name changes)
- Test deleting corpora with force option
- Test querying corpora with metadata filters
- Test input validation and error handling
- Test response parsing for all operations
- Implement lib/singularity_llm/gemini/corpus.ex
- Complete CRUD operations (create, list, get, update, delete)
- Semantic search with query_corpus function
- Metadata filter system with conditions and operators
- Input validation for all parameters
- OAuth2 authentication support (required for corpus operations)
- Pagination support for listing
- Structured response parsing
- Run tests and ensure they pass (unit tests pass, integration tests require OAuth2 token)
- Create test/singularity_llm/adapters/gemini/document_test.exs
- Test creating documents with metadata
- Test listing documents with pagination
- Test updating documents with field masks
- Test deleting documents with force option
- Test querying documents with semantic search
- Test custom metadata handling (string, numeric, string list)
- Test validation and error handling
- Implement lib/singularity_llm/gemini/document.ex
- Complete CRUD operations (create, list, get, update, delete)
- Semantic search with query_document function
- Custom metadata system with all value types
- Input validation for all parameters
- Authentication support (API key and OAuth2)
- Pagination support for listing
- Comprehensive struct definitions
- Run tests and ensure they pass (20 unit tests, 100% pass rate)
- Create test/singularity_llm/adapters/gemini/chunk_test.exs
- Test creating chunks with data and metadata
- Test listing chunks with pagination
- Test updating chunks with field masks
- Test deleting chunks
- Test batch operations (create, update, delete)
- Test validation and error handling for all operations
- Test struct definitions and parsing
- Implement lib/singularity_llm/gemini/chunk.ex
- Complete CRUD operations (create, list, get, update, delete)
- Batch operations (batch_create, batch_update, batch_delete)
- Custom metadata system with all value types
- Input validation for all parameters
- Authentication support (API key and OAuth2)
- Pagination support for listing
- Comprehensive struct definitions (Chunk, ChunkData, CustomMetadata, etc.)
- Run tests and ensure they pass (22 unit tests, 100% pass rate)
- Create test/singularity_llm/adapters/gemini/retrieval_permissions_test.exs
- Test corpus permissions (create, list, get, update, delete)
- Test permission validation for corpus operations
- Test role hierarchy (READER, WRITER, OWNER)
- Test grantee types (USER, GROUP, EVERYONE)
- Test authentication methods (API key and OAuth2)
- Test struct definitions and JSON parsing
- Extend existing lib/singularity_llm/gemini/permissions.ex
- Corpus permissions already supported (corpora/{corpus} parent format)
- Complete CRUD operations for corpus permissions
- Input validation and error handling
- Support for all grantee types and roles
- Run tests and ensure they pass (15 unit tests, 9 passing non-integration tests)
- Create test/singularity_llm/adapters/gemini/integration_test.exs
- Test end-to-end workflows with adapter
- Test cross-feature interactions and API modules
- Test error propagation and handling
- Test performance characteristics (marked with @tag :performance)
- Enhance lib/singularity_llm/adapters/gemini.ex
- Main adapter implementation (chat, streaming, embeddings)
- Integration with SingularityLLM interfaces (unified API)
- Feature detection and capabilities via ModelCapabilities
- Error handling and configuration validation
- SingularityLLM module already supports Gemini provider
- All individual API modules tested and working
- Session persistence (save_to_file/load_from_file in SingularityLLM.Session)
- Function calling argument parsing (parse_arguments in SingularityLLM.FunctionCalling)
- Model info retrieval (get_model_info in SingularityLLM.ModelCapabilities)
- Provider capability tracking system
- Create SingularityLLM.ProviderCapabilities module
- Track provider-level features:
- Available endpoints (chat, embeddings, images, audio, etc.)
- Authentication methods (api_key, oauth, aws_signature, etc.)
- Streaming support at provider level
- Cost tracking availability
- Dynamic model listing support
- Batch operations support
- File upload capabilities
- Rate limiting information
- Provider metadata (description, docs, status URLs)
- Provider capability discovery API
- Integration with ModelCapabilities
- Capability versioning for API versions (future enhancement)
- Context statistics implementation
- Implement context_stats/1 function in SingularityLLM module
- Calculate token distribution across messages
- Provide truncation impact analysis
- Return statistics about context usage
- Token usage extraction for local models
- Extract token usage from Bumblebee/Local adapter responses
- Add token counting support to Local adapter
- Integrate with existing usage tracking
- Cost filtering for model recommendations
- Implement cost-based filtering in ModelCapabilities.recommend_models/1
- Add max_cost option to recommendation queries
- Filter models based on pricing data when available
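Cost filtering needs a per-request cost estimate. This sketch makes three assumptions:

- Prices are in USD per million tokens, for input and output. This is a common convention, assumed here rather than taken from the YAML files.
- Pricing is stored under hypothetical :input_cost / :output_cost keys.
- Models with no pricing data are kept, which is one reading of "when available".

```elixir
defmodule CostFilterSketch do
  # Hypothetical sketch; field names and pricing units are assumptions.

  # Cost in USD for one request, with prices given per million tokens.
  def request_cost(%{input_cost: input_price, output_cost: output_price}, input_tokens, output_tokens) do
    (input_tokens * input_price + output_tokens * output_price) / 1_000_000
  end

  # Keep models whose estimated cost is within max_cost.
  # Models without pricing data are kept, since their cost is unknown.
  def filter_by_cost(models, max_cost, input_tokens, output_tokens) do
    Enum.filter(models, fn
      %{input_cost: _, output_cost: _} = model ->
        request_cost(model, input_tokens, output_tokens) <= max_cost

      _no_pricing ->
        true
    end)
  end
end
```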
- /api/blobs/:digest endpoints for blob management
- GET /api/blobs/:digest - Check if a blob exists
- HEAD /api/blobs/:digest - Check blob existence (headers only)
- POST /api/blobs/:digest - Create a blob
- Used internally by Ollama for model layer management
- Parse and expose created_at timestamps in responses
- Add metadata field to LLMResponse and StreamChunk types
- Include timing information (total_duration, load_duration, etc.)
- Preserve model context for stateful conversations
- Extract streaming into StreamingCoordinator module
- Standardize Task/Stream.resource pattern
- Common SSE parsing and buffering
- Provider-agnostic chunk handling
- Error recovery integration
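Common SSE parsing and buffering comes down to splitting the stream on blank lines and carrying any incomplete event over to the next network chunk. This minimal sketch makes simplifying assumptions. Events are taken to end with "\n\n" and only data: lines are read. \r\n line endings and the event:/id:/retry: fields are ignored. SSESketch is a hypothetical module, not the planned StreamingCoordinator.

```elixir
defmodule SSESketch do
  # Hypothetical SSE chunk parser with buffering; not SingularityLLM code.
  # Returns {data_payloads, leftover}. The leftover must be passed back in with
  # the next network chunk, because one event can be split across chunks.
  def parse(buffer, chunk) do
    parts = String.split(buffer <> chunk, "\n\n")
    # The last part is either "" (input ended on a boundary) or an incomplete event.
    {complete, [leftover]} = Enum.split(parts, -1)

    payloads =
      complete
      |> Enum.map(&event_data/1)
      |> Enum.reject(&(&1 == ""))

    {payloads, leftover}
  end

  # Join the values of all "data:" lines in one event, stripping one optional space.
  defp event_data(event) do
    event
    |> String.split("\n")
    |> Enum.filter(&String.starts_with?(&1, "data:"))
    |> Enum.map_join("\n", fn
      "data: " <> rest -> rest
      "data:" <> rest -> rest
    end)
  end
end
```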
- Create RequestBuilder shared module
- Common request body construction
- Optional parameter handling
- Provider-specific extensions
- Implement ModelFetcher behavior
- Standardize model API fetching
- Common parse/filter/transform pipeline
- Integration with ModelLoader
- Extract VisionFormatter module
- Provider-specific image formatting
- Content type detection
- Base64 encoding utilities
- Enhance existing shared modules
- Extend ResponseBuilder for more formats
- Add provider-specific headers to HTTPClient
- Unify error response parsing
- OpenAI-Compatible base adapter for shared implementation
- Provider detection pattern (provider/model-name syntax)
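The provider/model-name syntax can be parsed by splitting on the first slash only. This keeps model IDs that themselves contain slashes, such as OpenRouter's vendor/model IDs, intact. ProviderSpecSketch and its provider table are hypothetical. Unknown prefixes are rejected rather than turned into atoms, since atoms are never garbage-collected.

```elixir
defmodule ProviderSpecSketch do
  # Hypothetical parser for the "provider/model-name" syntax.
  # The provider table is illustrative, not SingularityLLM's provider list.
  @providers %{
    "openai" => :openai,
    "anthropic" => :anthropic,
    "ollama" => :ollama,
    "openrouter" => :openrouter
  }

  def parse(spec) do
    # parts: 2 splits on the first "/" only.
    case String.split(spec, "/", parts: 2) do
      [prefix, model] when model != "" ->
        case Map.fetch(@providers, prefix) do
          {:ok, provider} -> {:ok, {provider, model}}
          :error -> {:error, {:unknown_provider, prefix}}
        end

      _ ->
        {:error, :invalid_format}
    end
  end
end
```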
- Advanced router with strategies
- Cost-based routing
- Automatic fallback chains
- Model group aliases
- Least-latency routing
- Usage-based routing
- Batch processing API
- Health checks and circuit breakers
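A circuit breaker is a small state machine with three states:

- :closed while requests succeed.
- :open after too many consecutive failures.
- :half_open once a cooldown has passed, so that trial requests can probe the provider.

The sketch below keeps that state in a plain struct. BreakerSketch, its fields, and its defaults are assumptions. This is not the structure already defined in SingularityLLM.Retry.

```elixir
defmodule BreakerSketch do
  # Hypothetical pure-functional circuit breaker; field names and defaults are
  # assumptions. Times are plain integers (e.g. milliseconds).
  #   :closed    -> requests flow; consecutive failures are counted
  #   :open      -> requests are rejected until cooldown_ms has elapsed
  #   :half_open -> trial requests flow; the next result closes or reopens it
  defstruct state: :closed, failures: 0, opened_at: nil, threshold: 5, cooldown_ms: 30_000

  def new(opts \\ []), do: struct(__MODULE__, opts)

  # Returns {allowed, updated_breaker}.
  def allow_request(%__MODULE__{state: :open} = b, now) do
    if now - b.opened_at >= b.cooldown_ms,
      do: {true, %{b | state: :half_open}},
      else: {false, b}
  end

  def allow_request(b, _now), do: {true, b}

  # Any success closes the circuit and resets the failure count.
  def record(b, :ok, _now), do: %{b | state: :closed, failures: 0, opened_at: nil}

  # A failed trial request reopens the circuit immediately.
  def record(%__MODULE__{state: :half_open} = b, :error, now),
    do: %{b | state: :open, opened_at: now}

  def record(b, :error, now) do
    failures = b.failures + 1

    if failures >= b.threshold,
      do: %{b | state: :open, failures: failures, opened_at: now},
      else: %{b | failures: failures}
  end
end
```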
- Groq adapter (fast inference)
- XAI adapter (Grok models)
- Mistral AI adapter (European models)
- Together AI adapter (cost-effective)
- Cohere adapter (enterprise, rerank API)
- Perplexity adapter (search-augmented)
- Replicate adapter (marketplace)
- Databricks adapter
- Vertex AI adapter (Google Cloud)
- Azure AI adapter (beyond OpenAI)
- Fireworks AI adapter
- DeepInfra adapter
- Watsonx adapter (IBM)
- Sagemaker adapter (AWS)
- Anyscale adapter
- vLLM adapter
- Hugging Face Inference API adapter
- Baseten adapter
- DeepSeek adapter
- Anthropic cache control headers
- Vertex AI context caching
- Bedrock Converse API support
- Provider-specific error mapping
- Provider capability detection
- Extensible callback system
- Telemetry integration for metrics
- Custom metrics collection
- Request/response logging with redaction
- Modern Request Parameters
- max_completion_tokens (replaces deprecated max_tokens)
- n parameter for multiple completions (1-128)
- top_p nucleus sampling parameter
- frequency_penalty and presence_penalty (-2 to 2)
- seed parameter for deterministic sampling
- stop sequences (string or array)
- service_tier for rate limiting control
- Response Format & Structured Outputs
- JSON mode: response_format: {"type": "json_object"}
- JSON Schema structured outputs with validation
- Refusal handling in responses
- logprobs token probabilities in responses
- Modern Tool/Function Calling
- Migrate from deprecated functions to modern tools API
- tool_choice parameter for controlling tool usage
- Parallel tool calls support
- Tool calling in streaming responses
- Advanced Message Content
- Multiple content parts per message (text + images + audio)
- File content references
- Audio content in messages
- New Model Features
- Audio output with voice selection
- Web search integration with web_search_options
- Reasoning effort control for o1/o3 models
- Developer role for o1+ models (replaces system for these models)
- Predicted outputs for faster regeneration
- Enhanced Usage Tracking
- Cached tokens, reasoning tokens, audio tokens in usage
- More detailed cost breakdown
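Several of the modern request parameters above come with stated ranges, and the deprecated max_tokens needs mapping onto max_completion_tokens. The minimal validation sketch below uses the ranges stated in this plan: n from 1 to 128, and penalties from -2 to 2. OpenAIParamsSketch and the keyword-list input are assumptions made for illustration.

```elixir
defmodule OpenAIParamsSketch do
  # Hypothetical request-parameter validation using the ranges listed above.

  def build(opts) do
    opts = translate_max_tokens(opts)

    with :ok <- check_range(opts, :n, 1, 128),
         :ok <- check_range(opts, :frequency_penalty, -2, 2),
         :ok <- check_range(opts, :presence_penalty, -2, 2) do
      {:ok, Map.new(opts)}
    end
  end

  # Map the deprecated max_tokens onto max_completion_tokens,
  # keeping an explicit max_completion_tokens if both are given.
  defp translate_max_tokens(opts) do
    case Keyword.pop(opts, :max_tokens) do
      {nil, opts} -> opts
      {value, opts} -> Keyword.put_new(opts, :max_completion_tokens, value)
    end
  end

  # Absent parameters pass; present ones must be numbers within [min, max].
  defp check_range(opts, key, min, max) do
    case Keyword.fetch(opts, key) do
      :error -> :ok
      {:ok, v} when is_number(v) and v >= min and v <= max -> :ok
      {:ok, v} -> {:error, {:out_of_range, key, v}}
    end
  end
end
```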
- Assistants API (Beta)
- Create/list/modify assistants
- Thread management
- Run management with tool integration
- Files API
- File upload for assistants and fine-tuning
- File management and retrieval
- Image Generation (DALL-E)
- Text-to-image generation
- Image variations and edits
- Audio API
- Speech-to-text transcription
- Text-to-speech generation
- Audio translation
- Moderation API
- Content safety classification
- Multi-category moderation scores
- Batch API
- Async batch processing
- Cost-effective bulk operations
- Fine-tuning API
- Custom model training
- Job management and monitoring
- Files API for uploads
- Fine-tuning management API
- Assistants API
- Rerank API
- Audio transcription API
- Text-to-speech API
- Image generation API
- Moderation API
- Guardrails system
- PII masking
- Prompt injection detection
- Content moderation
- Secret detection
- Custom guardrail plugins
- Request sanitization
- Response validation
- Debug logging levels
- Enhanced mock system with patterns
- Provider comparison tools
- Migration guides from other libraries
- Fine-tuning management
- Semantic chunking for better truncation
- Context compression techniques
- Dynamic context window adjustment
- Token budget allocation strategies
- Usage analytics and reporting
- Cost optimization recommendations
- Budget alerts and limits
- Provider cost comparison
- Token usage predictions
- Mock adapters for testing
- Integration test suite for each provider
- Performance benchmarks
- Load testing for concurrent requests
- Property-based tests for context management
- Comprehensive adapter implementation guide
- Provider-specific configuration examples
- Migration guide from other LLM libraries
- Best practices for context management
- Cost optimization strategies
The automatic test response caching system has been successfully implemented with all core features:
- ✅ Timestamp-based caching - No version conflicts, natural chronological ordering
- ✅ Automatic interception - Zero configuration required for integration tests
- ✅ Smart cache selection - Multiple fallback strategies (latest_success, latest_any, best_match)
- ✅ TTL management - Configurable expiration with per-test-type overrides
- ✅ Content deduplication - Symlinks for identical responses save disk space
- ✅ Comprehensive monitoring - Hit rates, cost savings, performance metrics
- ✅ Test helpers - Easy cache management functions for tests
- ✅ Mix tasks - Command-line tools for cache operations
- ✅ Full documentation - Usage guide, configuration, best practices
- ✅ Cache metadata tracking - Responses include a from_cache metadata flag

```elixir
# Automatic - just tag your tests!
@moduletag :integration # That's it! Caching is automatic
```

```bash
# Check statistics
mix singularity_llm.cache.stats

# Clear cache
mix singularity_llm.cache.clear
```

This document outlines the implementation plan for automatic test response caching in SingularityLLM. The goal is to automatically save every real API response during integration tests for replay in future test runs, reducing API costs and improving test reliability.
SingularityLLM already has a sophisticated caching system with the following components:
- SingularityLLM.Cache - Runtime ETS-based caching with optional disk persistence
- SingularityLLM.ResponseCache - Disk-based response collection for Mock adapter
- SingularityLLM.CachingInterceptor - Higher-level response collection for testing
- Mock Adapter Integration - Ability to replay cached responses
- Current limitations:
- Manual activation required (environment variables/config)
- No automatic test environment detection
- Limited integration test scenario organization
- No cache versioning for API response format changes
- No selective caching for specific test patterns
- File: `lib/singularity_llm/test_cache_config.ex`
- Purpose: Centralized configuration for test response caching
- Features:
- Automatic detection of test environment (`Mix.env() == :test`)
- Integration test detection (`:integration` tag presence)
- OAuth2 test detection (`:oauth2` tag presence)
- Configuration hierarchy: environment variables > test config > defaults
- Configuration Options:

```elixir
config :singularity_llm, :test_cache,
  enabled: true,                              # Enable automatic test caching
  auto_detect: true,                          # Auto-enable in test environment
  cache_dir: "test/cache",                    # Test cache directory
  organization: :by_provider,                 # :by_provider, :by_test_module, :by_tag
  cache_integration_tests: true,              # Cache integration test responses
  cache_oauth2_tests: true,                   # Cache OAuth2 test responses
  replay_by_default: true,                    # Use cached responses by default
  save_on_miss: true,                         # Save new responses on a cache miss
  ttl: :timer.hours(24 * 7),                  # Cache TTL (7 days default, :infinity to never expire)

  # Timestamp-based caching
  timestamp_format: :iso8601,                 # Filename timestamp format
  fallback_strategy: :latest_success,         # :latest_success, :latest_any, :best_match

  # Retention policy
  max_entries_per_cache: 10,                  # Keep max 10 timestamped entries per cache key
  cleanup_older_than: :timer.hours(24 * 30),  # Delete entries older than 30 days
  compress_older_than: :timer.hours(24 * 7),  # Compress entries older than 7 days

  # Content optimization
  deduplicate_content: true,                  # Use symlinks for identical content
  content_hash_algorithm: :sha256             # Hash algorithm for deduplication
```
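The configuration hierarchy listed above (environment variables > test config > defaults) can be sketched as two map merges. This is a minimal sketch, not the planned `test_cache_config.ex`. The `TestCacheConfigSketch` module, its reduced default set, and reading `EX_LLM_TEST_CACHE_TTL` as seconds are assumptions. The seconds reading follows the `# 1 hour TTL` comment next to `"3600"` in the environment-variable examples.

```elixir
defmodule TestCacheConfigSketch do
  # Hypothetical sketch module; only a subset of the planned options.
  @defaults %{
    enabled: true,
    cache_dir: "test/cache",
    ttl: :timer.hours(24 * 7),
    fallback_strategy: :latest_success
  }

  # Precedence, lowest to highest: defaults, app config, environment variables.
  def resolve(env \\ System.get_env(), app_config \\ []) do
    @defaults
    |> Map.merge(Map.new(app_config))
    |> Map.merge(from_env(env))
  end

  # Only recognised variables are read; the rest of the environment is ignored.
  defp from_env(env) do
    Enum.reduce(env, %{}, fn
      {"EX_LLM_TEST_CACHE_ENABLED", value}, acc ->
        Map.put(acc, :enabled, value == "true")

      {"EX_LLM_TEST_CACHE_DIR", value}, acc ->
        Map.put(acc, :cache_dir, value)

      # Assumption: the variable holds seconds, while the config uses milliseconds.
      {"EX_LLM_TEST_CACHE_TTL", value}, acc ->
        Map.put(acc, :ttl, String.to_integer(value) * 1_000)

      {"EX_LLM_TEST_CACHE_FALLBACK_STRATEGY", value}, acc ->
        Map.put(acc, :fallback_strategy, parse_strategy(value))

      _other, acc ->
        acc
    end)
  end

  # An explicit mapping avoids creating atoms from arbitrary input.
  defp parse_strategy("latest_success"), do: :latest_success
  defp parse_strategy("latest_any"), do: :latest_any
  defp parse_strategy("best_match"), do: :best_match
end
```

In practice, `app_config` would be `Application.get_env(:singularity_llm, :test_cache, [])`. An unknown strategy string raises `FunctionClauseError` instead of silently creating a new atom.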
- File: `lib/singularity_llm/test_cache_detector.ex`
- Purpose: Intelligent detection of test scenarios requiring caching
- Features:
- Detect integration tests by examining ExUnit tags
- Detect OAuth2 tests by examining ExUnit tags and test module names
- Runtime detection of live API usage vs mocked responses
- Process-level state tracking for test caching mode
- Functions:

```elixir
@spec integration_test_running?() :: boolean()
@spec oauth2_test_running?() :: boolean()
@spec should_cache_responses?() :: boolean()
@spec get_current_test_context() :: %{module: atom(), tags: [atom()], name: String.t()}
```
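ExUnit runs `setup` callbacks in the test's own process, so the process dictionary is one simple way to provide the process-level state tracking listed above. The sketch assumes a hypothetical `put_test_context/1` that each test module calls from `setup` with the ExUnit context. It normalizes that context into the `%{module, tags, name}` shape from the function list. It does not attempt runtime detection of live API usage.

```elixir
defmodule TestCacheDetectorSketch do
  # Hypothetical sketch module; state lives in the test process's dictionary.
  @key :singularity_llm_test_cache_context
  @empty %{module: nil, name: "", tags: []}

  # Store a normalized copy of the ExUnit context for the current test process.
  def put_test_context(%{} = exunit_context) do
    tags = for {tag, true} <- exunit_context, do: tag

    Process.put(@key, %{
      module: Map.get(exunit_context, :module),
      name: to_string(Map.get(exunit_context, :test, "")),
      tags: tags
    })

    :ok
  end

  def get_current_test_context, do: Process.get(@key, @empty)

  def integration_test_running?, do: :integration in get_current_test_context().tags

  # OAuth2 tests are detected by tag or, as a fallback, by module name.
  def oauth2_test_running? do
    %{module: module, tags: tags} = get_current_test_context()
    :oauth2 in tags or oauth2_module?(module)
  end

  def should_cache_responses?, do: integration_test_running?() or oauth2_test_running?()

  defp oauth2_module?(module) when is_atom(module) and not is_nil(module) do
    module |> Atom.to_string() |> String.downcase() |> String.contains?("oauth2")
  end

  defp oauth2_module?(_), do: false
end
```

A test module would call this from a `setup context do ... end` block. Only keys whose value is `true` are collected, so `@moduletag :integration` appears as `:integration`. Boolean context keys such as `:async` can also appear in the list.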
- File: `lib/singularity_llm/cache/storage/test_cache.ex`
- Purpose: Specialized storage backend for timestamp-based test response caching
- Features:
- Hierarchical organization by provider/test module/scenario
- Timestamp-based file naming for natural chronological ordering
- Rich metadata index with content deduplication
- Fuzzy matching for similar requests across timestamps
- TTL-based cache expiration and cleanup
- Smart fallback strategies (latest success, latest any, best match)
- Storage Structure:

```
test/cache/
├── integration/                                # Integration tests
│   ├── anthropic/
│   │   ├── chat_basic/
│   │   │   ├── 2024-01-15T10-30-45Z.json       # Timestamped responses
│   │   │   ├── 2024-01-20T14-22-10Z.json
│   │   │   ├── 2024-01-22T09-15-33Z.json
│   │   │   └── index.json                      # Cache index and metadata
│   │   └── chat_streaming/
│   ├── openai/
│   └── gemini/
└── oauth2/                                     # OAuth2 tests
    ├── gemini/
    │   ├── corpus_crud/
    │   │   ├── 2024-01-18T16-45-12Z.json
    │   │   ├── 2024-01-21T11-30-25Z.json
    │   │   └── index.json
    │   └── document_operations/
```
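Deriving a scenario's directory in the layout above needs only `Path.join/1`. This is a minimal sketch. It assumes a hypothetical `TestCachePathSketch` module and a rule where an `:oauth2` tag selects the `oauth2/` subtree and every other cached test goes under `integration/`.

```elixir
defmodule TestCachePathSketch do
  # Hypothetical sketch module. Tests tagged :oauth2 go under oauth2/, the rest under integration/.
  def cache_dir(root, tags, provider, scenario) do
    category = if :oauth2 in tags, do: "oauth2", else: "integration"
    Path.join([root, category, to_string(provider), scenario])
  end

  # Each scenario directory holds one index.json next to its timestamped files.
  def index_path(scenario_dir), do: Path.join(scenario_dir, "index.json")
end
```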
- File: `lib/singularity_llm/test_cache_ttl.ex`
- Purpose: Handle cache selection and TTL logic for timestamp-based caching
- Features:
- Check cache age against configurable TTL across all timestamps
- Smart selection of best cache entry based on fallback strategy
- Configurable TTL per test type (integration vs OAuth2)
- Force refresh options for specific test scenarios
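- Functions:

```elixir
@type ttl :: non_neg_integer() | :infinity
@type strategy :: :latest_success | :latest_any | :best_match

@spec select_cache_entry(cache_dir :: Path.t(), ttl :: ttl(), strategy :: strategy()) ::
        {:ok, DateTime.t()} | {:expired, DateTime.t()} | :none
@spec cache_expired?(timestamp :: DateTime.t(), ttl :: ttl()) :: boolean()
@spec get_latest_valid_entry(cache_dir :: Path.t(), ttl :: ttl()) :: {:ok, DateTime.t()} | :none
@spec get_latest_successful_entry(cache_dir :: Path.t(), ttl :: ttl()) :: {:ok, DateTime.t()} | :none
@spec force_refresh_for_test?(test_context :: map()) :: boolean()
@spec calculate_ttl(test_tags :: [atom()], provider :: String.t()) :: ttl()
```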
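The two time-based fallback strategies can be sketched over a list of index entries that has already been loaded. The planned signature takes `cache_dir` instead, and reading the index is omitted here. `:best_match` depends on request similarity and is not covered. The extra `now` argument is an assumption that keeps the function deterministic in tests. TTLs are in milliseconds to match the `:timer`-based configuration. The module name is hypothetical.

```elixir
defmodule TestCacheTTLSketch do
  # Hypothetical sketch module. ttl is in milliseconds, or :infinity to never expire.
  def cache_expired?(_timestamp, :infinity, _now), do: false

  def cache_expired?(%DateTime{} = timestamp, ttl, %DateTime{} = now) when is_integer(ttl) do
    DateTime.diff(now, timestamp, :millisecond) > ttl
  end

  # entries: maps with at least :timestamp (DateTime) and :status, as in the cache index.
  def select_cache_entry(entries, ttl, strategy, now \\ DateTime.utc_now())
      when strategy in [:latest_success, :latest_any] do
    candidates =
      entries
      |> Enum.filter(&(strategy == :latest_any or &1.status == :success))
      |> Enum.sort_by(& &1.timestamp, {:desc, DateTime})

    case candidates do
      [] ->
        :none

      # If the newest candidate is expired, every older one is expired too.
      [%{timestamp: latest} | _] ->
        if cache_expired?(latest, ttl, now), do: {:expired, latest}, else: {:ok, latest}
    end
  end
end
```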
- File: `lib/singularity_llm/test_cache_timestamp.ex`
- Purpose: Manage timestamped cache entries and cleanup policies
- Features:
- Generate consistent timestamp-based filenames
- List and sort available timestamps for cache keys
- Implement retention policies (max entries, max age)
- Content deduplication using file hashes
- Automatic cleanup of old timestamps
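- Functions:

```elixir
# Placeholder report types; their exact shape is not yet specified.
@type cleanup_report :: map()
@type dedup_report :: map()

@spec generate_timestamp_filename() :: String.t()
@spec parse_timestamp_from_filename(filename :: String.t()) :: {:ok, DateTime.t()} | :error
@spec list_cache_timestamps(cache_dir :: Path.t()) :: [DateTime.t()]
@spec cleanup_old_entries(cache_dir :: Path.t(), max_entries :: pos_integer(), max_age :: non_neg_integer()) ::
        cleanup_report()
@spec deduplicate_content(cache_dir :: Path.t()) :: dedup_report()
@spec get_content_hash(file_path :: Path.t()) :: String.t()
```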
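Colons are not allowed in Windows filenames. That is presumably why the layout writes `10-30-45Z` rather than `10:30:45Z`. The sketch below shows the filename round trip, the directory listing, and the content hash. It assumes UTC timestamps truncated to whole seconds. `TestCacheTimestampSketch` is a hypothetical module name, and SHA-256 matches the `content_hash_algorithm: :sha256` option.

```elixir
defmodule TestCacheTimestampSketch do
  # Hypothetical sketch module. ~U[2024-01-15 10:30:45Z] -> "2024-01-15T10-30-45Z.json"
  def generate_timestamp_filename(datetime \\ DateTime.utc_now()) do
    iso = datetime |> DateTime.truncate(:second) |> DateTime.to_iso8601()
    String.replace(iso, ":", "-") <> ".json"
  end

  # Reverses the mapping above; index.json and other files yield :error.
  def parse_timestamp_from_filename(filename) do
    base = Path.basename(filename, ".json")

    with [date, time] <- String.split(base, "T", parts: 2),
         iso = date <> "T" <> String.replace(time, "-", ":"),
         {:ok, datetime, _offset} <- DateTime.from_iso8601(iso) do
      {:ok, datetime}
    else
      _ -> :error
    end
  end

  # Newest first; a missing directory is treated as empty.
  def list_cache_timestamps(cache_dir) do
    case File.ls(cache_dir) do
      {:ok, files} ->
        files
        |> Enum.flat_map(fn file ->
          case parse_timestamp_from_filename(file) do
            {:ok, datetime} -> [datetime]
            :error -> []
          end
        end)
        |> Enum.sort({:desc, DateTime})

      {:error, _reason} ->
        []
    end
  end

  # Lowercase hex SHA-256 of the file contents, used to spot duplicates.
  def get_content_hash(file_path) do
    :crypto.hash(:sha256, File.read!(file_path)) |> Base.encode16(case: :lower)
  end
end
```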
- File: `lib/singularity_llm/test_cache_index.ex`
- Purpose: Maintain index of timestamped cache entries with metadata
- Index Structure:

```elixir
%CacheIndex{
  # Cache key identification
  cache_key: "anthropic/chat_basic",
  test_context: %{module: "AnthropicIntegrationTest", tags: [:integration]},

  # TTL configuration
  ttl: :timer.hours(24 * 7),
  fallback_strategy: :latest_success,

  # Timestamp entries (sorted newest first)
  entries: [
    %{
      timestamp: ~U[2024-01-22 09:15:33Z],
      filename: "2024-01-22T09-15-33Z.json",
      status: :success,              # :success, :error, :timeout
      size: 1024,
      content_hash: "abc123def",     # For deduplication
      response_time_ms: 1250,
      api_version: "2023-06-01",
      cost: %{input: 0.001, output: 0.003, total: 0.004}
    },
    %{
      timestamp: ~U[2024-01-20 14:22:10Z],
      filename: "2024-01-20T14-22-10Z.json",
      status: :success,
      size: 998,
      content_hash: "abc123def",     # Same hash = duplicate content
      response_time_ms: 980,
      api_version: "2023-06-01",
      cost: %{input: 0.001, output: 0.002, total: 0.003}
    }
  ],

  # Usage statistics
  total_requests: 45,
  cache_hits: 43,
  last_accessed: ~U[2024-01-22 12:00:00Z],
  access_count: 45,

  # Cleanup tracking
  last_cleanup: ~U[2024-01-20 00:00:00Z],
  cleanup_before: ~U[2024-01-01 00:00:00Z]   # Delete entries before this date
}
```
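Entries that share a `content_hash`, like the two entries above, are what the symlink-based deduplication would collapse. The sketch assumes entries are ordered newest first, as the index comment says, so the newest file is kept as the link target. The module name is hypothetical. Symlink creation through `File.ln_s/2` may not be available on every platform, Windows in particular.

```elixir
defmodule TestCacheDedupSketch do
  # Hypothetical sketch module.
  # Returns [{kept_filename, [duplicate_filenames]}] for hashes seen more than once.
  def duplicate_groups(entries) do
    entries
    |> Enum.group_by(& &1.content_hash, & &1.filename)
    |> Enum.flat_map(fn
      {_hash, [keep | duplicates]} when duplicates != [] -> [{keep, duplicates}]
      _single -> []
    end)
  end

  # Replaces each duplicate file in cache_dir with a relative symlink to the kept file.
  def link_duplicates(cache_dir, groups) do
    for {keep, duplicates} <- groups, duplicate <- duplicates do
      link = Path.join(cache_dir, duplicate)
      File.rm!(link)
      :ok = File.ln_s(keep, link)
      duplicate
    end
  end
end
```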
- File: `lib/singularity_llm/test_response_interceptor.ex`
- Purpose: Automatically intercept and cache responses during tests
- Features:
- Hook into HTTPClient request/response cycle
- Automatic cache key generation based on test context
- Rich metadata capture (timing, test info, provider details)
- Streaming response reassembly and caching
- Integration Points:
- `SingularityLLM.Adapters.Shared.HTTPClient`
- `SingularityLLM.Cache.with_cache/3`
- ExUnit test lifecycle hooks
- File: `lib/singularity_llm/adapters/shared/http_client.ex`
- Purpose: Add timestamp-based test caching support to HTTP client
- Changes:
- Add test cache check before making real HTTP requests
- Select best cache entry based on TTL and fallback strategy
- Save new responses with timestamp-based filenames
- Capture and save responses when test caching is enabled
- Maintain original error handling and retry logic
- Support for both streaming and non-streaming responses
- Fallback to older timestamps when fresh requests fail
- New Functions:

```elixir
defp maybe_use_test_cache(url, body, headers, opts)
defp select_best_cache_entry(cache_dir, ttl, strategy)
defp save_timestamped_response(request_data, response_data, metadata)
defp build_test_cache_key(url, body, test_context)
defp fallback_to_older_timestamp(cache_dir, error)
defp update_cache_index(cache_dir, new_entry)
```
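One way to keep `build_test_cache_key/3` stable across runs is to hash a normalized form of the URL and body, then prefix the digest with the test module and test name. This is a sketch under assumptions. The `module/test_name/digest` layout, the 16-character digest prefix, and the normalization rules are illustrative choices. Struct-valued bodies are not handled, and the module name is hypothetical.

```elixir
defmodule TestCacheKeySketch do
  # Hypothetical sketch module; key layout is "<module>/<test_name>/<digest>".
  def build_test_cache_key(url, body, %{module: module, name: name}) do
    digest =
      {url, normalize(body)}
      |> :erlang.term_to_binary()
      |> then(&:crypto.hash(:sha256, &1))
      |> Base.encode16(case: :lower)
      |> binary_part(0, 16)

    module_part = module |> Module.split() |> List.last() |> Macro.underscore()

    name_part =
      name
      |> to_string()
      |> String.downcase()
      |> String.replace(~r/[^a-z0-9]+/, "_")
      |> String.trim("_")

    Path.join([module_part, name_part, digest])
  end

  # Keys are stringified and sorted so atom- and string-keyed bodies normalize identically.
  defp normalize(map) when is_map(map) do
    map |> Enum.map(fn {k, v} -> {to_string(k), normalize(v)} end) |> Enum.sort()
  end

  defp normalize(list) when is_list(list), do: Enum.map(list, &normalize/1)
  defp normalize(other), do: other
end
```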
- File: `lib/singularity_llm/test_response_metadata.ex`
- Purpose: Capture comprehensive metadata for cached responses
- Metadata Fields:

```elixir
%ResponseMetadata{
  # Request Information
  provider: "anthropic",
  endpoint: "/v1/messages",
  method: "POST",
  request_body: %{...},
  request_headers: [...],

  # Response Information
  response_body: %{...},
  response_headers: [...],
  status_code: 200,
  response_time_ms: 1245,

  # Test Context
  test_module: "SingularityLLM.AnthropicIntegrationTest",
  test_name: "basic chat completion",
  test_tags: [:integration, :anthropic],
  test_pid: "#PID<0.123.45>",

  # Caching Information
  cached_at: ~U[2024-01-01 00:00:00Z],
  cache_version: "1.0",
  api_version: "2023-06-01",

  # Usage Tracking
  usage: %{input_tokens: 10, output_tokens: 25, total_tokens: 35},
  cost: %{input: 0.0001, output: 0.0005, total: 0.0006}
}
```
- File: `lib/singularity_llm/test_cache_matcher.ex`
- Purpose: Intelligent matching of requests to cached responses
- Features:
- Exact match for identical requests
- Fuzzy matching for similar requests (configurable tolerance)
- Content-based matching for different formatting
- Test context-aware matching
- Matching Strategies:

```elixir
def exact_match(request, cached_requests)
def fuzzy_match(request, cached_requests, tolerance \\ 0.9)
def semantic_match(request, cached_requests)
def context_match(request, cached_requests, test_context)
```
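The plan leaves the similarity measure for fuzzy matching open. This sketch substitutes Elixir's built-in `String.jaro_distance/2`, which returns a Jaro similarity score from 0.0 (no similarity) to 1.0 (identical). The score is computed over the concatenated message contents. The `%{request: ..., response: ...}` entry shape and the `"messages"` body format are assumptions. `semantic_match/2` and `context_match/3` are not covered, and the module name is hypothetical.

```elixir
defmodule TestCacheMatcherSketch do
  # Hypothetical sketch module. cached_requests: [%{request: map(), response: term()}]
  def exact_match(request, cached_requests) do
    Enum.find(cached_requests, &(&1.request == request))
  end

  # Returns {:ok, entry, score} for the most similar entry at or above tolerance, else :none.
  def fuzzy_match(request, cached_requests, tolerance \\ 0.9) do
    target = prompt_text(request)

    cached_requests
    |> Enum.map(fn entry -> {String.jaro_distance(target, prompt_text(entry.request)), entry} end)
    |> Enum.filter(fn {score, _entry} -> score >= tolerance end)
    |> Enum.sort_by(fn {score, _entry} -> score end, :desc)
    |> case do
      [] -> :none
      [{score, entry} | _] -> {:ok, entry, score}
    end
  end

  # Chat-style bodies compare by message content; anything else by its inspected form.
  defp prompt_text(%{"messages" => messages}) when is_list(messages) do
    Enum.map_join(messages, "\n", fn message -> message |> Map.get("content", "") |> to_text() end)
  end

  defp prompt_text(other), do: inspect(other)

  defp to_text(text) when is_binary(text), do: text
  defp to_text(other), do: inspect(other)
end
```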
- File: `lib/singularity_llm/test_cache_strategy.ex`
- Purpose: Implement cache-first strategy for test requests with timestamp selection
- Strategy Flow:
- Check if test caching is enabled
- Generate cache key from request and test context
- Load cache index for the cache key
- Select best timestamp entry based on strategy:
- `:latest_success`: Most recent successful response within TTL
- `:latest_any`: Most recent response (success or error) within TTL
- `:best_match`: Best matching response considering content similarity
- If valid timestamp found: return cached response
- If no valid cache or expired: make real request and save with new timestamp
- If real request fails: fallback to older timestamps if available
- Fallback Handling:
- Graceful degradation when cache is corrupted
- Fallback to older timestamps when refresh fails
- Configurable cache miss behavior (fail vs. make real request)
- Cache warming during test setup
- Automatic cleanup based on age and count limits
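The flow above can be condensed into one function. To keep the sketch self-contained, the cache lookup, the live request, and the save step are passed in as callbacks. The callback names, the `:live`/`:cache_hit`/`:cache_miss`/`:fallback` result tags, and the module name are hypothetical. The configurable fail-on-miss behavior is left out.

```elixir
defmodule TestCacheStrategySketch do
  # Hypothetical sketch module.
  # deps: %{enabled?: boolean, select: fun/1, real_request: fun/1, save: fun/2}
  # select returns {:ok, cached} | {:expired, stale} | :none
  def fetch(request, %{enabled?: false} = deps) do
    with {:ok, response} <- deps.real_request.(request), do: {:ok, response, :live}
  end

  def fetch(request, deps) do
    case deps.select.(request) do
      {:ok, cached} ->
        {:ok, cached, :cache_hit}

      lookup ->
        case deps.real_request.(request) do
          {:ok, response} ->
            :ok = deps.save.(request, response)
            {:ok, response, :cache_miss}

          {:error, reason} ->
            # Fall back to an expired (older) entry when the live request fails.
            case lookup do
              {:expired, stale} -> {:ok, stale, :fallback}
              :none -> {:error, reason}
            end
        end
    end
  end
end
```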
- File: `lib/singularity_llm/test_cache_stats.ex`
- Purpose: Track cache performance and cost savings with timestamp-based metrics
- Features:
- Cache hit/miss ratios per test suite
- TTL-based refresh statistics
- Timestamp fallback usage tracking
- Cost savings calculations
- Response time comparisons (cached vs. real)
- Test suite completion time improvements
- Storage overhead monitoring with deduplication stats
- Reporting:

```elixir
def print_cache_summary()
# Output:
# Test Cache Summary:
# ==================
# Total Requests: 150
# Cache Hits: 130 (86.7%)
# Cache Misses: 8 (5.3%)
# TTL Refreshes: 12 (8.0%)
# Fallback to Older Timestamp: 2 (1.3%)
# Cost Savings: $2.45
# Time Savings: 45.2 seconds
# Storage Used: 15.3 MB (unique: 8.1 MB, duplicates: 7.2 MB)
# Deduplication Ratio: 47% space saved
# Total Timestamps: 234
# Oldest Cache Entry: 3 days ago
# Average Cache Age: 1.2 days
```
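The report's figures follow from simple ratios. Each percentage is a count over the 150 total requests (130 / 150 ≈ 86.7%). The deduplication ratio is duplicate storage over total storage (7.2 MB / 15.3 MB ≈ 47%). The sketch below shows that arithmetic. The module name and input keys are assumptions.

```elixir
defmodule TestCacheStatsSketch do
  # Hypothetical sketch module.
  # Percentages of total requests, rounded to one decimal place as in the report.
  def rates(%{total: total} = counts) when total > 0 do
    Map.new([:hits, :misses, :refreshes, :fallbacks], fn key ->
      {key, Float.round(Map.fetch!(counts, key) * 100 / total, 1)}
    end)
  end

  # Share of storage saved by deduplication, as a whole percentage.
  def dedup_ratio(total_mb, duplicate_mb) when total_mb > 0 do
    round(duplicate_mb * 100 / total_mb)
  end
end
```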
- File: `test/support/test_helpers.ex`
- Purpose: Add test cache helpers and utilities with timestamp-based operations
- New Functions:

```elixir
def with_test_cache(opts \\ [], func)
def clear_test_cache(scope \\ :all)
def warm_test_cache(test_module)
def verify_cache_integrity()
def force_cache_miss(pattern)
def force_cache_refresh(pattern)
def set_test_ttl(test_pattern, ttl)
def list_cache_timestamps(cache_pattern)
def restore_cache_timestamp(cache_pattern, timestamp)
def cleanup_old_timestamps(max_age \\ :timer.hours(24 * 30))
def deduplicate_cache_content(cache_pattern \\ :all)
def get_cache_stats(test_module \\ :all)
def set_fallback_strategy(test_pattern, strategy)
```
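The following is a sketch of `with_test_cache/2`. It assumes the helper temporarily overrides the `:singularity_llm, :test_cache` application environment shown in the configuration, then restores it afterwards, even if the function raises. The module name is hypothetical. Application environment is global to the VM, so tests that use this pattern should run with `async: false`.

```elixir
defmodule TestCacheHelpersSketch do
  # Hypothetical sketch module; overrides app env for the duration of func.
  @app :singularity_llm

  def with_test_cache(opts \\ [], func) when is_function(func, 0) do
    previous = Application.get_env(@app, :test_cache)
    Application.put_env(@app, :test_cache, Keyword.merge(previous || [], opts))

    try do
      func.()
    after
      # Restore exactly what was there before, including "not set".
      case previous do
        nil -> Application.delete_env(@app, :test_cache)
        value -> Application.put_env(@app, :test_cache, value)
      end
    end
  end
end
```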
- Files: All integration test files
- Purpose: Add automatic test caching to integration tests
- Changes:
- Add setup hooks for test cache initialization
- Configure cache warming for known test scenarios
- Add cache verification in test teardown
- Implement cache-aware test ordering
- Files: `test/singularity_llm/adapters/gemini/*oauth2*_test.exs`
- Purpose: Special handling for OAuth2 test caching
- Features:
- OAuth2 token anonymization in cache
- Request signature generation excluding sensitive data
- Automatic cache invalidation on token refresh
- Special handling for time-sensitive operations
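Token anonymization and signature generation can share one list of sensitive names. Values are redacted in what is written to the cache, and the same headers are dropped entirely from what feeds the cache key. The header names (`authorization`, `x-api-key`, `x-goog-api-key`), the body keys, and the `[REDACTED]` placeholder are assumptions to adjust per provider. Only string-keyed (decoded JSON) bodies are handled, and the module name is hypothetical.

```elixir
defmodule TestCacheRedactionSketch do
  # Hypothetical sketch module; name lists are assumptions.
  @sensitive_headers ["authorization", "x-api-key", "x-goog-api-key"]
  @sensitive_body_keys ["access_token", "refresh_token", "id_token"]

  # For storage: keep the header but hide its value.
  def redact_headers(headers) do
    Enum.map(headers, fn {name, value} ->
      if sensitive_header?(name), do: {name, "[REDACTED]"}, else: {name, value}
    end)
  end

  # For the cache key: drop sensitive headers so a token refresh does not change the key.
  def signature_headers(headers) do
    Enum.reject(headers, fn {name, _value} -> sensitive_header?(name) end)
  end

  # Recursively hide token values in decoded JSON bodies.
  def redact_body(%{} = body) do
    Map.new(body, fn {key, value} ->
      if key in @sensitive_body_keys, do: {key, "[REDACTED]"}, else: {key, redact_body(value)}
    end)
  end

  def redact_body(list) when is_list(list), do: Enum.map(list, &redact_body/1)
  def redact_body(other), do: other

  defp sensitive_header?(name), do: String.downcase(name) in @sensitive_headers
end
```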
- File: `lib/singularity_llm/test_cache_scheduler.ex`
- Purpose: Background process for managing cache TTL and timestamp cleanup
- Features:
- Periodic scanning for expired cache entries
- Proactive refresh of critical cache entries before expiration
- Automatic timestamp cleanup based on age and count limits
- Content deduplication across timestamps
- Configurable cleanup strategies (eager, lazy, manual)
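- Functions:

```elixir
# Placeholder report types; their exact shape is not yet specified.
@type cleanup_report :: map()
@type dedup_report :: map()
@type refresh_report :: map()

@spec start_scheduler(opts :: keyword()) :: {:ok, pid()} | {:error, term()}
@spec schedule_refresh(cache_pattern :: String.t(), delay :: non_neg_integer()) :: :ok
@spec run_cleanup_cycle() :: cleanup_report()
@spec run_deduplication_cycle() :: dedup_report()
@spec refresh_critical_caches() :: refresh_report()
```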
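Periodic scanning fits a GenServer that re-arms a timer with `Process.send_after/3` at the end of each cycle. This is a minimal sketch. The `:interval` and `:cleanup` options and the module name are hypothetical, and the cleanup callback stands in for `run_cleanup_cycle/0`.

```elixir
defmodule TestCacheSchedulerSketch do
  # Hypothetical sketch module.
  use GenServer

  # Options: :interval (ms between cycles) and :cleanup (0-arity callback).
  def start_scheduler(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      interval: Keyword.get(opts, :interval, :timer.hours(1)),
      cleanup: Keyword.get(opts, :cleanup, fn -> :ok end),
      cycles: 0
    }

    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:run_cleanup, state) do
    state.cleanup.()
    schedule(state.interval)
    {:noreply, %{state | cycles: state.cycles + 1}}
  end

  # Re-arming only after a cycle finishes means cycles never overlap.
  defp schedule(interval), do: Process.send_after(self(), :run_cleanup, interval)
end
```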
- File: `lib/singularity_llm/test_cache_api_versioning.ex`
- Purpose: Handle API version changes and timestamp-based fallback strategies
- Features:
- API version detection and compatibility checking
- Timestamp-based fallback when API versions differ
- Automatic cache refresh when breaking API changes detected
- Smart selection of compatible timestamps
- API evolution tracking across timestamps
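The simplest form of selecting compatible timestamps is to prefer the newest entry recorded under the current API version, and to request a refresh when no such entry exists. The sketch below treats only an exact `api_version` match as compatible. The module name and the `:refresh_required` result are assumptions.

```elixir
defmodule TestCacheApiVersionSketch do
  # Hypothetical sketch module.
  # Newest entry recorded with the same api_version, or :refresh_required if none match.
  def select_compatible(entries, api_version) do
    case Enum.filter(entries, &(&1.api_version == api_version)) do
      [] -> :refresh_required
      compatible -> {:ok, Enum.max_by(compatible, & &1.timestamp, DateTime)}
    end
  end
end
```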
- File: `lib/singularity_llm/test_cache_optimizer.ex`
- Purpose: Optimize cache storage and performance
- Features:
- Response compression for large payloads
- Cache deduplication for identical responses
- Periodic cache cleanup and optimization
- Cache size monitoring and management
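Response compression can rely on Erlang's built-in `:zlib` module. The sketch gzips payloads above a size threshold and tags the result so readers know whether to decompress. The 4 KB threshold, the tagged-tuple format, and the module name are assumptions.

```elixir
defmodule TestCacheCompressionSketch do
  # Hypothetical sketch module; payloads larger than 4 KB are gzipped.
  @threshold 4_096

  def maybe_compress(payload) when is_binary(payload) and byte_size(payload) > @threshold do
    {:gzip, :zlib.gzip(payload)}
  end

  def maybe_compress(payload) when is_binary(payload), do: {:raw, payload}

  def decompress({:gzip, data}), do: :zlib.gunzip(data)
  def decompress({:raw, data}), do: data
end
```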
- File: `lib/mix/tasks/singularity_llm.cache.ex`
- Purpose: Command-line tools for cache management with TTL and timestamps
- Commands:

```shell
# Basic cache management
mix singularity_llm.cache.clear                    # Clear all test cache
mix singularity_llm.cache.stats                    # Show cache statistics with TTL info
mix singularity_llm.cache.verify                   # Verify cache integrity
mix singularity_llm.cache.warm --suite oauth2      # Warm cache for test suite

# TTL and refresh management
mix singularity_llm.cache.refresh --expired                         # Refresh all expired cache entries
mix singularity_llm.cache.refresh --pattern "openai/*"              # Refresh specific pattern
mix singularity_llm.cache.set_ttl --pattern "oauth2/*" --ttl "1d"   # Set TTL for pattern
mix singularity_llm.cache.check_expiry                              # Show cache entries near expiration

# Timestamp management
mix singularity_llm.cache.timestamps --list --pattern "anthropic/*"     # List timestamps for pattern
mix singularity_llm.cache.timestamps --cleanup                          # Clean up old timestamps
mix singularity_llm.cache.timestamps --restore "2024-01-15T10:30:45Z"   # Restore specific timestamp
mix singularity_llm.cache.deduplicate                                   # Remove duplicate content across timestamps

# Import/Export with timestamps
mix singularity_llm.cache.export --format json --include-timestamps      # Export with all timestamps
mix singularity_llm.cache.import --file cache.json --preserve-timestamps  # Import preserving timestamps
mix singularity_llm.cache.compress --older-than "7d"                     # Compress old timestamps
```
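Several of these commands take durations such as `1d` and `7d`, and `EX_LLM_TEST_CACHE_CLEANUP_OLDER_THAN` uses the same form. The sketch below parses them with `OptionParser` and `Integer.parse/1`. The accepted units (`s`, `m`, `h`, `d`) and the module name are assumptions. `OptionParser` maps `--older-than` to the `:older_than` key.

```elixir
defmodule CacheDurationSketch do
  # Hypothetical sketch module; milliseconds per supported unit suffix.
  @units %{"s" => 1_000, "m" => 60_000, "h" => 3_600_000, "d" => 86_400_000}

  # "7d" -> {:ok, 604_800_000}; anything else -> :error
  def parse_duration(text) when is_binary(text) do
    case Integer.parse(text) do
      {amount, unit} when amount >= 0 and is_map_key(@units, unit) ->
        {:ok, amount * Map.fetch!(@units, unit)}

      _ ->
        :error
    end
  end

  # Switches shared by the refresh/set_ttl/compress commands in this sketch.
  def parse_args(argv) do
    {opts, _args, _invalid} =
      OptionParser.parse(argv,
        strict: [pattern: :string, ttl: :string, older_than: :string, expired: :boolean]
      )

    opts
  end
end
```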
- File: `docs/test_caching.md`
- Content:
- How automatic test caching works
- Configuration options and best practices
- Troubleshooting common issues
- Cost savings and performance benefits
- Integration with CI/CD pipelines
- Files: `test/singularity_llm/test_cache_*_test.exs`
- Purpose: Comprehensive testing of caching functionality
- Test Categories:
- Unit tests for cache components
- Integration tests for end-to-end caching
- Performance tests for cache overhead
- Edge case handling tests
- Files: `README.md`, `config/config.exs`
- Purpose: Document new test caching configuration options
- Content:
- Environment variable documentation
- Configuration examples for different scenarios
- Migration guide from manual to automatic caching
- Complete Tasks 1.1-1.6
- Basic test environment detection and configuration
- TTL system and backup/versioning infrastructure
- Complete Tasks 2.1-2.3
- Core response interception and storage functionality
- TTL-aware cache checking and refresh logic
- Complete Tasks 3.1-3.3
- Intelligent cache matching and replay system with TTL support
- Version fallback mechanisms
- Complete Tasks 4.1-4.3
- Full integration with existing test suites
- TTL and versioning helper functions
- Complete Tasks 5.1-5.4
- Automated refresh scheduling and version cleanup
- Enhanced CLI tools for cache management
- Complete Tasks 6.1-6.3
- Comprehensive documentation and test coverage
- Performance optimization and final polish
- Integration tests automatically cache responses by default
- OAuth2 tests work seamlessly with cached responses
- Cost reduction of >90% for repeated test runs
- Zero configuration required for basic usage
- Backward compatibility with existing test infrastructure
- Cache hit ratio >95% for repeated test runs
- Test suite runtime improvement >50% with cache
- Cache storage overhead <100MB for full test suite
- Cache lookup time <10ms per request
- All existing tests pass with caching enabled
- Cache integrity verified with checksum validation
- Graceful fallback when cache is unavailable
- Clear error messages for cache-related issues
- Cache corruption: Implement checksum validation and automatic cache repair
- Test flakiness: Ensure cached responses maintain original timing and error patterns
- Storage requirements: Implement compression and cleanup strategies
- Integration complexity: Maintain clear separation between caching and core functionality
- Breaking changes: Comprehensive test coverage and gradual rollout
- Performance regression: Benchmark cache overhead and optimize hot paths
- Maintenance burden: Clear documentation and automated cache management
```elixir
# config/test.exs
config :singularity_llm, :test_cache,
  enabled: true,
  auto_detect: true,
  cache_dir: "test/cache",
  replay_by_default: true,
  save_on_miss: true,
  ttl: :timer.hours(24 * 7),  # Refresh cache weekly
  fallback_strategy: :latest_success,
  max_entries_per_cache: 5,
  deduplicate_content: true
```

```yaml
# .github/workflows/test.yml
env:
  EX_LLM_TEST_CACHE_ENABLED: "true"
  EX_LLM_TEST_CACHE_DIR: "/tmp/ex_llm_cache"
  EX_LLM_TEST_CACHE_REPLAY_ONLY: "true"  # Don't make real requests in CI
  EX_LLM_TEST_CACHE_TTL: "0"  # Use any cached response in CI
  EX_LLM_TEST_CACHE_FALLBACK_STRATEGY: "latest_any"  # Use any timestamp if needed
```

```shell
# Force cache miss for specific tests
export EX_LLM_TEST_CACHE_FORCE_MISS="AnthropicIntegrationTest"

# Force cache refresh for specific tests (ignores TTL)
export EX_LLM_TEST_CACHE_FORCE_REFRESH="OAuth2Test"

# Set custom TTL for development
export EX_LLM_TEST_CACHE_TTL="3600"  # 1 hour TTL

# Set fallback strategy
export EX_LLM_TEST_CACHE_FALLBACK_STRATEGY="latest_success"  # or latest_any, best_match

# Disable caching for debugging
export EX_LLM_TEST_CACHE_ENABLED="false"

# Use specific timestamp for testing
export EX_LLM_TEST_CACHE_USE_TIMESTAMP="2024-01-15T10:30:45Z"

# Control cleanup behavior
export EX_LLM_TEST_CACHE_MAX_ENTRIES="10"
export EX_LLM_TEST_CACHE_CLEANUP_OLDER_THAN="30d"
```

This comprehensive plan builds upon SingularityLLM's existing caching infrastructure to provide seamless, automatic test response caching that will significantly reduce API costs and improve test reliability.
- The library aims to be the go-to solution for LLM integration in Elixir
- Focus remains on being a unified, reliable LLM client library
- All features should work consistently across providers where possible
- Provider-specific features should be clearly documented
- Performance and cost efficiency are key priorities
- Features that belong at the application layer have been moved to docs/DROPPED.md