Changelog¶
All notable changes to this project are documented here. For detailed information, see CHANGELOG.md on GitHub.
[0.2.6]¶
New Features¶
max_llm_calls — limit budgets by call count
- budget(max_llm_calls=50) raises BudgetExceededError after 50 LLM API calls
- Can be combined with max_usd: budget(max_usd=1.00, max_llm_calls=20)
- Works with fallback: budget(max_usd=1.00, max_llm_calls=20, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"})
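To make the combined limits concrete, here is a toy model of a budget that enforces a spend cap and a call-count cap together. It is a sketch, not shekel's implementation: the class CallCountBudget, its record() method, and the local BudgetExceededError are hypothetical stand-ins, and it assumes the call that crosses a limit is the one that raises.

```python
class BudgetExceededError(Exception):
    """Local stand-in for shekel's exception of the same name (illustrative only)."""


class CallCountBudget:
    """Toy budget enforcing a USD cap, a call-count cap, or both (hypothetical class)."""

    def __init__(self, max_usd=None, max_llm_calls=None):
        self.max_usd = max_usd
        self.max_llm_calls = max_llm_calls
        self.spent = 0.0
        self.calls = 0

    def record(self, cost_usd):
        """Record one LLM call, then check both limits."""
        self.calls += 1
        self.spent += cost_usd
        # Assumption: the call that exceeds a limit is the one that raises.
        if self.max_llm_calls is not None and self.calls > self.max_llm_calls:
            raise BudgetExceededError(f"call limit of {self.max_llm_calls} exceeded")
        if self.max_usd is not None and self.spent > self.max_usd:
            raise BudgetExceededError(f"spend limit of ${self.max_usd:.2f} exceeded")


def calls_before_stop(budget, attempts, cost_usd):
    """Return how many calls were recorded before the budget raised."""
    completed = 0
    try:
        for _ in range(attempts):
            budget.record(cost_usd)
            completed += 1
    except BudgetExceededError:
        pass
    return completed
```

Whichever limit is reached first stops the run, which is why combining max_usd with max_llm_calls is useful for loops where per-call cost is small but the number of calls is unbounded.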
LiteLLM provider adapter
- Install with pip install shekel[litellm]
- Patches litellm.completion and litellm.acompletion (sync + async, including streaming)
- Enforces budgets and circuit-breaks across all 100+ providers LiteLLM supports (Gemini, Cohere, Ollama, Azure, Bedrock, Mistral, and more)
- Model names with provider prefix (e.g. gemini/gemini-1.5-flash) pass through to the pricing engine
[0.2.5] - 2026-03-11¶
🔧 Extensible Provider Architecture¶
Shekel now has a pluggable architecture for adding new LLM provider support without modifying core code.
ProviderAdapter: Standard interface for any LLM provider
- 8 abstract methods: patching, token extraction, streaming, validation
- All providers (OpenAI, Anthropic, custom) implement this interface
- Clear contract for what shekel needs from a provider
ProviderRegistry: Central hub for provider management
- Thread-safe registration and lifecycle management
- Automatic patch installation and removal
- Provider discovery by name for fallback validation
- Decoupled from core code: no core changes needed for new providers
Add your own provider in 3 steps:
1. Implement ProviderAdapter interface
2. Register with ADAPTER_REGISTRY.register(YourAdapter())
3. The adapter then works everywhere shekel enforces budgets, with no further wiring
The community can now add adapters for Cohere, Replicate, vLLM, Mistral, Bedrock, Vertex AI, and others.
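The three steps can be sketched with a reduced interface. This is an illustration of the adapter-plus-registry pattern only: the four method names below, the EchoAdapter class, and the registry's install_all/remove_all/get methods are assumptions, and the real ProviderAdapter defines 8 abstract methods that are not listed in this changelog.

```python
import threading
from abc import ABC, abstractmethod


class ProviderAdapter(ABC):
    """Reduced, hypothetical version of the adapter interface."""

    @property
    @abstractmethod
    def name(self):
        """Unique provider name used for lookup."""

    @abstractmethod
    def install_patches(self):
        """Wrap the provider SDK so calls can be metered."""

    @abstractmethod
    def remove_patches(self):
        """Restore the provider SDK to its original state."""

    @abstractmethod
    def extract_tokens(self, response):
        """Return (input_tokens, output_tokens) for a response."""


class ProviderRegistry:
    """Thread-safe registry that manages adapter lifecycle (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._adapters = {}

    def register(self, adapter):
        with self._lock:
            if adapter.name in self._adapters:
                raise ValueError(f"provider {adapter.name!r} already registered")
            self._adapters[adapter.name] = adapter

    def get(self, name):
        with self._lock:
            return self._adapters.get(name)

    def install_all(self):
        with self._lock:
            adapters = list(self._adapters.values())
        for adapter in adapters:
            adapter.install_patches()

    def remove_all(self):
        with self._lock:
            adapters = list(self._adapters.values())
        for adapter in adapters:
            adapter.remove_patches()


# Step 1: implement the interface (EchoAdapter is a made-up provider).
class EchoAdapter(ProviderAdapter):
    def __init__(self):
        self.patched = False

    @property
    def name(self):
        return "echo"

    def install_patches(self):
        self.patched = True

    def remove_patches(self):
        self.patched = False

    def extract_tokens(self, response):
        usage = response["usage"]
        return usage["input_tokens"], usage["output_tokens"]


# Step 2: register it. Step 3: the registry handles patching from here on.
ADAPTER_REGISTRY = ProviderRegistry()
ADAPTER_REGISTRY.register(EchoAdapter())
```

Because core code only talks to the registry, adding a provider touches no core module, which is the decoupling the release notes describe.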
✅ Validated with Real Integration Tests¶
The architecture is validated end to end against real provider APIs:
Groq API: 25+ integration tests
- Custom pricing and budget enforcement
- Nested budgets and cost attribution
- Streaming responses and concurrent calls
- Rate limiting and error handling
- Real API keys in CI
Google Gemini API: 30+ integration tests
- Multi-turn conversations and streaming
- JSON mode and function calling
- Token counting accuracy
- Budget enforcement and fallback
- Real API keys in CI
These test suites serve as reference implementations showing how to build a provider adapter.
⚙️ Production-Grade Reliability¶
- Exponential backoff retry logic — Gracefully handles rate limiting and transient failures
- 100+ integration test scenarios — Comprehensive validation of architecture under load
- Concurrent test stability — Reduced flakiness when multiple providers are tested simultaneously
- CI improvements — Integration and performance tests run in parallel
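For readers unfamiliar with the retry technique named in the first item, a minimal standard-library sketch of exponential backoff with a little jitter follows. The TransientError class, the function name, and all default values are assumptions chosen for illustration; the changelog does not specify the parameters shekel's test suite uses.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as rate limiting."""


def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (0.5, 1, 2, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter keeps concurrent clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter is injectable so that tests can record delays instead of actually waiting.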
✅ Quality Improvements¶
- 100+ integration test scenarios — Comprehensive real API coverage
- mkdocs integrity checks — Prevent broken documentation links in CI
- Better provider abstraction — Easier to add new LLM provider support in the future
[0.2.4] - 2026-03-11¶
✨ Langfuse Integration (New!)¶
Full LLM observability with zero configuration. See the Langfuse Integration Guide for complete documentation.
Feature #1: Real-Time Cost Streaming¶
- Automatic metadata updates after each LLM call
- Track: shekel_spent, shekel_limit, shekel_utilization, shekel_last_model
- Works with track-only mode (no limit)
- Supports custom trace names and tags
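As a rough picture of the metadata attached after each call, the sketch below builds a dictionary with the four keys listed above. The helper name budget_metadata is hypothetical, and two details are assumptions rather than documented behavior: that utilization is spent divided by limit, and that track-only mode reports the limit and utilization as None. It does not call the Langfuse SDK.

```python
def budget_metadata(spent, limit, last_model):
    """Build the per-call metadata payload (hypothetical helper)."""
    # Assumption: utilization is the fraction of the limit already spent,
    # and is undefined (None) in track-only mode where there is no limit.
    utilization = None if limit is None else spent / limit
    return {
        "shekel_spent": spent,
        "shekel_limit": limit,
        "shekel_utilization": utilization,
        "shekel_last_model": last_model,
    }


limited = budget_metadata(spent=0.25, limit=1.00, last_model="gpt-4o-mini")
track_only = budget_metadata(spent=0.25, limit=None, last_model="gpt-4o-mini")
```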
Feature #2: Nested Budget Mapping¶
- Nested budgets automatically create span hierarchy in Langfuse
- Parent budget → trace, child budgets → child spans
- Perfect waterfall view for multi-stage workflows
- Each span has its own budget metadata
Feature #3: Circuit Break Events¶
- WARNING events created when budget limits are exceeded
- Event metadata: spent, limit, overage, model, tokens, parent_remaining
- Nested budget violations create events on child spans
- Easy filtering and alerting in Langfuse UI
Feature #4: Fallback Annotations¶
- INFO events created when fallback model activates
- Event metadata: from_model, to_model, switched_at, costs, savings
- Trace/span metadata updated to show fallback is active
- Fallback info persists across subsequent cost updates
🏗️ Adapter Pattern Architecture¶
- ObservabilityAdapter base class for integrations
- AdapterRegistry for managing multiple adapters
- Thread-safe registration and event broadcasting
- Error isolation (one adapter failure doesn't break others)
- AsyncEventQueue for non-blocking event delivery
- Background worker thread processes events asynchronously
- Queue drops old events if full (no blocking)
- Graceful shutdown with timeout
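The queue behavior in this list (non-blocking delivery, a background worker, dropping old events when full, shutdown with a timeout) can be sketched with the standard library. This is one possible design under stated assumptions, not shekel's code: the method names emit and shutdown, the default sizes, and the polling interval are all hypothetical.

```python
import queue
import threading


class AsyncEventQueue:
    """Toy non-blocking event queue with a background worker (illustrative)."""

    def __init__(self, handler, maxsize=100):
        self._queue = queue.Queue(maxsize=maxsize)
        self._handler = handler
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, event):
        """Enqueue without blocking; if the queue is full, drop the oldest event."""
        while True:
            try:
                self._queue.put_nowait(event)
                return
            except queue.Full:
                try:
                    self._queue.get_nowait()  # discard the oldest pending event
                except queue.Empty:
                    pass  # the worker drained it first; just retry the put

    def _run(self):
        # Keep going until shutdown was requested and nothing is left to deliver.
        while not (self._stop.is_set() and self._queue.empty()):
            try:
                event = self._queue.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                self._handler(event)
            except Exception:
                pass  # error isolation: a failing handler must not kill the worker

    def shutdown(self, timeout=2.0):
        """Ask the worker to finish; return True if it stopped within the timeout."""
        self._stop.set()
        self._worker.join(timeout)
        return not self._worker.is_alive()
```

Dropping the oldest event rather than blocking the producer keeps observability off the critical path of the LLM call, at the cost of losing events under sustained overload.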
📦 Optional Dependency Management¶
- New shekel[langfuse] extra for Langfuse integration
- Graceful import handling (works even if langfuse not installed)
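Graceful handling of an optional dependency usually comes down to catching ImportError at import time. The following is a generic sketch of that pattern; load_optional and langfuse_status are hypothetical helper names, not functions exported by shekel.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def langfuse_status():
    """Describe whether the optional Langfuse integration could be enabled."""
    if load_optional("langfuse") is None:
        return "langfuse not installed; run: pip install shekel[langfuse]"
    return "langfuse available"
```

Core code can then branch on the result instead of failing at import, so users who never install the extra are unaffected.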
Technical Details¶
- Core event emission in _patch.py::_record() and _budget.py::_check_limit()
- Type-safe implementation with guards for Python 3.9+ compatibility
- All 267 tests passing (65 new integration tests), 95%+ coverage
- Negligible performance impact: under 1 ms of overhead per LLM call
[0.2.3] - 2026-03-11¶
🌳 Nested Budgets (v0.2.3)¶
Hierarchical budget tracking for multi-stage AI workflows:
- Automatic spend propagation from child to parent on context exit
- Auto-capping: child budgets capped to parent's remaining budget
- Parent locking: parent cannot spend while child is active (sequential execution)
- Named budgets: names required when nesting for clear cost attribution
- Track-only children: max_usd=None for unlimited child tracking
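Auto-capping and spend propagation are easier to see in a toy model. The ToyBudget class below is hypothetical and much simpler than shekel's budget: it tracks the active budget with a plain class-level list, does not enforce limits or parent locking, and assumes that a track-only child (max_usd=None) receives no cap of its own.

```python
class ToyBudget:
    """Toy nested budget showing auto-capping and spend propagation."""

    _stack = []  # active budgets, innermost last (stand-in for real context tracking)

    def __init__(self, name, max_usd=None):
        if max_usd is not None and max_usd <= 0:
            raise ValueError("max_usd must be positive")
        self.name = name
        self.max_usd = max_usd
        self.effective_limit = max_usd
        self.spent = 0.0
        self.parent = None
        self._spent_at_enter = 0.0

    def __enter__(self):
        self.parent = ToyBudget._stack[-1] if ToyBudget._stack else None
        self.effective_limit = self.max_usd
        if (self.parent is not None and self.max_usd is not None
                and self.parent.effective_limit is not None):
            # Auto-capping: a child can never be allowed more than the parent has left.
            remaining = self.parent.effective_limit - self.parent.spent
            self.effective_limit = min(self.max_usd, remaining)
        self._spent_at_enter = self.spent
        ToyBudget._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        ToyBudget._stack.pop()
        if self.parent is not None:
            # Propagate only what was spent during this use, even if an exception
            # is in flight: money spent is money spent.
            self.parent.spent += self.spent - self._spent_at_enter
        return False  # never swallow exceptions

    def record(self, cost_usd):
        self.spent += cost_usd


with ToyBudget("workflow", max_usd=1.00) as workflow:
    workflow.record(0.40)
    with ToyBudget("research", max_usd=5.00) as research:
        research_cap = research.effective_limit  # capped to the parent's remainder
        research.record(0.25)
```

After the outer block exits, the parent's total includes the child's spend, and the child's requested 5.00 limit was reduced to what the parent had left when the child started.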
📊 Rich Introspection API¶
- budget.full_name: Hierarchical path (e.g., "workflow.research.validation")
- budget.spent_direct: Direct spend by this budget (excluding children)
- budget.spent_by_children: Sum of all child spend
- budget.parent: Reference to parent budget (None for root)
- budget.children: List of child budgets
- budget.active_child: Currently active child budget
- budget.tree(): Visual hierarchy of budget tree with spend breakdown
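How these attributes relate to one another can be shown with a small tree of plain data objects. BudgetNode is a hypothetical class that reuses the attribute names above for readability; the add_child helper, the spent property, and the exact text format of tree() are assumptions, not shekel's output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class BudgetNode:
    """Toy node mirroring the introspection attributes (illustrative only)."""

    name: str
    spent_direct: float = 0.0
    parent: Optional["BudgetNode"] = field(default=None, repr=False)
    children: List["BudgetNode"] = field(default_factory=list)

    def add_child(self, name, spent_direct=0.0):
        child = BudgetNode(name, spent_direct, parent=self)
        self.children.append(child)
        return child

    @property
    def full_name(self):
        # Dotted path from the root down to this node.
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name}.{self.name}"

    @property
    def spent_by_children(self):
        # Each child's total already includes its own descendants.
        return sum(child.spent for child in self.children)

    @property
    def spent(self):
        return self.spent_direct + self.spent_by_children

    def tree(self, indent=0):
        line = (f"{'  ' * indent}{self.name}: ${self.spent:.2f} "
                f"(direct ${self.spent_direct:.2f})")
        return "\n".join([line] + [c.tree(indent + 1) for c in self.children])


workflow = BudgetNode("workflow", spent_direct=0.10)
research = workflow.add_child("research", spent_direct=0.20)
validation = research.add_child("validation", spent_direct=0.05)
```

Here validation.full_name is the dotted path "workflow.research.validation", and each node's total is its direct spend plus everything spent beneath it.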
🛡️ Safety Rails¶
- Maximum nesting depth of 5 levels enforced
- Async nesting detection: nesting budgets in async code raises a clear error (async nesting support is deferred to a future version)
- Zero/negative budget validation at __init__
- Spend propagation on all exceptions (money spent is money spent)
Changes¶
- Budget variables always accumulate across uses — same variable, same accumulated state
- Both parent and child must have a name when creating nested contexts
Fixed¶
- ContextVar token management now uses proper .reset() instead of manual .set(None)
- Patch reference counting no longer leaks when validation errors occur before patching
- Sibling budgets must have unique names under the same parent