Changelog¶
All notable changes to this project are documented here. For detailed information, see CHANGELOG.md on GitHub.
[0.2.9]¶
CLI Budget Enforcement: shekel run¶
Run any Python agent with a hard USD cap from the command line, with zero code changes required.
shekel run agent.py --budget 5 # hard cap at $5
shekel run agent.py --budget 5 --warn-at 0.8 # warn at 80%
AGENT_BUDGET_USD=5 shekel run agent.py # env var (Docker / CI)
- `shekel run SCRIPT [OPTIONS]`: wraps any Python script in-process; shekel's monkey-patches are already active when the script runs
- `--budget N` / `AGENT_BUDGET_USD=N`: USD cap; env var enables Docker/CI operator control without code changes
- `--warn-at F`: warn fraction 0.0 to 1.0 (e.g. `0.8` = warn at 80%)
- `--max-llm-calls N` / `--max-tool-calls N`: count-based caps
- `--warn-only`: log warning, never exit 1; soft guardrail for dev environments
- `--dry-run`: track costs only, no enforcement; implies `--warn-only`
- `--output json`: machine-readable spend line for log pipelines
- `--budget-file shekel.toml`: load limits from a TOML config file
- `Budget(warn_only=True)`: new parameter suppresses raises, fires warn callback instead
- GitHub Actions composite action: `.github/actions/enforce/action.yml`
- New docs: CLI reference · Docker & Containers
- Exit code 1 on budget exceeded: works as a CI pipeline gate with zero pipeline config
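The flag semantics above can be read as a small decision rule. The sketch below is not shekel's implementation: `cli_outcome` and its return shape are hypothetical, and it assumes only what the list states (warn at a fraction of the cap, exit 1 when the cap is exceeded, never exit 1 under `--warn-only` or `--dry-run`). Treating "exceeded" as strictly greater than the cap is also an assumption.

```python
from typing import Optional, Tuple


def cli_outcome(
    spent_usd: float,
    budget_usd: float,
    warn_at: Optional[float] = None,
    warn_only: bool = False,
    dry_run: bool = False,
) -> Tuple[int, bool]:
    """Return (exit_code, warned) for a finished run (hypothetical helper)."""
    # --dry-run implies --warn-only, per the entry above.
    soft = warn_only or dry_run
    exceeded = spent_usd > budget_usd  # assumption: strictly greater than the cap
    # Warn once spend reaches the warn fraction of the cap.
    warned = warn_at is not None and spent_usd >= warn_at * budget_usd
    if exceeded and not soft:
        return 1, True  # hard cap: a non-zero exit fails the CI step
    return 0, warned or exceeded
```

A CI job would only need to check the process exit status, which is why no extra pipeline configuration is required.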
[0.2.8]¶
Tool Budgets¶
Cap agent tool call count and cost: stop runaway tool loops before they bankrupt you.
- `max_tool_calls`: hard cap on total dispatches, checked before each tool runs
- `tool_prices`: per-tool USD cost; unknown tools count at `$0` toward the cap
- `@tool` / `tool()` decorator: one line for any sync/async function or callable
- `ToolBudgetExceededError`: `tool_name`, `calls_used`, `calls_limit`, `usd_spent`, `framework`
- Auto-interception: LangChain `BaseTool`, MCP `ClientSession.call_tool`, CrewAI `BaseTool`, OpenAI Agents SDK `FunctionTool`, all with zero config
- `summary()` extended with tool spend breakdown by tool name and framework
- Four new OTel instruments: `shekel.tool.calls_total`, `shekel.tool.cost_usd_total`, `shekel.tool.budget_exceeded_total`, `shekel.tool.calls_remaining`
- 111 new unit tests (TDD)
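To illustrate the "checked before each tool runs" rule, here is a minimal standalone sketch. It does not use shekel: `ToolCounter` and `ToolCapExceeded` are hypothetical stand-ins, and the only behaviour assumed is what the list describes (a call cap enforced before dispatch, and unlisted tools priced at $0 while still counting as calls).

```python
import functools
from typing import Callable, Dict


class ToolCapExceeded(RuntimeError):
    """Hypothetical stand-in for ToolBudgetExceededError."""


class ToolCounter:
    def __init__(self, max_tool_calls: int, tool_prices: Dict[str, float]):
        self.max_tool_calls = max_tool_calls
        self.tool_prices = tool_prices
        self.calls_used = 0
        self.usd_spent = 0.0

    def tool(self, fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Check the cap before dispatch, so call N+1 never runs.
            if self.calls_used >= self.max_tool_calls:
                raise ToolCapExceeded(
                    f"{fn.__name__}: {self.calls_used}/{self.max_tool_calls} calls used"
                )
            self.calls_used += 1
            # Tools without a listed price cost $0 but still count as a call.
            self.usd_spent += self.tool_prices.get(fn.__name__, 0.0)
            return fn(*args, **kwargs)

        return wrapper
```

Checking before the call rather than after means a looping agent is stopped without paying for the dispatch that would have crossed the limit.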
Temporal Budgets¶
Rolling-window LLM spend limits: enforce $5/hr per API tier, user, or agent.
- `budget("$5/hr", name="api-tier")`: string DSL
- `TemporalBudgetBackend` Protocol: bring your own Redis/Postgres backend
- `BudgetExceededError` enriched with `retry_after` and `window_spent`
- `on_window_reset` adapter event + `shekel.budget.window_resets_total` OTel counter
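A rolling-window limit can be sketched with an in-memory event log. This is an illustration only, not shekel's backend: `parse_spec`, `RollingWindow`, and the unit table are hypothetical, and the retry estimate is simplified to "when the oldest recorded event leaves the window".

```python
import re
from collections import deque
from typing import Deque, Optional, Tuple

# Assumed unit names; the units the real DSL accepts are not listed above.
_UNITS = {"s": 1.0, "min": 60.0, "hr": 3600.0}


def parse_spec(spec: str) -> Tuple[float, float]:
    """Parse a string like "$5/hr" into (limit_usd, window_seconds)."""
    match = re.fullmatch(r"\$(\d+(?:\.\d+)?)/([a-z]+)", spec.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised budget spec: {spec!r}")
    return float(match.group(1)), _UNITS[match.group(2)]


class RollingWindow:
    def __init__(self, limit_usd: float, window_s: float):
        self.limit_usd = limit_usd
        self.window_s = window_s
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, usd)

    def window_spend(self, now: float) -> float:
        # Drop events that have aged out of the window, then sum the rest.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()
        return sum(usd for _, usd in self._events)

    def record(self, usd: float, now: float) -> Optional[float]:
        """Record spend; return None if allowed, else seconds until retry."""
        if self.window_spend(now) + usd > self.limit_usd:
            oldest = self._events[0][0] if self._events else now
            return oldest + self.window_s - now
        self._events.append((now, usd))
        return None
```

A shared backend such as Redis or Postgres would replace the in-process deque so that several workers see the same window.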
[0.2.7]¶
OpenTelemetry Metrics Integration¶
Shekel now exposes LLM cost and budget lifecycle data via OpenTelemetry, filling the gap the OTel GenAI semantic conventions leave around cost and budget metrics.
ShekelMeter (shekel/otel.py): zero-config public entry point
from shekel.otel import ShekelMeter
meter = ShekelMeter() # uses global MeterProvider
# or
meter = ShekelMeter(meter_provider=my_provider, emit_tokens=True)
meter.unregister() # remove from registry when done
Silent no-op when opentelemetry-api is not installed (meter.is_noop is True).
8 new instruments:
| Instrument | Type | Description |
|---|---|---|
| `shekel.llm.cost_usd` | Counter | Cost per LLM call |
| `shekel.llm.calls_total` | Counter | Call count per model |
| `shekel.llm.tokens_input_total` | Counter | Input tokens (opt-in) |
| `shekel.llm.tokens_output_total` | Counter | Output tokens (opt-in) |
| `shekel.budget.exits_total` | Counter | Budget exits by status |
| `shekel.budget.cost_usd` | UpDownCounter | Cumulative spend per budget |
| `shekel.budget.utilization` | Histogram | 0.0 to 1.0 on exit |
| `shekel.budget.spend_rate` | Histogram | USD/s on exit |
| `shekel.budget.fallbacks_total` | Counter | Fallback activations |
| `shekel.budget.autocaps_total` | Counter | Auto-cap events |
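As a worked example of the two histogram values recorded on exit, the sketch below derives them from spend, limit, and duration. `exit_metrics` is a hypothetical helper, and both the clamp to 1.0 and the `None` result for track-only budgets are assumptions made to match the stated 0.0 to 1.0 range.

```python
from typing import Optional, Tuple


def exit_metrics(
    spent_usd: float, limit_usd: Optional[float], duration_s: float
) -> Tuple[Optional[float], float]:
    """Return (utilization, spend_rate) as they might be recorded on exit."""
    # A track-only budget has no limit, so utilization is left undefined here.
    utilization = None if limit_usd is None else min(spent_usd / limit_usd, 1.0)
    # Spend rate in USD per second; guard against a zero-length context.
    spend_rate = spent_usd / duration_s if duration_s > 0 else 0.0
    return utilization, spend_rate
```

For instance, $2 spent against an $8 limit over 4 seconds gives a utilization of 0.25 and a spend rate of 0.5 USD/s.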
Two new ObservabilityAdapter events:
- `on_budget_exit(data)`: fires on every budget context exit (before parent propagation); `data` includes `status`, `spent_usd`, `utilization`, `duration_seconds`, `calls_made`, `model_switched`, `from_model`, `to_model`
- `on_autocap(data)`: fires when a child budget is silently reduced to the parent's remaining budget; `data` includes `child_name`, `parent_name`, `original_limit`, `effective_limit`

Token payload in `on_cost_update`: the event now includes `input_tokens` and `output_tokens` fields.
Install: pip install shekel[otel]
See OTel Integration Guide for PromQL examples, cardinality guidance, and Grafana hints.
Full Async Support for Gemini, HuggingFace, and Nested Budgets¶
`async with budget(...):` now supports the same full nesting logic as sync contexts: auto-capping, spend propagation, and per-task isolation via ContextVar all work identically.
Gemini async: `_gemini_async_wrapper` and `_gemini_async_stream_wrapper` added; `AsyncModels` patched automatically alongside sync `Models`.
HuggingFace async: `_huggingface_async_wrapper` and `_wrap_huggingface_stream_async` added; `AsyncInferenceClient` patched automatically.
# Now works: full nesting with async context managers
async with budget(max_usd=10.00, name="workflow"):
    async with budget(max_usd=2.00, name="research"):
        result = await client.models.generate_content_async(...)
13 new async unit tests added for Gemini and HuggingFace. Integration test suites expanded with async streaming and async with budget() scenarios for OpenAI and Anthropic.
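Per-task isolation via ContextVar can be demonstrated with the standard library alone. The sketch below does not use shekel; `_active` and `worker` are hypothetical names. It relies on the documented asyncio behaviour that each task runs in its own copy of the current context.

```python
import asyncio
from contextvars import ContextVar
from typing import Dict, Optional

# Hypothetical variable holding the name of the active budget.
_active: ContextVar[Optional[str]] = ContextVar("active_budget", default=None)


async def worker(name: str, seen: Dict[str, Optional[str]]) -> None:
    token = _active.set(name)       # visible only inside this task
    try:
        await asyncio.sleep(0)      # yield so the other task runs in between
        seen[name] = _active.get()  # still this task's own value
    finally:
        _active.reset(token)        # restore the previous value


async def main() -> Dict[str, Optional[str]]:
    seen: Dict[str, Optional[str]] = {}
    # gather() wraps each coroutine in a task with its own context copy.
    await asyncio.gather(worker("research", seen), worker("summarize", seen))
    seen["outer"] = _active.get()   # the caller's context was never modified
    return seen


result = asyncio.run(main())
```

Each worker reads back its own name even though the two interleave, and the caller still sees the default value afterwards.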
[0.2.6]¶
New Features¶
max_llm_calls: limit budgets by call count
- `budget(max_llm_calls=50)` raises `BudgetExceededError` after 50 LLM API calls
- Can be combined with `max_usd`: `budget(max_usd=1.00, max_llm_calls=20)`
- Works with fallback: `budget(max_usd=1.00, max_llm_calls=20, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"})`
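How a call cap, a USD cap, and a fallback threshold might interact is sketched below. This is not shekel's logic: `CallBudget` and `LimitReached` are hypothetical, and checking both limits before each call (so the call after the last allowed one raises) is an assumption about the ordering.

```python
from typing import Optional


class LimitReached(RuntimeError):
    """Hypothetical stand-in for BudgetExceededError."""


class CallBudget:
    def __init__(self, max_usd: float, max_llm_calls: int,
                 fallback: Optional[dict] = None):
        self.max_usd = max_usd
        self.max_llm_calls = max_llm_calls
        self.fallback = fallback
        self.spent = 0.0
        self.calls = 0

    def before_call(self, requested_model: str) -> str:
        """Enforce both limits, then return the model to use."""
        if self.calls >= self.max_llm_calls or self.spent >= self.max_usd:
            raise LimitReached(f"calls={self.calls}, spent=${self.spent:.2f}")
        # Past the fallback threshold, route to the cheaper model.
        if self.fallback and self.spent >= self.fallback["at_pct"] * self.max_usd:
            return self.fallback["model"]
        return requested_model

    def after_call(self, cost_usd: float) -> None:
        self.calls += 1
        self.spent += cost_usd
```

With `at_pct` set to 0.8 and a $1.00 cap, any call made after spend reaches $0.80 is routed to the fallback model until one of the two limits is hit.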
LiteLLM provider adapter
- Install with `pip install shekel[litellm]`
- Patches `litellm.completion` and `litellm.acompletion` (sync + async, including streaming)
- Enforces budgets and circuit-breaks across all 100+ providers LiteLLM supports (Gemini, Cohere, Ollama, Azure, Bedrock, Mistral, and more)
- Model names with provider prefix (e.g. `gemini/gemini-1.5-flash`) pass through to the pricing engine
[0.2.5] - 2026-03-11¶
Extensible Provider Architecture¶
Shekel now has a pluggable architecture for adding new LLM provider support without modifying core code.
ProviderAdapter: Standard interface for any LLM provider
- 8 abstract methods: patching, token extraction, streaming, validation
- All providers (OpenAI, Anthropic, custom) implement this interface
- Clear contract for what shekel needs from a provider

ProviderRegistry: Central hub for provider management
- Thread-safe registration and lifecycle management
- Automatic patch installation and removal
- Provider discovery by name for fallback validation
- Decoupled from core code: no core changes needed for new providers
Add your own provider in 3 steps:
1. Implement ProviderAdapter interface
2. Register with ADAPTER_REGISTRY.register(YourAdapter())
3. Works everywhere automatically
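The three steps can be pictured with a reduced, standalone model of the pattern. Everything below is a hypothetical sketch: the real interface has 8 abstract methods whose names are not given here, so `install_patches`, `extract_tokens`, `Registry`, and `EchoAdapter` are invented for illustration, and `ADAPTER_REGISTRY` is a local object rather than shekel's.

```python
import threading
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class ProviderAdapter(ABC):
    """Reduced, hypothetical interface (2 methods instead of 8)."""

    name: str

    @abstractmethod
    def install_patches(self) -> None: ...

    @abstractmethod
    def extract_tokens(self, response: dict) -> Tuple[int, int]: ...


class Registry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._adapters: Dict[str, ProviderAdapter] = {}

    def register(self, adapter: ProviderAdapter) -> None:
        with self._lock:  # serialise concurrent registrations
            self._adapters[adapter.name] = adapter

    def get(self, name: str) -> ProviderAdapter:
        with self._lock:
            return self._adapters[name]


class EchoAdapter(ProviderAdapter):  # step 1: implement the interface
    name = "echo"

    def install_patches(self) -> None:
        pass  # a real adapter would wrap the provider SDK's call sites here

    def extract_tokens(self, response: dict) -> Tuple[int, int]:
        return response["input_tokens"], response["output_tokens"]


ADAPTER_REGISTRY = Registry()
ADAPTER_REGISTRY.register(EchoAdapter())  # step 2: register
```

Because the abstract base refuses to instantiate until every method is implemented, an incomplete adapter fails at construction time rather than in the middle of a request.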
Community can now add: Cohere, Replicate, vLLM, Mistral, Bedrock, Vertex AI, and others
Validated with Real Integration Tests¶
The architecture is battle-tested with comprehensive end-to-end validation:
Groq API: 25+ integration tests
- Custom pricing and budget enforcement
- Nested budgets and cost attribution
- Streaming responses and concurrent calls
- Rate limiting and error handling
- Real API keys in CI

Google Gemini API: 30+ integration tests
- Multi-turn conversations and streaming
- JSON mode and function calling
- Token counting accuracy
- Budget enforcement and fallback
- Real API keys in CI
These test suites serve as reference implementations showing how to build a provider adapter.
Production-Grade Reliability¶
- Exponential backoff retry logic: Gracefully handles rate limiting and transient failures
- 100+ integration test scenarios: Comprehensive validation of architecture under load
- Concurrent test stability: Reduced flakiness when multiple providers are tested simultaneously
- CI improvements: Integration and performance tests run in parallel
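Exponential backoff, mentioned in the first item, generally looks like the sketch below. It is a generic illustration rather than the project's retry code: `retry_with_backoff` and its defaults are hypothetical, and catching every `Exception` is a simplification (a real suite would retry only rate-limit and transient errors).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, is capped, and gets jitter so
            # parallel test workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record the delays instead of waiting for them.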
Quality Improvements¶
- 100+ integration test scenarios: Comprehensive real API coverage
- mkdocs integrity checks: Prevent broken documentation links in CI
- Better provider abstraction: Easier to add new LLM provider support in the future
[0.2.4] - 2026-03-11¶
Langfuse Integration (New!)¶
Full LLM observability with zero configuration. See Langfuse Integration Guide for complete documentation.
Feature #1: Real-Time Cost Streaming¶
- Automatic metadata updates after each LLM call
- Track: `shekel_spent`, `shekel_limit`, `shekel_utilization`, `shekel_last_model`
- Works with track-only mode (no limit)
- Supports custom trace names and tags
Feature #2: Nested Budget Mapping¶
- Nested budgets automatically create span hierarchy in Langfuse
- Parent budget → trace, child budgets → child spans
- Perfect waterfall view for multi-stage workflows
- Each span has its own budget metadata
Feature #3: Circuit Break Events¶
- WARNING events created when budget limits exceeded
- Event metadata: spent, limit, overage, model, tokens, parent_remaining
- Nested budget violations create events on child spans
- Easy filtering and alerting in Langfuse UI
Feature #4: Fallback Annotations¶
- INFO events created when fallback model activates
- Event metadata: from_model, to_model, switched_at, costs, savings
- Trace/span metadata updated to show fallback is active
- Fallback info persists across subsequent cost updates
Adapter Pattern Architecture¶
- `ObservabilityAdapter` base class for integrations
- `AdapterRegistry` for managing multiple adapters
- Thread-safe registration and event broadcasting
- Error isolation (one adapter failure doesn't break others)
- `AsyncEventQueue` for non-blocking event delivery
- Background worker thread processes events asynchronously
- Queue drops old events if full (no blocking)
- Graceful shutdown with timeout
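The queue behaviour described in the last four items can be sketched with the standard library. `DroppingEventQueue` is a hypothetical stand-in, not shekel's `AsyncEventQueue`; it assumes only what the list states: a bounded queue, a background worker, oldest-first dropping when full, and a shutdown that waits up to a timeout.

```python
import queue
import threading
from typing import Callable


class DroppingEventQueue:
    """Hypothetical sketch of a non-blocking, bounded event queue."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self, handler: Callable[[dict], None], maxsize: int = 1000):
        self._queue = queue.Queue(maxsize=maxsize)
        self._handler = handler
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _offer(self, item: object) -> None:
        while True:
            try:
                self._queue.put_nowait(item)
                return
            except queue.Full:
                try:
                    self._queue.get_nowait()  # full: discard the oldest item
                except queue.Empty:
                    pass  # the worker drained it first; retry the put

    def publish(self, event: dict) -> None:
        """Enqueue without ever blocking the caller."""
        self._offer(event)

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            try:
                self._handler(item)
            except Exception:
                pass  # error isolation: a failing handler must not kill the worker

    def shutdown(self, timeout: float = 2.0) -> bool:
        """Ask the worker to stop; return True if it exited in time."""
        self._offer(self._STOP)
        self._worker.join(timeout)
        return not self._worker.is_alive()
```

Dropping the oldest event favours keeping the LLM call path fast over delivering every observability event, which fits the "no blocking" goal stated above.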
Optional Dependency Management¶
- New `shekel[langfuse]` extra for Langfuse integration
- Graceful import handling (works even if langfuse not installed)
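Graceful import handling is commonly done with a guarded import, as in this generic sketch. `optional_import` and `LANGFUSE_AVAILABLE` are hypothetical names; how shekel itself guards the import is not stated here.

```python
import importlib
from typing import Any, Optional


def optional_import(module_name: str) -> Optional[Any]:
    """Return the module if it is installed, else None, without raising."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


# The integration is enabled only when the optional extra is present.
langfuse = optional_import("langfuse")
LANGFUSE_AVAILABLE = langfuse is not None
```

Code paths that need the integration can then check the flag and fall back to a no-op when the extra is missing.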
Technical Details¶
- Core event emission in `_patch.py::_record()` and `_budget.py::_check_limit()`
- Type-safe implementation with guards for Python 3.9+ compatibility
- All 267 tests passing (65 new integration tests), 95%+ coverage
- Negligible performance impact: under 1 ms overhead per LLM call
[0.2.3] - 2026-03-11¶
Nested Budgets (v0.2.3)¶
Hierarchical budget tracking for multi-stage AI workflows:
- Automatic spend propagation from child to parent on context exit
- Auto-capping: child budgets capped to parent's remaining budget
- Parent locking: parent cannot spend while child is active (sequential execution)
- Named budgets: names required when nesting for clear cost attribution
- Track-only children: `max_usd=None` for unlimited child tracking
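Auto-capping and spend propagation can be sketched with a small context manager. `NestedBudget` is hypothetical and far simpler than shekel's `Budget`: it leaves out parent locking and name validation, and it assumes a track-only child (`max_usd=None`) is not capped.

```python
from typing import Optional


class NestedBudget:
    """Hypothetical sketch; not shekel's Budget class."""

    def __init__(self, name: str, max_usd: Optional[float] = None,
                 parent: Optional["NestedBudget"] = None):
        self.name, self.parent = name, parent
        self.spent = 0.0
        self.limit = max_usd
        if parent is not None and parent.limit is not None and max_usd is not None:
            # Auto-capping: the child gets at most the parent's remaining budget.
            self.limit = min(max_usd, parent.limit - parent.spent)

    def __enter__(self) -> "NestedBudget":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Propagate on every exit, including exceptions: money spent is spent.
        if self.parent is not None:
            self.parent.spent += self.spent
        return False  # never swallow the exception
```

With a $10 parent that has already spent $9, a child requesting $2 ends up with an effective limit of $1, and whatever it spends is added to the parent when its block exits.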
Rich Introspection API¶
- `budget.full_name`: Hierarchical path (e.g., `"workflow.research.validation"`)
- `budget.spent_direct`: Direct spend by this budget (excluding children)
- `budget.spent_by_children`: Sum of all child spend
- `budget.parent`: Reference to parent budget (`None` for root)
- `budget.children`: List of child budgets
- `budget.active_child`: Currently active child budget
- `budget.tree()`: Visual hierarchy of budget tree with spend breakdown
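The relationships between these attributes can be shown on a toy tree. `BudgetNode` below is a hypothetical model that reuses the attribute names from the list; the output format of its `tree()` is invented and is not shekel's.

```python
from typing import List, Optional


class BudgetNode:
    """Hypothetical tree node mirroring the listed attribute names."""

    def __init__(self, name: str, parent: Optional["BudgetNode"] = None):
        self.name = name
        self.parent = parent
        self.children: List["BudgetNode"] = []
        self.spent_direct = 0.0
        if parent is not None:
            parent.children.append(self)

    @property
    def full_name(self) -> str:
        # Join names from the root down to this node.
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name}.{self.name}"

    @property
    def spent_by_children(self) -> float:
        # A child's total includes its own descendants.
        return sum(c.spent_direct + c.spent_by_children for c in self.children)

    def tree(self, indent: int = 0) -> str:
        total = self.spent_direct + self.spent_by_children
        lines = [f"{'  ' * indent}{self.name}: ${total:.2f}"]
        lines += [c.tree(indent + 1) for c in self.children]
        return "\n".join(lines)
```

Total spend for any node is `spent_direct + spent_by_children`, which is why the two are exposed separately for cost attribution.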
Safety Rails¶
- Maximum nesting depth of 5 levels enforced
- Async nesting support (nested `async with budget()` contexts with same rules as sync)
- Zero/negative budget validation at `__init__`
- Spend propagation on all exceptions (money spent is money spent)
Changes¶
- Budget variables always accumulate across uses โ same variable, same accumulated state
- Both parent and child must have a `name` when creating nested contexts
Fixed¶
- ContextVar token management now uses proper `.reset()` instead of manual `.set(None)`
- Patch reference counting no longer leaks when validation errors occur before patching
- Sibling budgets must have unique names under the same parent
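The first fix in this list matters for nesting: resetting with the token restores the enclosing value, whereas setting `None` would discard it. A minimal standard-library illustration follows; the variable `current` is hypothetical.

```python
from contextvars import ContextVar

current = ContextVar("current", default=None)

outer = current.set("parent")   # enter the outer budget
inner = current.set("child")    # enter the nested budget

current.reset(inner)            # leave the nested budget
after_inner_exit = current.get()  # the parent is active again

current.reset(outer)            # leave the outer budget
after_outer_exit = current.get()  # back to the default
```

Had the inner exit used `current.set(None)`, the parent would have been lost while its block was still running.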