Changelog¶
All notable changes to this project are documented here. For detailed information, see CHANGELOG.md on GitHub.
[1.1.0]¶
OpenAI Agents SDK Adapter, Loop Guard, and Spend Velocity¶
Three new circuit-breaking primitives for production agent deployments.
OpenAI Agents SDK `Runner` adapter (`shekel/providers/openai_agents.py`):

- Patches `Runner.run`, `Runner.run_sync`, and `Runner.run_streamed` transparently on `budget().__enter__()` and restores them on `__exit__()`
- Per-agent caps via `b.agent("name", max_usd=X)` — `AgentBudgetExceededError` raised before the agent run starts when its cap is exhausted
- Spend attribution visible in `budget.tree()` alongside per-agent caps and utilization percentages
- Full async (`Runner.run`) and sync (`Runner.run_sync`) support; streaming spend attributed after iteration completes
- Auto-skipped when `openai-agents` is not installed
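The enter/exit patching above is the classic save-wrap-restore pattern. A minimal standalone sketch (illustrative names only, not shekel's internals):

```python
class PatchingContext:
    """Monkey-patch an attribute on enter; restore the original on exit.

    Hypothetical sketch of the save-wrap-restore pattern; `target`, `attr`,
    and `wrapper` are illustrative names, not shekel's API.
    """

    def __init__(self, target, attr, wrapper):
        self._target, self._attr, self._wrapper = target, attr, wrapper

    def __enter__(self):
        self._original = getattr(self._target, self._attr)
        setattr(self._target, self._attr, self._wrapper(self._original))
        return self

    def __exit__(self, *exc):
        # Always restore, even if the body raised
        setattr(self._target, self._attr, self._original)
        return False
```

Restoring in `__exit__` is what lets the adapter leave `Runner` untouched once the budget context closes.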
Loop Guard — `budget(loop_guard=True)`:

- Per-tool rolling-window counter; tracks call timestamps per tool name at every pre-dispatch gate
- `AgentLoopError` raised before the tool executes when a tool is called more than `loop_guard_max_calls` times within `loop_guard_window_seconds`
- Configurable thresholds: `loop_guard_max_calls` (default 5), `loop_guard_window_seconds` (default 60.0; 0 = all-time)
- `warn_only=True` support — logs a warning instead of raising; `b.loop_guard_counts` always populated
- Independent of `max_tool_calls` — both enforced simultaneously; first to fire wins
- Works with all auto-intercepted frameworks: LangChain, MCP, CrewAI, OpenAI Agents SDK, `@tool`
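The rolling-window check can be sketched as a standalone class (a simplified illustration of the mechanism, not shekel's actual code; parameter names mirror the options above):

```python
import time
from collections import defaultdict, deque


class AgentLoopError(RuntimeError):
    """Raised when a tool exceeds its rolling-window call limit."""


class LoopGuard:
    def __init__(self, max_calls=5, window_seconds=60.0):
        self.max_calls = max_calls            # cf. loop_guard_max_calls
        self.window_seconds = window_seconds  # cf. loop_guard_window_seconds; 0 = all-time
        self._calls = defaultdict(deque)      # tool name -> call timestamps

    def check(self, tool_name, now=None):
        """Gate a call attempt; raises before the tool would execute."""
        now = time.monotonic() if now is None else now
        stamps = self._calls[tool_name]
        if self.window_seconds > 0:
            # Drop timestamps that have aged out of the window
            while stamps and now - stamps[0] > self.window_seconds:
                stamps.popleft()
        if len(stamps) >= self.max_calls:
            raise AgentLoopError(f"{tool_name}: {len(stamps)} calls in window")
        stamps.append(now)

    @property
    def counts(self):
        # cf. b.loop_guard_counts
        return {name: len(q) for name, q in self._calls.items()}
```

A `deque` keeps both the append and the age-out prune at O(1) per call.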
Spend Velocity — `budget(max_velocity="$X/unit")`:

- Burn-rate circuit breaker checked on every LLM cost record
- `SpendVelocityExceededError` raised when the measured USD/min exceeds the configured limit
- Spec DSL: `"$<amount>/<unit>"` — units: `sec`/`s`, `min`/`m`, `hr`/`h`/`hour`, `day`/`d`
- `warn_velocity` soft threshold fires the `on_warn` callback before the hard stop
- Velocity-only mode supported (no `max_usd` required)
- All velocity values normalized to USD/min in exceptions and callbacks
- `warn_only=True` support — logs instead of raising
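Normalizing the spec DSL to USD/min might look like this (a hypothetical sketch of the parsing rule described above, not shekel's implementation):

```python
import re

# Each unit expressed as its length in minutes
_UNIT_TO_MINUTES = {
    "sec": 1 / 60, "s": 1 / 60,
    "min": 1.0, "m": 1.0,
    "hr": 60.0, "h": 60.0, "hour": 60.0,
    "day": 1440.0, "d": 1440.0,
}


def parse_velocity(spec):
    """Parse '$<amount>/<unit>' into a USD-per-minute limit."""
    m = re.fullmatch(r"\$(\d+(?:\.\d+)?)/(\w+)", spec.strip())
    if not m or m.group(2) not in _UNIT_TO_MINUTES:
        raise ValueError(f"bad velocity spec: {spec!r}")
    amount, unit = float(m.group(1)), m.group(2)
    return amount / _UNIT_TO_MINUTES[unit]  # normalize to USD/min
```

So `"$6/hr"` normalizes to 0.1 USD/min, matching the "all values normalized to USD/min" rule.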
New exceptions (all subclass `BudgetExceededError`, exported from `shekel.exceptions`):

- `AgentLoopError` — `tool_name`, `call_count`, `window_seconds`, `usd_spent`, `framework`
- `SpendVelocityExceededError` — `velocity_per_min`, `limit_per_min`, `window_seconds`, `usd_spent`, `elapsed_seconds`
New `Budget` property:

- `loop_guard_counts: dict[str, int]` — per-tool call counts for the current window; populated when `loop_guard=True`; empty dict otherwise
Documentation:

- New integration page: OpenAI Agents SDK
- New usage pages: Loop Guard · Spend Velocity
[1.0.2]¶
🎉 First GA Release — Hierarchical Budget Enforcement, CrewAI Circuit Breaking, and Distributed Budgets¶
Per-node, per-chain, per-agent, and per-task USD caps with automatic LangGraph and LangChain instrumentation. Distributed enforcement via Redis for multi-process deployments. Zero code changes required — open a budget() context and run.
```python
from shekel.backends.redis import RedisBackend

backend = RedisBackend()  # reads REDIS_URL from env

with budget("$5/hr + 100 calls/hr", name="api", backend=backend) as b:
    b.node("fetch_data", max_usd=0.50)
    b.node("summarize", max_usd=1.00)
    b.chain("retriever", max_usd=0.20)
    app.invoke({"query": "..."})
    print(b.tree())
    # api: $0.84 / $5.00 (direct: $0.00)
    #   [node] fetch_data: $0.12 / $0.50 (24.0%)
    #   [node] summarize: $0.72 / $1.00 (72.0%)
    #   [chain] retriever: $0.00 / $0.20 (0.0%)
```
LangGraph adapter (`shekel/providers/langgraph.py`):

- Patches `StateGraph.add_node()` transparently — every node gets a pre-execution budget gate, no graph changes needed
- `NodeBudgetExceededError` raised before the node body runs when an explicit cap or the parent budget is exhausted
- Per-node spend attributed to `ComponentBudget._spent` → visible in `budget.tree()`
- Full async node support; auto-skipped when `langgraph` is not installed
LangChain adapter (`shekel/providers/langchain.py`):

- Patches `Runnable._call_with_config`, `_acall_with_config`, and `RunnableSequence.invoke`/`ainvoke`
- `ChainBudgetExceededError` raised before the chain body runs when a cap or the parent budget is exhausted
- Same reference-counting and nesting semantics as `LangGraphAdapter`
- Auto-skipped when `langchain_core` is not installed
Distributed budgets (`shekel/backends/redis.py`):

- `RedisBackend` / `AsyncRedisBackend` — atomic Lua-script enforcement (one round-trip)
- Circuit breaker: configurable error threshold + cooldown before opening
- Fail-closed (default) or fail-open on backend unavailability
- `BudgetConfigMismatchError` when a budget name is reused with different limits/windows
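The error-threshold-plus-cooldown breaker can be sketched in isolation (illustrative only; this is not shekel's backend code, and `allow`/`record_error` are hypothetical names):

```python
import time


class CircuitBreaker:
    """After `threshold` consecutive backend errors, open the circuit for
    `cooldown` seconds before permitting another attempt.

    Whether a closed-off circuit blocks spend (fail-closed) or waves it
    through (fail-open) is the caller's policy decision.
    """

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self._errors = 0
        self._opened_at = None

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self._opened_at is None:
            return True
        if now - self._opened_at >= self.cooldown:
            self._opened_at = None  # half-open: permit a probe attempt
            self._errors = 0
            return True
        return False

    def record_error(self, now=None):
        now = time.monotonic() if now is None else now
        self._errors += 1
        if self._errors >= self.threshold:
            self._opened_at = now

    def record_success(self):
        self._errors = 0
```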
Multi-cap temporal spec:

- `budget("$5/hr + 100 calls/hr", name="api")` — simultaneous USD + call-count caps with independent rolling windows
- Supported counters: `usd`, `llm_calls`, `tool_calls`, `tokens`
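Splitting a multi-cap spec into its terms might look like the following (a hedged sketch; the returned counter names simply echo the spec string and are not shekel's internal identifiers):

```python
import re


def parse_caps(spec):
    """Split a spec like '$5/hr + 100 calls/hr' into (counter, limit, window) triples."""
    caps = []
    for term in spec.split("+"):
        term = term.strip()
        m = re.fullmatch(r"\$(\d+(?:\.\d+)?)/(\w+)", term)
        if m:  # a dollar cap: "$5/hr"
            caps.append(("usd", float(m.group(1)), m.group(2)))
            continue
        m = re.fullmatch(r"(\d+)\s+(\w+)/(\w+)", term)
        if m:  # a count cap: "100 calls/hr"
            caps.append((m.group(2), int(m.group(1)), m.group(3)))
            continue
        raise ValueError(f"unrecognized cap term: {term!r}")
    return caps
```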
API (`Budget` methods, all chainable):

- `b.node(name, max_usd)` — explicit cap for a LangGraph node
- `b.chain(name, max_usd)` — explicit cap for a LangChain chain
- `b.agent(name, max_usd)` — explicit cap for a CrewAI agent; raises `AgentBudgetExceededError` before execution
- `b.task(name, max_usd)` — explicit cap for a CrewAI task; raises `TaskBudgetExceededError` before execution
Exception hierarchy (all subclass `BudgetExceededError`):

- `NodeBudgetExceededError` — `node_name`, `spent`, `limit`
- `ChainBudgetExceededError` — `chain_name`, `spent`, `limit`
- `AgentBudgetExceededError` — `agent_name`, `spent`, `limit`
- `TaskBudgetExceededError` — `task_name`, `spent`, `limit`
- `SessionBudgetExceededError` — `agent_name`, `spent`, `limit`, `window`
- `BudgetConfigMismatchError` — raised by the Redis backend on config conflict
Fixed: Node and chain caps registered on an outer budget() are now correctly enforced inside inner nested budget contexts.
[0.2.9]¶
🖥️ CLI Budget Enforcement — shekel run¶
Run any Python agent with a hard USD cap from the command line — zero code changes required.
```bash
shekel run agent.py --budget 5                 # hard cap at $5
shekel run agent.py --budget 5 --warn-at 0.8   # warn at 80%
AGENT_BUDGET_USD=5 shekel run agent.py         # env var (Docker / CI)
```
- `shekel run SCRIPT [OPTIONS]` — wraps any Python script in-process; shekel's monkey-patches are already active when the script runs
- `--budget N` / `AGENT_BUDGET_USD=N` — USD cap; the env var enables Docker/CI operator control without code changes
- `--warn-at F` — warn fraction 0.0–1.0 (e.g. `0.8` = warn at 80%)
- `--max-llm-calls N` / `--max-tool-calls N` — count-based caps
- `--warn-only` — log a warning, never exit 1; soft guardrail for dev environments
- `--dry-run` — track costs only, no enforcement; implies `--warn-only`
- `--output json` — machine-readable spend line for log pipelines
- `--budget-file shekel.toml` — load limits from a TOML config file
- `Budget(warn_only=True)` — new parameter that suppresses raises and fires the warn callback instead
- GitHub Actions composite action: `.github/actions/enforce/action.yml`
- New docs: CLI reference · Docker & Containers
- Exit code 1 on budget exceeded — works as a CI pipeline gate with zero pipeline config
[0.2.8]¶
🔧 Tool Budgets¶
Cap agent tool call count and cost — stop runaway tool loops before they bankrupt you.
- `max_tool_calls` — hard cap on total dispatches, checked before each tool runs
- `tool_prices` — per-tool USD cost; unknown tools count at `$0` toward the cap
- `@tool` / `tool()` decorator — one line for any sync/async function or callable
- `ToolBudgetExceededError` — `tool_name`, `calls_used`, `calls_limit`, `usd_spent`, `framework`
- Auto-interception: LangChain `BaseTool`, MCP `ClientSession.call_tool`, CrewAI `BaseTool`, OpenAI Agents SDK `FunctionTool` — zero config
- `summary()` extended with tool spend breakdown by tool name and framework
- Four new OTel instruments: `shekel.tool.calls_total`, `shekel.tool.cost_usd_total`, `shekel.tool.budget_exceeded_total`, `shekel.tool.calls_remaining`
- 111 new unit tests (TDD)
⏱️ Temporal Budgets¶
Rolling-window LLM spend limits — enforce $5/hr per API tier, user, or agent.
- `budget("$5/hr", name="api-tier")` — string DSL
- `TemporalBudgetBackend` Protocol — bring your own Redis/Postgres backend
- `BudgetExceededError` enriched with `retry_after` and `window_spent`
- `on_window_reset` adapter event + `shekel.budget.window_resets_total` OTel counter
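The rolling-window enforcement with `retry_after` can be illustrated with a self-contained sketch (assumed behavior: `retry_after` is the time until the oldest charge ages out of the window; this is not shekel's code, and `WindowSpend` is a hypothetical name):

```python
from collections import deque


class WindowExceeded(RuntimeError):
    """Carries retry_after and window_spent, mirroring the enriched fields above."""

    def __init__(self, retry_after, window_spent):
        super().__init__(
            f"retry after {retry_after:.0f}s (window spent ${window_spent:.2f})"
        )
        self.retry_after = retry_after
        self.window_spent = window_spent


class WindowSpend:
    def __init__(self, limit_usd, window_seconds):
        self.limit = limit_usd
        self.window = window_seconds
        self._charges = deque()  # (timestamp, usd) pairs inside the window

    def charge(self, usd, now):
        # Drop charges that have aged out of the rolling window
        while self._charges and now - self._charges[0][0] >= self.window:
            self._charges.popleft()
        spent = sum(c for _, c in self._charges)
        if spent + usd > self.limit:
            oldest = self._charges[0][0] if self._charges else now
            raise WindowExceeded(self.window - (now - oldest), spent)
        self._charges.append((now, usd))
```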
[0.2.7]¶
📡 OpenTelemetry Metrics Integration¶
Shekel now exposes LLM cost and budget lifecycle data via OpenTelemetry — filling the gap the OTel GenAI semantic conventions leave around cost and budget metrics.
ShekelMeter (shekel/otel.py) — Zero-config public entry point
```python
from shekel.otel import ShekelMeter

meter = ShekelMeter()  # uses the global MeterProvider
# or
meter = ShekelMeter(meter_provider=my_provider, emit_tokens=True)

meter.unregister()  # remove from the registry when done
```

Silent no-op when `opentelemetry-api` is not installed (`meter.is_noop` is `True`).
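The optional-dependency no-op pattern can be sketched as follows (an illustration of the technique using the real `opentelemetry.metrics` API when it is importable; `CostMeter` and `_recorded` are hypothetical names, not `ShekelMeter`):

```python
try:
    from opentelemetry import metrics
except ImportError:  # opentelemetry-api not installed
    metrics = None


class CostMeter:
    """Records cost through OTel when available; silently no-ops otherwise."""

    def __init__(self):
        self.is_noop = metrics is None
        if not self.is_noop:
            meter = metrics.get_meter("example")
            self._cost = meter.create_counter("example.llm.cost_usd")
        self._recorded = 0.0  # local mirror, purely for illustration

    def record_cost(self, usd, model):
        self._recorded += usd
        if not self.is_noop:
            self._cost.add(usd, {"model": model})
```

Callers never branch on the dependency themselves; the meter absorbs the difference.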
New instruments:

| Instrument | Type | Description |
|---|---|---|
| `shekel.llm.cost_usd` | Counter | Cost per LLM call |
| `shekel.llm.calls_total` | Counter | Call count per model |
| `shekel.llm.tokens_input_total` | Counter | Input tokens (opt-in) |
| `shekel.llm.tokens_output_total` | Counter | Output tokens (opt-in) |
| `shekel.budget.exits_total` | Counter | Budget exits by status |
| `shekel.budget.cost_usd` | UpDownCounter | Cumulative spend per budget |
| `shekel.budget.utilization` | Histogram | 0.0–1.0 on exit |
| `shekel.budget.spend_rate` | Histogram | USD/s on exit |
| `shekel.budget.fallbacks_total` | Counter | Fallback activations |
| `shekel.budget.autocaps_total` | Counter | Auto-cap events |
Two new ObservabilityAdapter events:
- `on_budget_exit(data)` — fires on every budget context exit (before parent propagation); `data` includes `status`, `spent_usd`, `utilization`, `duration_seconds`, `calls_made`, `model_switched`, `from_model`, `to_model`
- `on_autocap(data)` — fires when a child budget is silently reduced by the parent's remaining; `data` includes `child_name`, `parent_name`, `original_limit`, `effective_limit`
Token payload in `on_cost_update` — the event now includes `input_tokens` and `output_tokens` fields.
Install: pip install shekel[otel]
See OTel Integration Guide for PromQL examples, cardinality guidance, and Grafana hints.
Full Async Support for Gemini, HuggingFace, and Nested Budgets¶
`async with budget(...):` now supports the same full nesting logic as sync contexts — auto-capping, spend propagation, and per-task isolation via `ContextVar` all work identically.
Gemini async — `_gemini_async_wrapper` and `_gemini_async_stream_wrapper` added; `AsyncModels` patched automatically alongside the sync `Models`.
HuggingFace async — `_huggingface_async_wrapper` and `_wrap_huggingface_stream_async` added; `AsyncInferenceClient` patched automatically.
```python
# Now works — full nesting with async context managers
async with budget(max_usd=10.00, name="workflow"):
    async with budget(max_usd=2.00, name="research"):
        result = await client.models.generate_content_async(...)
```
13 new async unit tests added for Gemini and HuggingFace. Integration test suites expanded with async streaming and async with budget() scenarios for OpenAI and Anthropic.
[0.2.6]¶
New Features¶
`max_llm_calls` — limit budgets by call count:

- `budget(max_llm_calls=50)` raises `BudgetExceededError` after 50 LLM API calls
- Can be combined with `max_usd`: `budget(max_usd=1.00, max_llm_calls=20)`
- Works with fallback: `budget(max_usd=1.00, max_llm_calls=20, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"})`
LiteLLM provider adapter:

- Install with `pip install shekel[litellm]`
- Patches `litellm.completion` and `litellm.acompletion` (sync + async, including streaming)
- Enforces budgets and circuit-breaks across all 100+ providers LiteLLM supports (Gemini, Cohere, Ollama, Azure, Bedrock, Mistral, and more)
- Model names with provider prefixes (e.g. `gemini/gemini-1.5-flash`) pass through to the pricing engine
[0.2.5] - 2026-03-11¶
🔧 Extensible Provider Architecture¶
Shekel now has a pluggable architecture for adding new LLM provider support without modifying core code.
`ProviderAdapter` — standard interface for any LLM provider:

- 8 abstract methods: patching, token extraction, streaming, validation
- All providers (OpenAI, Anthropic, custom) implement this interface
- Clear contract for what shekel needs from a provider
`ProviderRegistry` — central hub for provider management:

- Thread-safe registration and lifecycle management
- Automatic patch installation and removal
- Provider discovery by name for fallback validation
- Decoupled from core code — no core changes needed for new providers
Add your own provider in 3 steps:

1. Implement the `ProviderAdapter` interface
2. Register with `ADAPTER_REGISTRY.register(YourAdapter())`
3. Works everywhere automatically
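A registry of this shape might look like the sketch below (hypothetical; `name`, `install`, and `uninstall` stand in for whatever patch lifecycle the real `ProviderAdapter` interface defines):

```python
import threading


class ProviderRegistry:
    """Thread-safe adapter registry that installs patches on registration
    and removes them on unregistration."""

    def __init__(self):
        self._lock = threading.Lock()
        self._adapters = {}

    def register(self, adapter):
        with self._lock:
            self._adapters[adapter.name] = adapter
            adapter.install()    # apply the adapter's monkey-patches

    def unregister(self, name):
        with self._lock:
            adapter = self._adapters.pop(name)
            adapter.uninstall()  # remove the patches

    def get(self, name):
        with self._lock:
            return self._adapters.get(name)
```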
Community can now add: Cohere, Replicate, vLLM, Mistral, Bedrock, Vertex AI, and others
✅ Validated with Real Integration Tests¶
The architecture is battle-tested with comprehensive end-to-end validation:
Groq API — 25+ integration tests:

- Custom pricing and budget enforcement
- Nested budgets and cost attribution
- Streaming responses and concurrent calls
- Rate limiting and error handling
- Real API keys in CI
Google Gemini API — 30+ integration tests:

- Multi-turn conversations and streaming
- JSON mode and function calling
- Token counting accuracy
- Budget enforcement and fallback
- Real API keys in CI
These test suites serve as reference implementations showing how to build a provider adapter.
⚙️ Production-Grade Reliability¶
- Exponential backoff retry logic — Gracefully handles rate limiting and transient failures
- 100+ integration test scenarios — Comprehensive validation of architecture under load
- Concurrent test stability — Reduced flakiness when multiple providers are tested simultaneously
- CI improvements — Integration and performance tests run in parallel
✅ Quality Improvements¶
- 100+ integration test scenarios — Comprehensive real API coverage
- mkdocs integrity checks — Prevent broken documentation links in CI
- Better provider abstraction — Easier to add new LLM provider support in the future
[0.2.4] - 2026-03-11¶
✨ Langfuse Integration (New!)¶
Full LLM observability with zero configuration. See Langfuse Integration Guide for complete documentation.
Feature #1: Real-Time Cost Streaming¶
- Automatic metadata updates after each LLM call
- Tracks `shekel_spent`, `shekel_limit`, `shekel_utilization`, `shekel_last_model`
- Works with track-only mode (no limit)
- Supports custom trace names and tags
Feature #2: Nested Budget Mapping¶
- Nested budgets automatically create span hierarchy in Langfuse
- Parent budget → trace, child budgets → child spans
- Perfect waterfall view for multi-stage workflows
- Each span has its own budget metadata
Feature #3: Circuit Break Events¶
- WARNING events created when budget limits exceeded
- Event metadata: spent, limit, overage, model, tokens, parent_remaining
- Nested budget violations create events on child spans
- Easy filtering and alerting in Langfuse UI
Feature #4: Fallback Annotations¶
- INFO events created when fallback model activates
- Event metadata: from_model, to_model, switched_at, costs, savings
- Trace/span metadata updated to show fallback is active
- Fallback info persists across subsequent cost updates
🏗️ Adapter Pattern Architecture¶
- `ObservabilityAdapter` base class for integrations
- `AdapterRegistry` for managing multiple adapters
- Thread-safe registration and event broadcasting
- Error isolation (one adapter failure doesn't break others)
- `AsyncEventQueue` for non-blocking event delivery
- Background worker thread processes events asynchronously
- Queue drops old events if full (no blocking)
- Graceful shutdown with timeout
📦 Optional Dependency Management¶
- New `shekel[langfuse]` extra for Langfuse integration
- Graceful import handling (works even if langfuse is not installed)
Technical Details¶
- Core event emission in `_patch.py::_record()` and `_budget.py::_check_limit()`
- Type-safe implementation with guards for Python 3.9+ compatibility
- All 267 tests passing (65 new integration tests), 95%+ coverage
- Zero performance impact: <1ms overhead per LLM call
[0.2.3] - 2026-03-11¶
🌳 Nested Budgets (v0.2.3)¶
Hierarchical budget tracking for multi-stage AI workflows:
- Automatic spend propagation from child to parent on context exit
- Auto-capping: child budgets capped to parent's remaining budget
- Parent locking: parent cannot spend while child is active (sequential execution)
- Named budgets: names required when nesting for clear cost attribution
- Track-only children: `max_usd=None` for unlimited child tracking
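The auto-capping rule reduces to taking the minimum of the child's requested limit and the parent's remaining budget. A one-function sketch (illustrative, not shekel's code):

```python
def effective_limit(child_limit, parent_remaining):
    """Auto-capping: a child budget can never exceed what the parent has left.

    A track-only child (child_limit=None) inherits the parent's remaining
    budget as its ceiling.
    """
    if child_limit is None:
        return parent_remaining
    return min(child_limit, parent_remaining)
```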
📊 Rich Introspection API¶
- `budget.full_name` — hierarchical path (e.g., `"workflow.research.validation"`)
- `budget.spent_direct` — direct spend by this budget (excluding children)
- `budget.spent_by_children` — sum of all child spend
- `budget.parent` — reference to the parent budget (`None` for root)
- `budget.children` — list of child budgets
- `budget.active_child` — currently active child budget
- `budget.tree()` — visual hierarchy of the budget tree with spend breakdown
🛡️ Safety Rails¶
- Maximum nesting depth of 5 levels enforced
- Async nesting support (nested `async with budget()` contexts with the same rules as sync)
- Zero/negative budget validation at `__init__`
__init__ - Spend propagation on all exceptions (money spent is money spent)
Changes¶
- Budget variables always accumulate across uses — same variable, same accumulated state
- Both parent and child must have a `name` when creating nested contexts
Fixed¶
- `ContextVar` token management now uses proper `.reset()` instead of manual `.set(None)`
- Patch reference counting no longer leaks when validation errors occur before patching
- Sibling budgets must have unique names under the same parent