Skip to content

Changelog

All notable changes to this project are documented here. For detailed information, see CHANGELOG.md on GitHub.

[0.2.6]

New Features

max_llm_calls — limit budgets by call count

  • budget(max_llm_calls=50) raises BudgetExceededError after 50 LLM API calls
  • Can be combined with max_usd: budget(max_usd=1.00, max_llm_calls=20)
  • Works with fallback: budget(max_usd=1.00, max_llm_calls=20, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"})

LiteLLM provider adapter

  • Install with pip install shekel[litellm]
  • Patches litellm.completion and litellm.acompletion (sync + async, including streaming)
  • Enforces budgets and circuit-breaks across all 100+ providers LiteLLM supports (Gemini, Cohere, Ollama, Azure, Bedrock, Mistral, and more)
  • Model names with provider prefix (e.g. gemini/gemini-1.5-flash) pass through to the pricing engine

[0.2.5] - 2026-03-11

🔧 Extensible Provider Architecture

Shekel now has a pluggable architecture for adding new LLM provider support without modifying core code.

ProviderAdapter — Standard interface for any LLM provider - 8 abstract methods: patching, token extraction, streaming, validation - All providers (OpenAI, Anthropic, custom) implement this interface - Clear contract for what shekel needs from a provider

ProviderRegistry — Central hub for provider management - Thread-safe registration and lifecycle management - Automatic patch installation and removal - Provider discovery by name for fallback validation - Decoupled from core code — no core changes needed for new providers

Add your own provider in 3 steps: 1. Implement ProviderAdapter interface 2. Register with ADAPTER_REGISTRY.register(YourAdapter()) 3. Works everywhere automatically

Community can now add: Cohere, Replicate, vLLM, Mistral, Bedrock, Vertex AI, and others

✅ Validated with Real Integration Tests

The architecture is battle-tested with comprehensive end-to-end validation:

Groq API — 25+ integration tests - Custom pricing and budget enforcement - Nested budgets and cost attribution - Streaming responses and concurrent calls - Rate limiting and error handling - Real API keys in CI

Google Gemini API — 30+ integration tests - Multi-turn conversations and streaming - JSON mode and function calling - Token counting accuracy - Budget enforcement and fallback - Real API keys in CI

These test suites serve as reference implementations showing how to build a provider adapter.

⚙️ Production-Grade Reliability

  • Exponential backoff retry logic — Gracefully handles rate limiting and transient failures
  • 100+ integration test scenarios — Comprehensive validation of architecture under load
  • Concurrent test stability — Reduced flakiness when multiple providers are tested simultaneously
  • CI improvements — Integration and performance tests run in parallel

✅ Quality Improvements

  • 100+ integration test scenarios — Comprehensive real API coverage
  • mkdocs integrity checks — Prevent broken documentation links in CI
  • Better provider abstraction — Easier to add new LLM provider support in the future

[0.2.4] - 2026-03-11

✨ Langfuse Integration (New!)

Full LLM observability with zero configuration. See Langfuse Integration Guide for complete documentation.

Feature #1: Real-Time Cost Streaming

  • Automatic metadata updates after each LLM call
  • Track: shekel_spent, shekel_limit, shekel_utilization, shekel_last_model
  • Works with track-only mode (no limit)
  • Supports custom trace names and tags

Feature #2: Nested Budget Mapping

  • Nested budgets automatically create span hierarchy in Langfuse
  • Parent budget → trace, child budgets → child spans
  • Perfect waterfall view for multi-stage workflows
  • Each span has its own budget metadata

Feature #3: Circuit Break Events

  • WARNING events created when budget limits exceeded
  • Event metadata: spent, limit, overage, model, tokens, parent_remaining
  • Nested budget violations create events on child spans
  • Easy filtering and alerting in Langfuse UI

Feature #4: Fallback Annotations

  • INFO events created when fallback model activates
  • Event metadata: from_model, to_model, switched_at, costs, savings
  • Trace/span metadata updated to show fallback is active
  • Fallback info persists across subsequent cost updates

🏗️ Adapter Pattern Architecture

  • ObservabilityAdapter base class for integrations
  • AdapterRegistry for managing multiple adapters
  • Thread-safe registration and event broadcasting
  • Error isolation (one adapter failure doesn't break others)
  • AsyncEventQueue for non-blocking event delivery
  • Background worker thread processes events asynchronously
  • Queue drops old events if full (no blocking)
  • Graceful shutdown with timeout

📦 Optional Dependency Management

  • New shekel[langfuse] extra for Langfuse integration
  • Graceful import handling (works even if langfuse not installed)

Technical Details

  • Core event emission in _patch.py::_record() and _budget.py::_check_limit()
  • Type-safe implementation with guards for Python 3.9+ compatibility
  • All 267 tests passing (65 new integration tests), 95%+ coverage
  • Zero performance impact: <1ms overhead per LLM call

[0.2.3] - 2026-03-11

🌳 Nested Budgets (v0.2.3)

Hierarchical budget tracking for multi-stage AI workflows:

  • Automatic spend propagation from child to parent on context exit
  • Auto-capping: child budgets capped to parent's remaining budget
  • Parent locking: parent cannot spend while child is active (sequential execution)
  • Named budgets: names required when nesting for clear cost attribution
  • Track-only children: max_usd=None for unlimited child tracking

📊 Rich Introspection API

  • budget.full_name — Hierarchical path (e.g., "workflow.research.validation")
  • budget.spent_direct — Direct spend by this budget (excluding children)
  • budget.spent_by_children — Sum of all child spend
  • budget.parent — Reference to parent budget (None for root)
  • budget.children — List of child budgets
  • budget.active_child — Currently active child budget
  • budget.tree() — Visual hierarchy of budget tree with spend breakdown

🛡️ Safety Rails

  • Maximum nesting depth of 5 levels enforced
  • Async nesting detection (raises clear error — deferred to future version)
  • Zero/negative budget validation at __init__
  • Spend propagation on all exceptions (money spent is money spent)

Changes

  • Budget variables always accumulate across uses — same variable, same accumulated state
  • Both parent and child must have a name when creating nested contexts

Fixed

  • ContextVar token management now uses proper .reset() instead of manual .set(None)
  • Patch reference counting no longer leaks when validation errors occur before patching
  • Sibling budgets must have unique names under the same parent