# Shekel
LLM budget enforcement and cost tracking for Python. One line. Zero config.
## The Story
I spent $47 debugging a LangGraph retry loop. The agent kept failing, LangGraph kept retrying, and OpenAI kept charging — all while I slept.
I built shekel so you don't have to learn that lesson yourself.
## Features

- **Zero Config**: One line of code. No API keys, no external services, no setup.
- **Budget Enforcement**: Hard caps, soft warnings, or track-only mode. You control the spend.
- **Smart Fallback**: Automatically switch to cheaper models instead of crashing.
- **Nested Budgets**: Hierarchical tracking for multi-stage workflows.
- **Langfuse Integration**: Circuit-break events, per-call spend streaming, and budget hierarchy in Langfuse, so you can see exactly where your budget breaks.
- **Framework Agnostic**: Works with LangGraph, CrewAI, AutoGen, LlamaIndex, Haystack, and any framework that calls OpenAI, Anthropic, or LiteLLM.
- **Async & Streaming**: Full support for async/await patterns and streaming responses.
## Quick Start

### Installation
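The package is published on PyPI (see the Community links below), so a standard pip install is all that's needed:

```shell
pip install shekel
```

Optional extras such as `shekel[litellm]` and `shekel[all-models]` are covered under Supported Models.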
### Basic Usage
```python
from shekel import budget, BudgetExceededError

# Enforce a hard cap
try:
    with budget(max_usd=1.00, warn_at=0.8) as b:
        run_my_agent()
    print(f"Spent ${b.spent:.4f}")
except BudgetExceededError as e:
    print(e)

# Track spend without enforcing a limit
with budget() as b:
    run_my_agent()
print(f"Cost: ${b.spent:.4f}")

# Decorator
from shekel import with_budget

@with_budget(max_usd=0.10)
def call_llm():
    ...
```
## See It In Action
```python
import openai
from shekel import budget

client = openai.OpenAI()

with budget(max_usd=0.10) as b:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

print(f"Total cost: ${b.spent:.4f}")
print(f"Remaining: ${b.remaining:.4f}")
```
## Why Shekel?

| Problem | Solution |
|---|---|
| Agent retry loops drain your wallet | Hard budget caps stop runaway costs |
| No visibility into LLM spending | Track every API call automatically |
| Expensive models blow your budget | Automatic fallback to cheaper models |
| Need to enforce spend limits | Context manager raises on budget exceeded |
| Multi-step workflows need session budgets | Nested budgets roll spend up across stages |
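For the nested-budget row, the idea is that each stage's spend rolls up into every enclosing budget. Here is a minimal toy model of that hierarchy; the `NestedBudget` class, its `child()` method, and the stage names are all invented for illustration and do not reflect shekel's actual nested-budget API:

```python
class NestedBudget:
    """Toy model of hierarchical budget tracking: a child's spend
    propagates to every ancestor, so a session-level cap covers
    all stages underneath it."""

    def __init__(self, name, max_usd=None, parent=None):
        self.name = name
        self.max_usd = max_usd
        self.parent = parent
        self.spent = 0.0

    def child(self, name, max_usd=None):
        # A stage budget nested inside this one.
        return NestedBudget(name, max_usd, parent=self)

    def record(self, cost_usd):
        # Walk up the chain so ancestors accumulate the same spend.
        node = self
        while node is not None:
            node.spent += cost_usd
            node = node.parent

session = NestedBudget("session", max_usd=5.00)
research = session.child("research", max_usd=2.00)
research.record(0.75)
print(session.spent)  # 0.75 — the stage's cost counts against the session too
```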
## What's New in v0.2.6

- `fallback={"at_pct": 0.8, "model": "gpt-4o-mini"}`: automatically switch to a cheaper model instead of crashing. Fallback shares the same `max_usd` budget.
- `on_warn` callback fires at the `warn_at` threshold before the budget is exhausted.
- `max_llm_calls=50` caps the number of LLM API calls, combinable with `max_usd`.
- Native adapter for LiteLLM: hard budget caps and circuit-breaking across 100+ providers (Gemini, Cohere, Ollama, Azure, Bedrock…). One limit, every provider.
- Native adapter for the `google-genai` SDK: enforce budgets on `generate_content` and streaming. Pricing bundled for Gemini 2.0 Flash, 2.5 Flash, and 2.5 Pro.
- Native adapter for `huggingface-hub`: budget enforcement for any model on the Hugging Face Inference API, sync and streaming.
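The fallback behavior described above amounts to a routing decision based on how much of the cap is consumed. This standalone sketch shows that decision rule; `pick_model` and its parameters are hypothetical names for illustration, not part of shekel's API:

```python
def pick_model(spent_usd, max_usd, primary, fallback_model, at_pct):
    """Route to the cheaper fallback once spend crosses at_pct of the cap,
    mirroring fallback={"at_pct": ..., "model": ...} as described above.
    Illustrative only, not shekel's internal code."""
    if max_usd and spent_usd >= at_pct * max_usd:
        return fallback_model
    return primary

pick_model(0.50, 1.00, "gpt-4o", "gpt-4o-mini", 0.8)  # "gpt-4o"
pick_model(0.85, 1.00, "gpt-4o", "gpt-4o-mini", 0.8)  # "gpt-4o-mini"
```

Because the fallback shares the same `max_usd` budget, the cheaper model's calls keep counting toward the original cap rather than resetting it.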
## What's Next?

- Get up and running in 5 minutes with step-by-step examples.
- Learn about all the features: enforcement, fallbacks, streaming, and more.
- Complete documentation of all parameters, properties, and methods.
- See how to use shekel with LangGraph, CrewAI, and other frameworks.
## Supported Models
Built-in pricing for GPT-4o, GPT-4o-mini, o1, Claude 3.5 Sonnet, Claude 3 Haiku, Gemini 1.5, and more.
Install shekel[litellm] to enforce hard spend limits across 100+ providers through LiteLLM's unified interface.
Install shekel[all-models] for 400+ models via tokencost.
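Per-call cost tracking reduces to per-token arithmetic against a pricing table. The sketch below uses illustrative rates for gpt-4o-mini ($0.15 per 1M input tokens, $0.60 per 1M output tokens at time of writing); real prices change, and shekel ships its own bundled pricing tables, so treat both the numbers and the `call_cost` helper as assumptions for illustration:

```python
# Illustrative USD prices per 1M tokens; not shekel's bundled table.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def call_cost(model, input_tokens, output_tokens):
    """Cost of a single call: tokens in each direction times the
    per-token rate for that direction."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

call_cost("gpt-4o-mini", 10_000, 2_000)  # ≈ $0.0027
```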
## Community
- GitHub: arieradle/shekel
- PyPI: pypi.org/project/shekel
- Issues: github.com/arieradle/shekel/issues
- Contributing: See our guide
## License
MIT License - see LICENSE for details.