API Reference¶
Complete reference for all shekel APIs.
budget()¶
Context manager for tracking and enforcing LLM API budgets.
Signature¶
def budget(
    max_usd: float | None = None,
    warn_at: float | None = None,
    on_warn: Callable[[float, float], None] | None = None,
    price_per_1k_tokens: dict[str, float] | None = None,
    fallback: dict[str, Any] | None = None,
    on_fallback: Callable[[float, float, str], None] | None = None,
    name: str | None = None,
    max_llm_calls: int | None = None,
) -> Budget
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
max_usd | float \| None | None | Maximum spend in USD. None = track-only mode (no enforcement). Must be positive if set. |
warn_at | float \| None | None | Fraction (0.0-1.0) of max_usd at which to warn. |
on_warn | Callable[[float, float], None] \| None | None | Callback fired at warn_at threshold. Receives (spent, limit). |
price_per_1k_tokens | dict[str, float] \| None | None | Override pricing: {"input": X, "output": Y} per 1k tokens. |
fallback | dict[str, Any] \| None | None | Dict specifying when and what model to switch to: {"at_pct": 0.8, "model": "gpt-4o-mini"}. at_pct is the fraction of max_usd at which to switch; model is the fallback model (same provider only). Fallback shares the same max_usd budget — there is no separate ceiling. |
on_fallback | Callable[[float, float, str], None] \| None | None | Callback on fallback switch. Receives (spent, limit, fallback_model). |
name | str \| None | None | Budget name for debugging and cost attribution. Required when nesting budgets. |
max_llm_calls | int \| None | None | Maximum number of LLM API calls. Raises BudgetExceededError when exceeded. Can be combined with max_usd. |
loop_guard | bool | False | Enable per-tool rolling-window loop detection. Raises AgentLoopError when the same tool is called more than loop_guard_max_calls times within loop_guard_window_seconds. |
loop_guard_max_calls | int | 5 | Max calls to the same tool within the window before AgentLoopError. Only applies when loop_guard=True. |
loop_guard_window_seconds | float | 60.0 | Rolling window duration in seconds. 0 = all-time cap (no rolling window). Only applies when loop_guard=True. |
max_velocity | str \| None | None | Spend velocity cap. Format: "$<amount>/<unit>" (e.g. "$0.50/min", "$5/hr"). Raises SpendVelocityExceededError when the burn rate exceeds this threshold. |
warn_velocity | str \| None | None | Soft velocity warning threshold. Same format as max_velocity. Must be less than max_velocity. Fires on_warn callback when crossed; does not raise. |
tenant_id | str \| None | None | Tenant or user identifier for per-tenant spend isolation. When set, Redis state is namespaced under shekel:tb:{name}:{tenant_id}. Requires name and backend. Empty string raises ValueError. |
backend | RedisBackend \| AsyncRedisBackend \| None | None | Redis backend for distributed or per-tenant enforcement. Required when tenant_id is set. |
window_seconds | float \| None | None | Rolling-window duration in seconds. Required (or inferred from a spec string) for temporal budgets. Default when tenant_id is set: 86400 * 30 (30 days). |
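Both warn_at and fallback["at_pct"] are fractions of max_usd. The sketch below only illustrates that arithmetic; threshold_usd is a hypothetical helper written for this example and is not part of shekel.

```python
def threshold_usd(max_usd: float, fraction: float) -> float:
    """Return the USD amount at which a fractional threshold is reached."""
    if max_usd <= 0:
        raise ValueError("max_usd must be positive")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return max_usd * fraction


# warn_at=0.8 on a $10.00 budget
warn_point = threshold_usd(10.00, 0.8)
# fallback={"at_pct": 0.8, ...} on a $1.00 budget
fallback_point = threshold_usd(1.00, 0.8)
print(f"warn at ${warn_point:.2f}, switch model at ${fallback_point:.2f}")
```

Because the fallback shares the same max_usd, the fallback model only has the remaining fraction of the budget to work with after the switch.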
Returns¶
Budget object that can be used as a context manager.
Examples¶
Track-Only Mode¶
Budget Enforcement¶
Early Warning¶
Custom Warning Callback¶
def my_handler(spent: float, limit: float):
    print(f"Alert: ${spent:.2f} / ${limit:.2f}")

with budget(max_usd=10.00, warn_at=0.8, on_warn=my_handler):
    run_agent()
Model Fallback¶
Call-Count Budget¶
Combined USD and Call-Count Budget¶
with budget(max_usd=1.00, max_llm_calls=20, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"}) as b:
    run_agent()
Accumulating Budget¶
# Budget variables accumulate across uses
session = budget(max_usd=10.00, name="session")

with session:
    process_batch_1()

with session:
    process_batch_2()  # Accumulates automatically

print(f"Total: ${session.spent:.2f}")
Custom Pricing¶
Nested Budgets¶
with budget(max_usd=10.00, name="workflow") as workflow:
    # Research stage: $2 budget
    with budget(max_usd=2.00, name="research"):
        run_research()

    # Analysis stage: $5 budget
    with budget(max_usd=5.00, name="analysis"):
        run_analysis()

    # Parent can spend too
    finalize()

print(f"Total: ${workflow.spent:.2f}")
print(workflow.tree())
See full nested budgets guide →
Budget Class¶
The budget context manager object.
Properties¶
| Property | Type | Description |
|---|---|---|
spent | float | Total USD spent in this budget context (includes children in nested budgets). |
remaining | float \| None | Remaining USD budget (based on effective limit), or None if track-only mode. |
limit | float \| None | Effective budget limit (auto-capped if nested), or None if track-only. |
name | str \| None | Budget name. |
model_switched | bool | True if fallback was activated. |
switched_at_usd | float \| None | USD spent when fallback occurred, or None. |
fallback_spent | float | USD spent on the fallback model. |
loop_guard_counts | dict[str, int] | Per-tool call counts recorded by the loop guard. Empty dict when loop_guard=False. Keys are tool names; values are total calls recorded within the current window. |
tenant_id | str \| None | Tenant identifier passed to budget(), or None if not set. |
Nested Budget Properties¶
| Property | Type | Description |
|---|---|---|
parent | Budget \| None | Parent budget, or None if root budget. |
children | list[Budget] | List of child budgets created under this budget. |
active_child | Budget \| None | Currently active child budget, or None. |
full_name | str | Hierarchical path name (e.g., "workflow.research.validation"). |
spent_direct | float | Direct spend by this budget only (excluding children). |
spent_by_children | float | Sum of spend from all child budgets. |
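The relationship between spent, spent_direct, spent_by_children, and full_name can be sketched with a small tree class. BudgetNode is a hypothetical stand-in written for this example, not shekel's Budget; it assumes only what the two tables above state, namely that spent includes children and that full_name joins ancestor names with dots.

```python
from typing import List, Optional


class BudgetNode:
    """Hypothetical stand-in for one budget in a tree (not shekel's Budget)."""

    def __init__(self, name: str, parent: "Optional[BudgetNode]" = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["BudgetNode"] = []
        self.spent_direct = 0.0
        if parent is not None:
            parent.children.append(self)

    @property
    def spent_by_children(self) -> float:
        # Each child's total already includes its own descendants.
        return sum(child.spent for child in self.children)

    @property
    def spent(self) -> float:
        return self.spent_direct + self.spent_by_children

    @property
    def full_name(self) -> str:
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name}.{self.name}"


workflow = BudgetNode("workflow")
research = BudgetNode("research", parent=workflow)
validation = BudgetNode("validation", parent=research)
research.spent_direct = 3.00
validation.spent_direct = 0.25
print(validation.full_name, f"{workflow.spent:.2f}")
```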
Methods¶
reset()¶
Reset spend tracking to zero. Only works when budget is not active.
session = budget(max_usd=10.00, name="session")

with session:
    process()

session.reset()  # Back to $0

with session:
    process_again()
Raises: RuntimeError if called inside an active with block.
summary()¶
Return formatted spend summary as a string.
Returns: Multi-line string with formatted table of calls, costs, and totals.
summary_data()¶
Return structured spend data as a dict.
with budget() as b:
    run_agent()

data = b.summary_data()
print(data["total_spent"])
print(data["total_calls"])
print(data["by_model"])
Returns: Dictionary with keys:
- total_spent: Total USD
- limit: Budget limit
- model_switched: Boolean
- switched_at_usd: Switch point
- fallback_model: Fallback model name
- fallback_spent: Cost on fallback
- total_calls: Number of API calls
- calls: List of all call records
- by_model: Aggregated stats per model
tree()¶
Return visual hierarchy of budget tree with spend breakdown.
with budget(max_usd=20, name="workflow") as w:
    with budget(max_usd=5, name="research"):
        research()
    with budget(max_usd=10, name="analysis"):
        analyze()

print(w.tree())
# workflow: $12.50 / $20.00 (direct: $0.00)
#   research: $3.20 / $5.00 (direct: $3.20)
#   analysis: $9.30 / $10.00 (direct: $9.30)
Also renders registered component budgets (nodes, agents, tasks). LangGraph node spend is tracked automatically; agent/task spend requires future framework adapters.
with budget(max_usd=10, name="workflow") as b:
    b.node("fetch", max_usd=0.50)
    b.node("summarize", max_usd=1.00)
    run_langgraph_workflow()

print(b.tree())
# workflow: $0.84 / $10.00 (direct: $0.00)
#   [node] fetch: $0.12 / $0.50 (24.0%)
#   [node] summarize: $0.72 / $1.00 (72.0%)
Returns: Multi-line string with indented tree structure showing:
- Budget name and hierarchy
- Total spend / limit
- Direct spend (excluding children)
- [ACTIVE] marker for currently active children
- [node], [agent], [task] component budget lines with spend/limit/percentage
node(name, max_usd)¶
Register an explicit USD cap for a named LangGraph node. Returns self for chaining.
The cap is enforced by LangGraphAdapter — NodeBudgetExceededError is raised before the node body executes when the cap is reached. Spend is attributed to ComponentBudget._spent and visible in budget.tree().
with budget(max_usd=10.00) as b:
    b.node("fetch_data", max_usd=0.50).node("summarize", max_usd=1.00)

    graph = StateGraph(State)
    graph.add_node("fetch_data", fetch_fn)
    graph.add_node("summarize", summarize_fn)
    app = graph.compile()
    app.invoke(state)
Parameters:
- name: node name (must match the name passed to StateGraph.add_node())
- max_usd: USD cap; must be positive
Raises: ValueError if max_usd <= 0
agent(name, max_usd)¶
Register an explicit USD cap for a named CrewAI agent. Returns self for chaining.
Enforced by CrewAIExecutionAdapter — AgentBudgetExceededError is raised before Agent.execute_task runs when the cap is exhausted. Use agent.role as the key to eliminate string mismatch risk. Spend is attributed to ComponentBudget._spent and visible in budget.tree().
with budget(max_usd=10.00) as b:
    b.agent(researcher.role, max_usd=2.00).agent(writer.role, max_usd=1.50)
    crew.kickoff(inputs={"topic": "AI"})
Parameters:
- name: agent name (use agent.role directly)
- max_usd: USD cap; must be positive
Raises: ValueError if max_usd <= 0
task(name, max_usd)¶
Register an explicit USD cap for a named CrewAI task. Returns self for chaining.
Enforced by CrewAIExecutionAdapter — TaskBudgetExceededError is raised before Agent.execute_task runs when the cap is exhausted. Use task.name as the key directly. Gate order: task cap is checked before agent cap (most specific first). Spend is attributed independently to both the task and the executing agent.
with budget(max_usd=10.00) as b:
    b.task(research_task.name, max_usd=1.50).task(write_task.name, max_usd=0.80)
    crew.kickoff(inputs={"topic": "AI"})
Parameters:
- name: task name (use task.name directly)
- max_usd: USD cap; must be positive
Raises: ValueError if max_usd <= 0
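The gate order described for task() (task cap checked before agent cap, with spend attributed to both) can be sketched as follows. ComponentCaps and CapReached are hypothetical names used only for this illustration; the sketch does not reproduce CrewAIExecutionAdapter and assumes a cap counts as exhausted once spend reaches the limit.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (kind, name), e.g. ("task", "research")


class CapReached(Exception):
    """Hypothetical stand-in for TaskBudgetExceededError / AgentBudgetExceededError."""

    def __init__(self, kind: str, name: str, spent: float, limit: float) -> None:
        super().__init__(f"{kind} '{name}' reached its ${limit:.2f} cap")
        self.kind, self.name, self.spent, self.limit = kind, name, spent, limit


class ComponentCaps:
    def __init__(self) -> None:
        self.limits: Dict[Key, float] = {}
        self.spent: Dict[Key, float] = {}

    def register(self, kind: str, name: str, max_usd: float) -> "ComponentCaps":
        if max_usd <= 0:
            raise ValueError("max_usd must be positive")
        self.limits[(kind, name)] = max_usd
        self.spent.setdefault((kind, name), 0.0)
        return self  # allow chaining

    def gate(self, task_name: str, agent_name: str) -> None:
        # Most specific first: the task cap is checked before the agent cap.
        for key in (("task", task_name), ("agent", agent_name)):
            limit = self.limits.get(key)
            if limit is not None and self.spent[key] >= limit:
                raise CapReached(key[0], key[1], self.spent[key], limit)

    def record(self, task_name: str, agent_name: str, usd: float) -> None:
        # Spend is attributed to both the task and the agent that ran it.
        for key in (("task", task_name), ("agent", agent_name)):
            if key in self.limits:
                self.spent[key] += usd


caps = ComponentCaps().register("task", "research", 1.50).register("agent", "Researcher", 2.00)
caps.gate("research", "Researcher")          # nothing spent yet, passes
caps.record("research", "Researcher", 2.00)  # exhausts both caps
try:
    caps.gate("research", "Researcher")
except CapReached as exc:
    blocked_by = exc.kind
print(blocked_by)
```

When both caps are exhausted, the error that surfaces is the task-level one, because that check runs first.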
chain(name, max_usd)¶
Register an explicit USD cap for a named LangChain chain. Returns self for chaining.
Enforced by LangChainRunnerAdapter — ChainBudgetExceededError is raised before the chain body executes when the cap is reached. Spend is attributed to ComponentBudget._spent and visible in budget.tree().
with budget(max_usd=10.00) as b:
    b.chain("retriever", max_usd=0.20).chain("summarizer", max_usd=1.00)
    chain.invoke({"query": "..."})
Parameters:
- name: chain name (must match the run_name or object name of the chain, whether it is passed to add_node or invoked directly)
- max_usd: USD cap; must be positive
Raises: ValueError if max_usd <= 0
TemporalBudget (rolling-window budgets)¶
Created via the budget() factory when a spec string or window_seconds is provided.
Temporal factory forms¶
# Spec string (per-cap windows)
with budget("$5/hr", name="api") as b: ...
with budget("$5/hr + 100 calls/hr", name="api") as b: ...
# Kwargs (single shared window)
with budget(max_usd=5.0, window_seconds=3600, name="api") as b: ...
with budget(max_usd=5.0, max_llm_calls=100, window_seconds=3600, name="api") as b: ...
name= is required for TemporalBudget.
Supported counters in multi-cap specs¶
| Token | Counter | Example |
|---|---|---|
$N or N usd | usd | $5/hr |
N calls | llm_calls | 100 calls/hr |
N tools | tool_calls | 20 tools/hr |
N tokens | tokens | 50000 tokens/hr |
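To make the spec-string grammar concrete, here is a rough parser for the tokens in the table above. It is a sketch under stated assumptions, not shekel's parser: parse_spec is a hypothetical name, and only the "min" and "hr" units that appear in this reference are recognized.

```python
import re
from typing import Dict, Tuple

# Only the two units that appear in this reference; other units are not assumed.
UNIT_SECONDS = {"min": 60.0, "hr": 3600.0}
COUNTERS = {"usd": "usd", "calls": "llm_calls", "tools": "tool_calls", "tokens": "tokens"}

_CAP = re.compile(
    r"^(?:\$(?P<dollars>\d+(?:\.\d+)?)"
    r"|(?P<amount>\d+(?:\.\d+)?)\s*(?P<kind>usd|calls|tools|tokens))"
    r"/(?P<unit>\w+)$"
)


def parse_spec(spec: str) -> Dict[str, Tuple[float, float]]:
    """Map each counter name to (limit, window_seconds)."""
    caps: Dict[str, Tuple[float, float]] = {}
    for part in spec.split("+"):
        part = part.strip()
        match = _CAP.match(part)
        if match is None or match.group("unit") not in UNIT_SECONDS:
            raise ValueError(f"unrecognized cap: {part!r}")
        if match.group("dollars") is not None:
            counter, limit = "usd", float(match.group("dollars"))
        else:
            counter, limit = COUNTERS[match.group("kind")], float(match.group("amount"))
        caps[counter] = (limit, UNIT_SECONDS[match.group("unit")])
    return caps


print(parse_spec("$5/hr + 100 calls/hr"))
```

Each cap carries its own window, which mirrors the "per-cap windows" note for spec strings; the kwargs form instead shares a single window_seconds across all caps.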
Using a custom backend¶
from shekel.backends.redis import RedisBackend

backend = RedisBackend(url="redis://localhost:6379/0")

with budget("$5/hr", name="api", backend=backend) as b:
    run_agent()
RedisBackend¶
Synchronous Redis-backed rolling-window budget backend for distributed enforcement.
Constructor¶
RedisBackend(
    url: str | None = None,          # defaults to REDIS_URL env var
    tls: bool = False,
    on_unavailable: str = "closed",  # "closed" | "open"
    circuit_breaker_threshold: int = 3,
    circuit_breaker_cooldown: float = 10.0,
)
| Parameter | Default | Description |
|---|---|---|
url | REDIS_URL env | Redis connection URL |
tls | False | Force TLS (ssl=True) |
on_unavailable | "closed" | "closed" raises BudgetExceededError; "open" allows through |
circuit_breaker_threshold | 3 | Consecutive errors before circuit opens |
circuit_breaker_cooldown | 10.0 | Seconds before retrying after circuit opens |
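The two circuit-breaker parameters interact as a threshold on consecutive errors followed by a cooldown. The class below is a generic sketch of that pattern with the same defaults; CircuitBreaker and its methods are hypothetical and may differ from what RedisBackend does internally (for example in how a retry after the cooldown is handled).

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical sketch; not the breaker used inside RedisBackend."""

    def __init__(self, threshold: int = 3, cooldown: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return False while the circuit is open and still cooling down."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown:
            # Cooldown elapsed: close the circuit and let the next call retry.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self) -> None:
        self._failures = 0  # only consecutive errors count

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()


now = [0.0]                      # fake clock so the example is deterministic
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()     # third consecutive error opens the circuit
open_during_cooldown = not breaker.allow_request()
now[0] = 10.0
retry_after_cooldown = breaker.allow_request()
print(open_during_cooldown, retry_after_cooldown)
```

While the circuit is open, on_unavailable decides what the caller sees: "closed" raises BudgetExceededError and "open" lets the call through.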
Example¶
from shekel import budget
from shekel.backends.redis import RedisBackend

backend = RedisBackend()  # reads REDIS_URL from env

with budget("$5/hr + 100 calls/hr", name="api-tier", backend=backend) as b:
    run_agent()
Methods¶
- check_and_add(budget_name, amounts, limits, windows): atomically check and increment counters
- get_state(budget_name): return {counter: spent} for all counters
- reset(budget_name): delete the Redis hash for budget_name
- close(): close the Redis connection
Raises: BudgetConfigMismatchError if budget_name is already registered with different limits or windows.
Per-Tenant Methods¶
| Method | Returns | Description |
|---|---|---|
get_tenant_spend(name, tenant_id) | float | Current window spend for the tenant. Returns 0.0 if unknown. |
get_tenant_limit(name, tenant_id) | float \| None | Active spend limit for the tenant. Returns None if no limit recorded. |
set_tenant_limit(name, tenant_id, max_usd) | None | Override the tenant's spend limit without resetting accumulated spend. |
reset_tenant(name, tenant_id) | None | Zero out accumulated spend while preserving the limit. |
list_tenants(name) | list[str] | All tenant IDs that have recorded spend for the budget name. |
from shekel.backends.redis import RedisBackend

backend = RedisBackend()

# Inspect a tenant
spent = backend.get_tenant_spend(name="api", tenant_id="user-42")
limit = backend.get_tenant_limit(name="api", tenant_id="user-42")

# Adjust quota
backend.set_tenant_limit(name="api", tenant_id="user-42", max_usd=0.50)

# Reset at billing period rollover
backend.reset_tenant(name="api", tenant_id="user-42")

# Enumerate all tenants
for tid in backend.list_tenants(name="api"):
    print(tid, backend.get_tenant_spend(name="api", tenant_id=tid))
See Per-Tenant Budgets for the full guide.
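For experimenting with the per-tenant semantics without a Redis server, an in-memory stand-in along these lines may help. InMemoryTenantStore and its add_spend helper are hypothetical; the sketch only mirrors the documented behavior (0.0 or None for unknown tenants, limits changed without touching spend, resets that keep the limit) and reuses the shekel:tb:{name}:{tenant_id} key layout described for tenant_id.

```python
from typing import Dict, List, Optional


class InMemoryTenantStore:
    """Hypothetical in-memory stand-in for the per-tenant methods."""

    def __init__(self) -> None:
        self._spend: Dict[str, float] = {}
        self._limit: Dict[str, float] = {}

    @staticmethod
    def key(name: str, tenant_id: str) -> str:
        if not tenant_id:
            raise ValueError("tenant_id must be a non-empty string")
        return f"shekel:tb:{name}:{tenant_id}"

    def add_spend(self, name: str, tenant_id: str, usd: float) -> None:
        # Helper for this sketch only; it is not one of the documented methods.
        k = self.key(name, tenant_id)
        self._spend[k] = self._spend.get(k, 0.0) + usd

    def get_tenant_spend(self, name: str, tenant_id: str) -> float:
        return self._spend.get(self.key(name, tenant_id), 0.0)

    def get_tenant_limit(self, name: str, tenant_id: str) -> Optional[float]:
        return self._limit.get(self.key(name, tenant_id))

    def set_tenant_limit(self, name: str, tenant_id: str, max_usd: float) -> None:
        self._limit[self.key(name, tenant_id)] = max_usd  # spend is left untouched

    def reset_tenant(self, name: str, tenant_id: str) -> None:
        k = self.key(name, tenant_id)
        if k in self._spend:
            self._spend[k] = 0.0  # limit is left untouched

    def list_tenants(self, name: str) -> List[str]:
        prefix = f"shekel:tb:{name}:"
        return [k[len(prefix):] for k in self._spend if k.startswith(prefix)]


store = InMemoryTenantStore()
store.add_spend("api", "user-42", 0.30)
store.set_tenant_limit("api", "user-42", 0.50)
spend_after_limit_change = store.get_tenant_spend("api", "user-42")
store.reset_tenant("api", "user-42")
print(spend_after_limit_change, store.get_tenant_spend("api", "user-42"),
      store.get_tenant_limit("api", "user-42"))
```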
AsyncRedisBackend¶
Async version of RedisBackend. All public methods are coroutines. Suitable for FastAPI, async LangGraph, and other async contexts.
from shekel.backends.redis import AsyncRedisBackend

backend = AsyncRedisBackend()

async with budget("$5/hr", name="api", backend=backend) as b:
    await run_async_agent()
Constructor and parameters are identical to RedisBackend.
All five per-tenant methods are available as coroutines:
spent = await backend.get_tenant_spend(name="api", tenant_id="user-42")
limit = await backend.get_tenant_limit(name="api", tenant_id="user-42")
await backend.set_tenant_limit(name="api", tenant_id="user-42", max_usd=0.50)
await backend.reset_tenant(name="api", tenant_id="user-42")
tenants = await backend.list_tenants(name="api")
@with_budget¶
Decorator that wraps functions with a budget context.
Signature¶
def with_budget(
    max_usd: float | None = None,
    warn_at: float | None = None,
    on_warn: Callable[[float, float], None] | None = None,
    price_per_1k_tokens: dict[str, float] | None = None,
    fallback: dict[str, Any] | None = None,
    on_fallback: Callable[[float, float, str], None] | None = None,
    max_llm_calls: int | None = None,
)
Parameters¶
Same as budget(). The decorator creates a fresh budget for each call, so spend does not accumulate across calls.
Examples¶
Basic Decorator¶
from shekel import with_budget

@with_budget(max_usd=0.50)
def generate_summary(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content
Async Decorator¶
@with_budget(max_usd=0.50)
async def async_generate(prompt: str) -> str:
    response = await client.chat.completions.create(...)
    return response.choices[0].message.content
With All Parameters¶
@with_budget(
    max_usd=2.00,
    warn_at=0.8,
    fallback={"at_pct": 0.8, "model": "gpt-4o-mini"},
    on_warn=my_warning_handler
)
def process_request(data: dict) -> str:
    ...
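The "fresh budget per call" behavior, for both sync and async functions, follows a common decorator pattern. The sketch below shows that pattern with a hypothetical Tracker class standing in for the budget context and a hypothetical with_tracker decorator; it is not shekel's implementation of with_budget.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable, List, Optional


class Tracker:
    """Hypothetical stand-in for a budget context; not shekel's Budget."""

    instances: List["Tracker"] = []

    def __init__(self, max_usd: Optional[float] = None) -> None:
        self.max_usd = max_usd
        Tracker.instances.append(self)

    def __enter__(self) -> "Tracker":
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        return False  # never swallow exceptions


def with_tracker(**tracker_kwargs: Any) -> Callable:
    def decorator(fn: Callable) -> Callable:
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                with Tracker(**tracker_kwargs):  # new context per call
                    return await fn(*args, **kwargs)
            return async_wrapper

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with Tracker(**tracker_kwargs):      # new context per call
                return fn(*args, **kwargs)
        return wrapper
    return decorator


@with_tracker(max_usd=0.50)
def summarize(text: str) -> str:
    return text[:10]


@with_tracker(max_usd=0.50)
async def summarize_async(text: str) -> str:
    return text[:10]


summarize("first call")
summarize("second call")
asyncio.run(summarize_async("third call"))
print(len(Tracker.instances))
```

Constructing the context inside the wrapper, rather than once at decoration time, is what keeps each call's spend separate.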
BudgetExceededError¶
Exception raised when budget limit is exceeded.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
spent | float | Total USD spent when limit was hit. |
limit | float | The configured max_usd. |
model | str | Model that triggered the error. |
tokens | dict[str, int] | Token counts: {"input": N, "output": N}. |
Example¶
from shekel import budget, BudgetExceededError

try:
    with budget(max_usd=0.50):
        expensive_operation()
except BudgetExceededError as e:
    print(f"Spent: ${e.spent:.4f}")
    print(f"Limit: ${e.limit:.2f}")
    print(f"Model: {e.model}")
    print(f"Tokens: {e.tokens['input']} in, {e.tokens['output']} out")
NodeBudgetExceededError¶
Raised when a LangGraph node exceeds its registered USD cap. Subclass of BudgetExceededError.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
node_name | str | Name of the node that exceeded its budget. |
spent | float | Total USD spent when the cap was hit. |
limit | float | The configured max_usd for this node. |
from shekel import budget, NodeBudgetExceededError, BudgetExceededError

try:
    with budget(max_usd=10.00) as b:
        b.node("fetch", max_usd=0.10)
        run_fetch_node()
except NodeBudgetExceededError as e:
    print(f"Node '{e.node_name}' exceeded ${e.limit:.2f}")
except BudgetExceededError:
    # catches all budget errors including NodeBudgetExceededError
    ...
AgentBudgetExceededError¶
Raised when an agent exceeds its registered USD cap. Subclass of BudgetExceededError.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
agent_name | str | Name of the agent that exceeded its budget. |
spent | float | Total USD spent when the cap was hit. |
limit | float | The configured max_usd for this agent. |
TaskBudgetExceededError¶
Raised when a task exceeds its registered USD cap. Subclass of BudgetExceededError.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
task_name | str | Name of the task that exceeded its budget. |
spent | float | Total USD spent when the cap was hit. |
limit | float | The configured max_usd for this task. |
SessionBudgetExceededError¶
Raised when an always-on agent session exceeds its rolling-window budget. Subclass of BudgetExceededError.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
agent_name | str | Name of the agent session that exceeded its budget. |
spent | float | Total USD spent when the cap was hit. |
limit | float | The configured session budget. |
window | float \| None | Rolling window duration in seconds, or None. |
ChainBudgetExceededError¶
Raised when a LangChain chain exceeds its registered USD cap. Subclass of BudgetExceededError.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
chain_name | str | Name of the chain that exceeded its budget. |
spent | float | Total USD spent when the cap was hit. |
limit | float | The configured max_usd for this chain. |
from shekel import budget, ChainBudgetExceededError, BudgetExceededError

try:
    with budget(max_usd=10.00) as b:
        b.chain("retriever", max_usd=0.20)
        chain.invoke({"query": "..."})
except ChainBudgetExceededError as e:
    print(f"Chain '{e.chain_name}' exceeded ${e.limit:.2f}")
AgentLoopError¶
Raised when the loop guard detects that the same tool has been called more than loop_guard_max_calls times within loop_guard_window_seconds. Subclass of BudgetExceededError.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
tool_name | str | Name of the tool that triggered the loop detection. |
call_count | int | Number of calls to this tool within the window at the time of blocking. |
window_seconds | float | The configured rolling window duration. 0 means all-time cap. |
usd_spent | float | Total USD spent when the loop was detected. |
framework | str | Framework that dispatched the tool: "langchain", "mcp", "crewai", "openai-agents", or "manual". |
from shekel import budget, AgentLoopError, BudgetExceededError

try:
    with budget(loop_guard=True, loop_guard_max_calls=5):
        run_agent()
except AgentLoopError as e:
    print(f"Tool '{e.tool_name}' called {e.call_count}x in {e.window_seconds}s")
except BudgetExceededError:
    ...  # catches all budget errors including AgentLoopError
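A per-tool rolling window of this kind is typically built from a queue of timestamps per tool. The sketch below illustrates that idea with the documented defaults and the documented meaning of a zero window; LoopGuard and LoopDetected are hypothetical names, and shekel's own bookkeeping may differ.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class LoopDetected(Exception):
    """Hypothetical stand-in for AgentLoopError."""

    def __init__(self, tool_name: str, call_count: int) -> None:
        super().__init__(f"{tool_name} called {call_count}x")
        self.tool_name = tool_name
        self.call_count = call_count


class LoopGuard:
    def __init__(self, max_calls: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, tool_name: str) -> int:
        now = self._clock()
        calls = self._calls[tool_name]
        # window_seconds == 0 means an all-time cap, so nothing ever expires.
        if self.window_seconds > 0:
            while calls and now - calls[0] >= self.window_seconds:
                calls.popleft()
        calls.append(now)
        if len(calls) > self.max_calls:
            raise LoopDetected(tool_name, len(calls))
        return len(calls)


now = [0.0]                      # fake clock so the example is deterministic
guard = LoopGuard(max_calls=5, window_seconds=60.0, clock=lambda: now[0])
for _ in range(5):
    guard.record("search")       # calls 1-5 are within the cap
try:
    guard.record("search")       # sixth call inside the window
except LoopDetected as exc:
    blocked_at = exc.call_count
now[0] = 120.0                   # earlier calls have aged out of the window
count_after_window = guard.record("search")
print(blocked_at, count_after_window)
```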
SpendVelocityExceededError¶
Raised when the measured spend velocity (USD per minute) exceeds the max_velocity threshold. Subclass of BudgetExceededError.
Attributes¶
| Attribute | Type | Description |
|---|---|---|
velocity_per_min | float | Measured spend velocity in USD/min at the time of blocking. Always normalized to per-minute regardless of the spec unit. |
limit_per_min | float | The configured velocity limit in USD/min. |
window_seconds | float | Rolling window over which velocity was measured (seconds). |
usd_spent | float | Total USD spent when blocked. |
elapsed_seconds | float | Seconds elapsed since the budget context opened. |
from shekel import budget, SpendVelocityExceededError, BudgetExceededError

try:
    with budget(max_velocity="$0.50/min"):
        run_agent()
except SpendVelocityExceededError as e:
    print(f"Velocity: ${e.velocity_per_min:.4f}/min (limit: ${e.limit_per_min:.4f}/min)")
    print(f"Spent: ${e.usd_spent:.4f} over {e.elapsed_seconds:.1f}s")
except BudgetExceededError:
    ...  # catches all budget errors including SpendVelocityExceededError
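Since both velocity attributes are normalized to USD per minute, a "$5/hr" limit and a burst measured over 30 seconds end up on the same scale. The helpers below sketch that conversion; limit_per_min and velocity_per_min are hypothetical functions, only the "min" and "hr" units shown in this reference are handled, and how shekel chooses its measurement window is not assumed here.

```python
import re

# Only the units shown in this reference ("min", "hr"); others are not assumed.
_MINUTES_PER_UNIT = {"min": 1.0, "hr": 60.0}


def limit_per_min(spec: str) -> float:
    """Convert a "$<amount>/<unit>" spec into USD per minute."""
    match = re.fullmatch(r"\$(\d+(?:\.\d+)?)/(\w+)", spec.strip())
    if match is None or match.group(2) not in _MINUTES_PER_UNIT:
        raise ValueError(f"unrecognized velocity spec: {spec!r}")
    return float(match.group(1)) / _MINUTES_PER_UNIT[match.group(2)]


def velocity_per_min(usd_in_window: float, window_seconds: float) -> float:
    """Spend observed over a window, expressed as USD per minute."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return usd_in_window * 60.0 / window_seconds


limit = limit_per_min("$5/hr")
observed = velocity_per_min(usd_in_window=0.30, window_seconds=30.0)
print(f"limit ${limit:.4f}/min, observed ${observed:.4f}/min, exceeded={observed > limit}")
```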
BudgetConfigMismatchError¶
Raised by RedisBackend / AsyncRedisBackend when a budget name is already registered with different limits or windows. Subclass of BudgetExceededError.
from shekel.exceptions import BudgetConfigMismatchError

try:
    with budget("$5/hr", name="api", backend=backend):
        run_agent()
except BudgetConfigMismatchError:
    # Budget "api" was previously registered with different caps.
    # Call backend.reset("api") to clear existing state.
    backend.reset("api")
To resolve: call backend.reset(budget_name) to delete the existing Redis state, then retry.
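The register, mismatch, reset, retry cycle can be pictured with a small in-memory registry. This is only an illustration of the rule stated above (same name with different limits or windows is rejected until the state is reset); ConfigRegistry and ConfigMismatch are hypothetical names and do not reflect how the Redis backends store their configuration.

```python
from typing import Dict, Tuple


class ConfigMismatch(Exception):
    """Hypothetical stand-in for BudgetConfigMismatchError."""


class ConfigRegistry:
    def __init__(self) -> None:
        self._configs: Dict[str, Tuple[dict, dict]] = {}

    def register(self, budget_name: str, limits: dict, windows: dict) -> None:
        config = (dict(limits), dict(windows))
        # The first registration wins; later ones must match it exactly.
        existing = self._configs.setdefault(budget_name, config)
        if existing != config:
            raise ConfigMismatch(f"{budget_name!r} already registered with different caps")

    def reset(self, budget_name: str) -> None:
        self._configs.pop(budget_name, None)


registry = ConfigRegistry()
registry.register("api", {"usd": 5.0}, {"usd": 3600.0})
registry.register("api", {"usd": 5.0}, {"usd": 3600.0})       # same config: fine
try:
    registry.register("api", {"usd": 10.0}, {"usd": 3600.0})  # different limit
    mismatch = False
except ConfigMismatch:
    mismatch = True
registry.reset("api")
registry.register("api", {"usd": 10.0}, {"usd": 3600.0})      # accepted after reset
print(mismatch)
```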
Type Signatures¶
For type checking with mypy, pyright, etc:
from shekel import budget, with_budget, BudgetExceededError
from typing import Callable

# Budget context manager
b: budget = budget(max_usd=1.00)

# Decorator
@with_budget(max_usd=0.50)
def my_func() -> str:
    ...

# Callbacks
def warn_callback(spent: float, limit: float) -> None:
    ...

def fallback_callback(spent: float, limit: float, fallback: str) -> None:
    ...
Next Steps¶
- Basic Usage - Learn the fundamentals
- Nested Budgets - Hierarchical tracking for multi-stage workflows
- Budget Enforcement - Hard caps and warnings
- Fallback Models - Automatic model switching