API Reference

Complete reference for all shekel APIs.

budget()

Context manager for tracking and enforcing LLM API budgets.

Signature

def budget(
    max_usd: float | None = None,
    warn_at: float | None = None,
    on_warn: Callable[[float, float], None] | None = None,
    price_per_1k_tokens: dict[str, float] | None = None,
    fallback: dict[str, Any] | None = None,
    on_fallback: Callable[[float, float, str], None] | None = None,
    name: str | None = None,
    max_llm_calls: int | None = None,
) -> Budget

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_usd | float \| None | None | Maximum spend in USD. None = track-only mode (no enforcement). Must be positive if set. |
| warn_at | float \| None | None | Fraction (0.0-1.0) of max_usd at which to warn. |
| on_warn | Callable[[float, float], None] \| None | None | Callback fired at warn_at threshold. Receives (spent, limit). |
| price_per_1k_tokens | dict[str, float] \| None | None | Override pricing: {"input": X, "output": Y} per 1k tokens. |
| fallback | dict[str, Any] \| None | None | Dict specifying when and what model to switch to: {"at_pct": 0.8, "model": "gpt-4o-mini"}. at_pct is the fraction of max_usd at which to switch; model is the fallback model (same provider only). Fallback shares the same max_usd budget; there is no separate ceiling. |
| on_fallback | Callable[[float, float, str], None] \| None | None | Callback on fallback switch. Receives (spent, limit, fallback_model). |
| name | str \| None | None | Budget name for debugging and cost attribution. Required when nesting budgets. |
| max_llm_calls | int \| None | None | Maximum number of LLM API calls. Raises BudgetExceededError when exceeded. Can be combined with max_usd. |
| loop_guard | bool | False | Enable per-tool rolling-window loop detection. Raises AgentLoopError when the same tool is called more than loop_guard_max_calls times within loop_guard_window_seconds. |
| loop_guard_max_calls | int | 5 | Max calls to the same tool within the window before AgentLoopError. Only applies when loop_guard=True. |
| loop_guard_window_seconds | float | 60.0 | Rolling window duration in seconds. 0 = all-time cap (no rolling window). Only applies when loop_guard=True. |
| max_velocity | str \| None | None | Spend velocity cap. Format: "$<amount>/<unit>" (e.g. "$0.50/min", "$5/hr"). Raises SpendVelocityExceededError when the burn rate exceeds this threshold. |
| warn_velocity | str \| None | None | Soft velocity warning threshold. Same format as max_velocity. Must be less than max_velocity. Fires on_warn callback when crossed; does not raise. |
| tenant_id | str \| None | None | Tenant or user identifier for per-tenant spend isolation. When set, Redis state is namespaced under shekel:tb:{name}:{tenant_id}. Requires name and backend. Empty string raises ValueError. |
| backend | RedisBackend \| AsyncRedisBackend \| None | None | Redis backend for distributed or per-tenant enforcement. Required when tenant_id is set. |
| window_seconds | float \| None | None | Rolling-window duration in seconds. Required (or inferred from a spec string) for temporal budgets. Default when tenant_id is set: 86400 * 30 (30 days). |

Returns

Budget object that can be used as a context manager.

Examples

Track-Only Mode

with budget() as b:
    run_agent()
print(f"Cost: ${b.spent:.4f}")

Budget Enforcement

with budget(max_usd=1.00) as b:
    run_agent()

Early Warning

with budget(max_usd=5.00, warn_at=0.8) as b:
    run_agent()  # Warns at $4.00

Custom Warning Callback

def my_handler(spent: float, limit: float):
    print(f"Alert: ${spent:.2f} / ${limit:.2f}")

with budget(max_usd=10.00, warn_at=0.8, on_warn=my_handler):
    run_agent()

Model Fallback

with budget(max_usd=1.00, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"}) as b:
    run_agent()
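
The warn_at and at_pct values are both fractions of max_usd, so the dollar amount at which each fires is a single multiplication. The sketch below only illustrates that arithmetic; threshold_usd is a hypothetical helper, not part of shekel.

```python
def threshold_usd(max_usd: float, fraction: float) -> float:
    """Dollar amount at which a fractional threshold is reached."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return max_usd * fraction


# warn_at=0.8 on a $5.00 budget, as in the Early Warning example
warn_point = threshold_usd(5.00, 0.8)

# at_pct=0.8 on a $1.00 budget, as in the Model Fallback example
switch_point = threshold_usd(1.00, 0.8)

print(f"warn at ${warn_point:.2f}, switch model at ${switch_point:.2f}")
```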

Call-Count Budget

with budget(max_llm_calls=50) as b:
    run_agent()  # Raises BudgetExceededError after 50 LLM calls

Combined USD and Call-Count Budget

with budget(max_usd=1.00, max_llm_calls=20, fallback={"at_pct": 0.8, "model": "gpt-4o-mini"}) as b:
    run_agent()

Accumulating Budget

# Budget variables accumulate across uses
session = budget(max_usd=10.00, name="session")

with session:
    process_batch_1()

with session:
    process_batch_2()  # Accumulates automatically

print(f"Total: ${session.spent:.2f}")

Custom Pricing

with budget(
    max_usd=1.00,
    price_per_1k_tokens={"input": 0.002, "output": 0.006}
):
    run_agent()
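
As a rough illustration of what per-1k-token pricing implies, the cost of one call can be estimated from its token counts. This is a minimal sketch under the assumption that input and output tokens are billed independently at the two rates; call_cost_usd is a hypothetical helper, and shekel's internal calculation may differ.

```python
def call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    price_per_1k_tokens: dict[str, float],
) -> float:
    """Estimate the cost of one call from per-1k-token prices."""
    input_cost = input_tokens / 1000 * price_per_1k_tokens["input"]
    output_cost = output_tokens / 1000 * price_per_1k_tokens["output"]
    return input_cost + output_cost


prices = {"input": 0.002, "output": 0.006}
cost = call_cost_usd(input_tokens=1500, output_tokens=500, price_per_1k_tokens=prices)
print(f"${cost:.4f}")
```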

Nested Budgets

with budget(max_usd=10.00, name="workflow") as workflow:
    # Research stage: $2 budget
    with budget(max_usd=2.00, name="research"):
        run_research()

    # Analysis stage: $5 budget
    with budget(max_usd=5.00, name="analysis"):
        run_analysis()

    # Parent can spend too
    finalize()

print(f"Total: ${workflow.spent:.2f}")
print(workflow.tree())

See full nested budgets guide →


Budget Class

The budget context manager object.

Properties

| Property | Type | Description |
| --- | --- | --- |
| spent | float | Total USD spent in this budget context (includes children in nested budgets). |
| remaining | float \| None | Remaining USD budget (based on effective limit), or None if track-only mode. |
| limit | float \| None | Effective budget limit (auto-capped if nested), or None if track-only. |
| name | str \| None | Budget name. |
| model_switched | bool | True if fallback was activated. |
| switched_at_usd | float \| None | USD spent when fallback occurred, or None. |
| fallback_spent | float | USD spent on the fallback model. |
| loop_guard_counts | dict[str, int] | Per-tool call counts recorded by the loop guard. Empty dict when loop_guard=False. Keys are tool names; values are total calls recorded within the current window. |
| tenant_id | str \| None | Tenant identifier passed to budget(), or None if not set. |

Nested Budget Properties

| Property | Type | Description |
| --- | --- | --- |
| parent | Budget \| None | Parent budget, or None if root budget. |
| children | list[Budget] | List of child budgets created under this budget. |
| active_child | Budget \| None | Currently active child budget, or None. |
| full_name | str | Hierarchical path name (e.g., "workflow.research.validation"). |
| spent_direct | float | Direct spend by this budget only (excluding children). |
| spent_by_children | float | Sum of spend from all child budgets. |
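
The relationship between spent, spent_direct, spent_by_children, and full_name can be pictured with a small stand-in tree. BudgetNode below is a hypothetical class written for illustration only; it is not shekel's Budget and it ignores limits entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BudgetNode:
    """Hypothetical stand-in for a budget in a nested hierarchy."""

    name: str
    spent_direct: float = 0.0
    parent: BudgetNode | None = field(default=None, repr=False, compare=False)
    children: list[BudgetNode] = field(default_factory=list, repr=False, compare=False)

    def child(self, name: str, spent_direct: float = 0.0) -> BudgetNode:
        """Create a child node and register it under this node."""
        node = BudgetNode(name, spent_direct, parent=self)
        self.children.append(node)
        return node

    @property
    def spent_by_children(self) -> float:
        # Each child's total already includes its own descendants.
        return sum(c.spent for c in self.children)

    @property
    def spent(self) -> float:
        return self.spent_direct + self.spent_by_children

    @property
    def full_name(self) -> str:
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name}.{self.name}"


workflow = BudgetNode("workflow")
research = workflow.child("research", spent_direct=3.20)
analysis = workflow.child("analysis", spent_direct=9.30)
validation = research.child("validation")

print(f"{workflow.full_name}: ${workflow.spent:.2f} (direct: ${workflow.spent_direct:.2f})")
print(validation.full_name)
```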

Methods

reset()

Reset spend tracking to zero. Only works when budget is not active.

session = budget(max_usd=10.00, name="session")

with session:
    process()

session.reset()  # Back to $0

with session:
    process_again()

Raises: RuntimeError if called inside an active with block.

summary()

Return formatted spend summary as a string.

with budget(max_usd=5.00) as b:
    run_agent()

print(b.summary())

Returns: Multi-line string with formatted table of calls, costs, and totals.

summary_data()

Return structured spend data as a dict.

with budget() as b:
    run_agent()

data = b.summary_data()
print(data["total_spent"])
print(data["total_calls"])
print(data["by_model"])

Returns: Dictionary with keys:

  • total_spent: Total USD
  • limit: Budget limit
  • model_switched: Boolean
  • switched_at_usd: Switch point
  • fallback_model: Fallback model name
  • fallback_spent: Cost on fallback
  • total_calls: Number of API calls
  • calls: List of all call records
  • by_model: Aggregated stats per model

tree()

Return visual hierarchy of budget tree with spend breakdown.

with budget(max_usd=20, name="workflow") as w:
    with budget(max_usd=5, name="research"):
        research()
    with budget(max_usd=10, name="analysis"):
        analyze()

print(w.tree())
# workflow: $12.50 / $20.00 (direct: $0.00)
#   research: $3.20 / $5.00 (direct: $3.20)
#   analysis: $9.30 / $10.00 (direct: $9.30)

Also renders registered component budgets (nodes, agents, tasks). LangGraph node spend is tracked automatically; agent/task spend requires future framework adapters.

with budget(max_usd=10, name="workflow") as b:
    b.node("fetch", max_usd=0.50)
    b.node("summarize", max_usd=1.00)
    run_langgraph_workflow()

print(b.tree())
# workflow: $0.84 / $10.00 (direct: $0.00)
#   [node] fetch: $0.12 / $0.50 (24.0%)
#   [node] summarize: $0.72 / $1.00 (72.0%)

Returns: Multi-line string with indented tree structure showing:

  • Budget name and hierarchy
  • Total spend / limit
  • Direct spend (excluding children)
  • [ACTIVE] marker for currently active children
  • [node], [agent], [task] component budget lines with spend/limit/percentage
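
The percentage on a component line is the component's spend divided by its cap. The following sketch reproduces the line format seen in the sample output above; component_line is a hypothetical helper, not the function shekel uses to render the tree.

```python
def component_line(kind: str, name: str, spent: float, limit: float) -> str:
    """Format one component budget line in the style of the tree() output."""
    pct = spent / limit * 100
    return f"[{kind}] {name}: ${spent:.2f} / ${limit:.2f} ({pct:.1f}%)"


print(component_line("node", "fetch", 0.12, 0.50))
print(component_line("node", "summarize", 0.72, 1.00))
```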

node(name, max_usd)

Register an explicit USD cap for a named LangGraph node. Returns self for chaining.

The cap is enforced by LangGraphAdapter: NodeBudgetExceededError is raised before the node body executes when the cap is reached. Spend is attributed to ComponentBudget._spent and visible in budget.tree().

with budget(max_usd=10.00) as b:
    b.node("fetch_data", max_usd=0.50).node("summarize", max_usd=1.00)

    graph = StateGraph(State)
    graph.add_node("fetch_data", fetch_fn)
    graph.add_node("summarize", summarize_fn)
    app = graph.compile()
    app.invoke(state)

Parameters:

  • name: node name (must match the name passed to StateGraph.add_node())
  • max_usd: USD cap; must be positive

Raises: ValueError if max_usd <= 0

agent(name, max_usd)

Register an explicit USD cap for a named CrewAI agent. Returns self for chaining.

Enforced by CrewAIExecutionAdapter: AgentBudgetExceededError is raised before Agent.execute_task runs when the cap is exhausted. Use agent.role as the key to eliminate string mismatch risk. Spend is attributed to ComponentBudget._spent and visible in budget.tree().

with budget(max_usd=10.00) as b:
    b.agent(researcher.role, max_usd=2.00).agent(writer.role, max_usd=1.50)
    crew.kickoff(inputs={"topic": "AI"})

Parameters:

  • name: agent name (use agent.role directly)
  • max_usd: USD cap; must be positive

Raises: ValueError if max_usd <= 0

task(name, max_usd)

Register an explicit USD cap for a named CrewAI task. Returns self for chaining.

Enforced by CrewAIExecutionAdapter: TaskBudgetExceededError is raised before Agent.execute_task runs when the cap is exhausted. Use task.name as the key directly. Gate order: the task cap is checked before the agent cap (most specific first). Spend is attributed independently to both the task and the executing agent.

with budget(max_usd=10.00) as b:
    b.task(research_task.name, max_usd=1.50).task(write_task.name, max_usd=0.80)
    crew.kickoff(inputs={"topic": "AI"})

Parameters:

  • name: task name (use task.name directly)
  • max_usd: USD cap; must be positive

Raises: ValueError if max_usd <= 0

chain(name, max_usd)

Register an explicit USD cap for a named LangChain chain. Returns self for chaining.

Enforced by LangChainRunnerAdapter: ChainBudgetExceededError is raised before the chain body executes when the cap is reached. Spend is attributed to ComponentBudget._spent and visible in budget.tree().

with budget(max_usd=10.00) as b:
    b.chain("retriever", max_usd=0.20).chain("summarizer", max_usd=1.00)
    chain.invoke({"query": "..."})

Parameters:

  • name: chain name (must match the run_name, or the name of the object passed to add_node or invoked directly)
  • max_usd: USD cap; must be positive

Raises: ValueError if max_usd <= 0


TemporalBudget (rolling-window budgets)

Created via the budget() factory when a spec string or window_seconds is provided.

Temporal factory forms

# Spec string (per-cap windows)
with budget("$5/hr", name="api") as b: ...
with budget("$5/hr + 100 calls/hr", name="api") as b: ...

# Kwargs (single shared window)
with budget(max_usd=5.0, window_seconds=3600, name="api") as b: ...
with budget(max_usd=5.0, max_llm_calls=100, window_seconds=3600, name="api") as b: ...

name= is required for TemporalBudget.

Supported counters in multi-cap specs

| Token | Counter | Example |
| --- | --- | --- |
| $N or N usd | usd | $5/hr |
| N calls | llm_calls | 100 calls/hr |
| N tools | tool_calls | 20 tools/hr |
| N tokens | tokens | 50000 tokens/hr |
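
To make the spec grammar concrete, here is a minimal sketch of how a multi-cap string could be split into (counter, amount, window_seconds) tuples using the token forms in the table above. parse_spec is hypothetical and is not shekel's parser; the unit map is an assumption, since this reference only shows /hr in temporal specs.

```python
import re

# Assumed unit map: only "/hr" appears in the temporal spec examples.
UNIT_SECONDS = {"min": 60.0, "hr": 3600.0}

# Token keyword -> counter name, following the table above.
COUNTERS = {"usd": "usd", "calls": "llm_calls", "tools": "tool_calls", "tokens": "tokens"}

CAP_PATTERN = re.compile(
    r"^(?:\$(?P<usd>\d+(?:\.\d+)?)"
    r"|(?P<n>\d+(?:\.\d+)?)\s+(?P<kind>usd|calls|tools|tokens))"
    r"/(?P<unit>\w+)$"
)


def parse_spec(spec: str) -> list[tuple[str, float, float]]:
    """Split 'cap + cap + ...' into (counter, amount, window_seconds) tuples."""
    caps = []
    for part in spec.split("+"):
        text = part.strip()
        match = CAP_PATTERN.match(text)
        if match is None or match["unit"] not in UNIT_SECONDS:
            raise ValueError(f"unrecognized cap: {text!r}")
        if match["usd"] is not None:
            counter, amount = "usd", float(match["usd"])
        else:
            counter, amount = COUNTERS[match["kind"]], float(match["n"])
        caps.append((counter, amount, UNIT_SECONDS[match["unit"]]))
    return caps


print(parse_spec("$5/hr + 100 calls/hr"))
```

Each cap carries its own window in this sketch, which mirrors the "per-cap windows" note on the spec-string form.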

Using a custom backend

from shekel.backends.redis import RedisBackend

backend = RedisBackend(url="redis://localhost:6379/0")

with budget("$5/hr", name="api", backend=backend) as b:
    run_agent()

RedisBackend

Synchronous Redis-backed rolling-window budget backend for distributed enforcement.

Constructor

RedisBackend(
    url: str | None = None,          # defaults to REDIS_URL env var
    tls: bool = False,
    on_unavailable: str = "closed",  # "closed" | "open"
    circuit_breaker_threshold: int = 3,
    circuit_breaker_cooldown: float = 10.0,
)

| Parameter | Default | Description |
| --- | --- | --- |
| url | REDIS_URL env | Redis connection URL |
| tls | False | Force TLS (ssl=True) |
| on_unavailable | "closed" | "closed" raises BudgetExceededError; "open" allows through |
| circuit_breaker_threshold | 3 | Consecutive errors before circuit opens |
| circuit_breaker_cooldown | 10.0 | Seconds before retrying after circuit opens |
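
The two circuit-breaker parameters describe a common pattern: after a number of consecutive failures the circuit opens, and a retry is allowed once a cooldown has elapsed. The sketch below shows that general pattern with a hypothetical CircuitBreaker class and an injectable clock; it is not RedisBackend's implementation, and it does not model on_unavailable, which governs what happens while the circuit is open.

```python
from __future__ import annotations

import time
from typing import Callable


class CircuitBreaker:
    """Generic consecutive-failure circuit breaker (illustrative only)."""

    def __init__(
        self,
        threshold: int = 3,
        cooldown: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def allow_request(self) -> bool:
        """True when closed, or when the cooldown since opening has elapsed."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown

    def record_success(self) -> None:
        """A successful call closes the circuit and clears the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a consecutive failure; open the circuit at the threshold."""
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()


# Drive the breaker with a fake clock so the behavior is deterministic.
now = [0.0]
breaker = CircuitBreaker(threshold=3, cooldown=10.0, clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
print(breaker.allow_request())  # circuit is open, cooldown not yet elapsed
now[0] = 10.0
print(breaker.allow_request())  # cooldown elapsed, a retry is allowed
```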

Example

from shekel import budget
from shekel.backends.redis import RedisBackend

backend = RedisBackend()  # reads REDIS_URL from env

with budget("$5/hr + 100 calls/hr", name="api-tier", backend=backend) as b:
    run_agent()

Methods

  • check_and_add(budget_name, amounts, limits, windows) — atomically check + increment counters
  • get_state(budget_name) — return {counter: spent} for all counters
  • reset(budget_name) — delete the Redis hash for budget_name
  • close() — close the Redis connection

Raises: BudgetConfigMismatchError if budget_name is already registered with different limits or windows.

Per-Tenant Methods

| Method | Returns | Description |
| --- | --- | --- |
| get_tenant_spend(name, tenant_id) | float | Current window spend for the tenant. Returns 0.0 if unknown. |
| get_tenant_limit(name, tenant_id) | float \| None | Active spend limit for the tenant. Returns None if no limit recorded. |
| set_tenant_limit(name, tenant_id, max_usd) | None | Override the tenant's spend limit without resetting accumulated spend. |
| reset_tenant(name, tenant_id) | None | Zero out accumulated spend while preserving the limit. |
| list_tenants(name) | list[str] | All tenant IDs that have recorded spend for the budget name. |

from shekel.backends.redis import RedisBackend

backend = RedisBackend()

# Inspect a tenant
spent = backend.get_tenant_spend(name="api", tenant_id="user-42")
limit = backend.get_tenant_limit(name="api", tenant_id="user-42")

# Adjust quota
backend.set_tenant_limit(name="api", tenant_id="user-42", max_usd=0.50)

# Reset at billing period rollover
backend.reset_tenant(name="api", tenant_id="user-42")

# Enumerate all tenants
for tid in backend.list_tenants(name="api"):
    print(tid, backend.get_tenant_spend(name="api", tenant_id=tid))

See Per-Tenant Budgets for the full guide.


AsyncRedisBackend

Async version of RedisBackend. All public methods are coroutines. Suitable for FastAPI, async LangGraph, and other async contexts.

from shekel.backends.redis import AsyncRedisBackend

backend = AsyncRedisBackend()

async with budget("$5/hr", name="api", backend=backend) as b:
    await run_async_agent()

Constructor and parameters are identical to RedisBackend.

All five per-tenant methods are available as coroutines:

spent  = await backend.get_tenant_spend(name="api", tenant_id="user-42")
limit  = await backend.get_tenant_limit(name="api", tenant_id="user-42")
await backend.set_tenant_limit(name="api", tenant_id="user-42", max_usd=0.50)
await backend.reset_tenant(name="api", tenant_id="user-42")
tenants = await backend.list_tenants(name="api")

@with_budget

Decorator that wraps functions with a budget context.

Signature

def with_budget(
    max_usd: float | None = None,
    warn_at: float | None = None,
    on_warn: Callable[[float, float], None] | None = None,
    price_per_1k_tokens: dict[str, float] | None = None,
    fallback: dict[str, Any] | None = None,
    on_fallback: Callable[[float, float, str], None] | None = None,
    max_llm_calls: int | None = None,
)

Parameters

Same as budget(). The decorator creates a fresh budget for each call of the wrapped function.

Examples

Basic Decorator

from shekel import with_budget

@with_budget(max_usd=0.50)
def generate_summary(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content

Async Decorator

@with_budget(max_usd=0.50)
async def async_generate(prompt: str) -> str:
    response = await client.chat.completions.create(...)
    return response.choices[0].message.content

With All Parameters

@with_budget(
    max_usd=2.00,
    warn_at=0.8,
    fallback={"at_pct": 0.8, "model": "gpt-4o-mini"},
    on_warn=my_warning_handler
)
def process_request(data: dict) -> str:
    ...

BudgetExceededError

Exception raised when budget limit is exceeded.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| spent | float | Total USD spent when limit was hit. |
| limit | float | The configured max_usd. |
| model | str | Model that triggered the error. |
| tokens | dict[str, int] | Token counts: {"input": N, "output": N}. |

Example

from shekel import budget, BudgetExceededError

try:
    with budget(max_usd=0.50):
        expensive_operation()
except BudgetExceededError as e:
    print(f"Spent: ${e.spent:.4f}")
    print(f"Limit: ${e.limit:.2f}")
    print(f"Model: {e.model}")
    print(f"Tokens: {e.tokens['input']} in, {e.tokens['output']} out")

NodeBudgetExceededError

Raised when a LangGraph node exceeds its registered USD cap. Subclass of BudgetExceededError.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| node_name | str | Name of the node that exceeded its budget. |
| spent | float | Total USD spent when the cap was hit. |
| limit | float | The configured max_usd for this node. |

from shekel import budget, NodeBudgetExceededError, BudgetExceededError

try:
    with budget(max_usd=10.00) as b:
        b.node("fetch", max_usd=0.10)
        run_fetch_node()
except NodeBudgetExceededError as e:
    print(f"Node '{e.node_name}' exceeded ${e.limit:.2f}")
except BudgetExceededError:
    # catches all budget errors including NodeBudgetExceededError
    ...

AgentBudgetExceededError

Raised when an agent exceeds its registered USD cap. Subclass of BudgetExceededError.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| agent_name | str | Name of the agent that exceeded its budget. |
| spent | float | Total USD spent when the cap was hit. |
| limit | float | The configured max_usd for this agent. |

TaskBudgetExceededError

Raised when a task exceeds its registered USD cap. Subclass of BudgetExceededError.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| task_name | str | Name of the task that exceeded its budget. |
| spent | float | Total USD spent when the cap was hit. |
| limit | float | The configured max_usd for this task. |

SessionBudgetExceededError

Raised when an always-on agent session exceeds its rolling-window budget. Subclass of BudgetExceededError.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| agent_name | str | Name of the agent session that exceeded its budget. |
| spent | float | Total USD spent when the cap was hit. |
| limit | float | The configured session budget. |
| window | float \| None | Rolling window duration in seconds, or None. |

ChainBudgetExceededError

Raised when a LangChain chain exceeds its registered USD cap. Subclass of BudgetExceededError.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| chain_name | str | Name of the chain that exceeded its budget. |
| spent | float | Total USD spent when the cap was hit. |
| limit | float | The configured max_usd for this chain. |

from shekel import budget, ChainBudgetExceededError, BudgetExceededError

try:
    with budget(max_usd=10.00) as b:
        b.chain("retriever", max_usd=0.20)
        chain.invoke({"query": "..."})
except ChainBudgetExceededError as e:
    print(f"Chain '{e.chain_name}' exceeded ${e.limit:.2f}")

AgentLoopError

Raised when the loop guard detects that the same tool has been called more than loop_guard_max_calls times within loop_guard_window_seconds. Subclass of BudgetExceededError.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| tool_name | str | Name of the tool that triggered the loop detection. |
| call_count | int | Number of calls to this tool within the window at the time of blocking. |
| window_seconds | float | The configured rolling window duration. 0 means all-time cap. |
| usd_spent | float | Total USD spent when the loop was detected. |
| framework | str | Framework that dispatched the tool: "langchain", "mcp", "crewai", "openai-agents", or "manual". |

from shekel import budget, AgentLoopError, BudgetExceededError

try:
    with budget(loop_guard=True, loop_guard_max_calls=5):
        run_agent()
except AgentLoopError as e:
    print(f"Tool '{e.tool_name}' called {e.call_count}x in {e.window_seconds}s")
except BudgetExceededError:
    ...  # catches all budget errors including AgentLoopError
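
The rolling-window rule (more than loop_guard_max_calls calls to the same tool within loop_guard_window_seconds, with 0 meaning an all-time cap) can be sketched with one deque of timestamps per tool. RollingCallGuard and LoopDetected are hypothetical names used for illustration; this is not shekel's loop guard, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque


class LoopDetected(Exception):
    """Hypothetical stand-in for AgentLoopError."""


class RollingCallGuard:
    """Count calls per tool inside a rolling time window."""

    def __init__(self, max_calls: int = 5, window_seconds: float = 60.0) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: dict[str, deque] = defaultdict(deque)

    def record(self, tool_name: str, now: float) -> int:
        """Record a call at time `now`; raise once the cap is exceeded."""
        calls = self._calls[tool_name]
        if self.window_seconds > 0:
            # Drop timestamps that have aged out of the rolling window.
            while calls and now - calls[0] >= self.window_seconds:
                calls.popleft()
        # With window_seconds == 0 nothing is dropped: an all-time cap.
        calls.append(now)
        if len(calls) > self.max_calls:
            raise LoopDetected(f"{tool_name} called {len(calls)}x in window")
        return len(calls)


guard = RollingCallGuard(max_calls=5, window_seconds=60.0)
try:
    for second in range(10):
        guard.record("search", now=float(second))
except LoopDetected as exc:
    print(exc)
```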

SpendVelocityExceededError

Raised when the measured spend velocity (USD per minute) exceeds the max_velocity threshold. Subclass of BudgetExceededError.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| velocity_per_min | float | Measured spend velocity in USD/min at the time of blocking. Always normalized to per-minute regardless of the spec unit. |
| limit_per_min | float | The configured velocity limit in USD/min. |
| window_seconds | float | Rolling window over which velocity was measured (seconds). |
| usd_spent | float | Total USD spent when blocked. |
| elapsed_seconds | float | Seconds elapsed since the budget context opened. |

from shekel import budget, SpendVelocityExceededError, BudgetExceededError

try:
    with budget(max_velocity="$0.50/min"):
        run_agent()
except SpendVelocityExceededError as e:
    print(f"Velocity: ${e.velocity_per_min:.4f}/min (limit: ${e.limit_per_min:.4f}/min)")
    print(f"Spent:    ${e.usd_spent:.4f} over {e.elapsed_seconds:.1f}s")
except BudgetExceededError:
    ...  # catches all budget errors including SpendVelocityExceededError
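
Because velocity_per_min and limit_per_min are always expressed per minute, a spec such as "$5/hr" has to be normalized before comparison. The sketch below shows one way that normalization could work; velocity_per_min_from_spec is a hypothetical helper, and limiting the units to min and hr (the two shown in this reference) is an assumption.

```python
import re

# Units shown in this reference; treating them as the full set is an assumption.
MINUTES_PER_UNIT = {"min": 1.0, "hr": 60.0}

VELOCITY_PATTERN = re.compile(r"^\$(\d+(?:\.\d+)?)/(\w+)$")


def velocity_per_min_from_spec(spec: str) -> float:
    """Convert a '$<amount>/<unit>' spec into USD per minute."""
    match = VELOCITY_PATTERN.match(spec.strip())
    if match is None or match.group(2) not in MINUTES_PER_UNIT:
        raise ValueError(f"unrecognized velocity spec: {spec!r}")
    amount = float(match.group(1))
    return amount / MINUTES_PER_UNIT[match.group(2)]


for spec in ("$0.50/min", "$5/hr"):
    print(f"{spec} -> ${velocity_per_min_from_spec(spec):.4f}/min")
```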

BudgetConfigMismatchError

Raised by RedisBackend / AsyncRedisBackend when a budget name is already registered with different limits or windows. Subclass of BudgetExceededError.

from shekel.exceptions import BudgetConfigMismatchError

try:
    with budget("$5/hr", name="api", backend=backend):
        run_agent()
except BudgetConfigMismatchError:
    # Budget "api" was previously registered with different caps.
    # Call backend.reset("api") to clear existing state.
    backend.reset("api")

To resolve: call backend.reset(budget_name) to delete the existing Redis state, then retry.


Type Signatures

For type checking with mypy, pyright, etc:

from shekel import budget, with_budget, BudgetExceededError
from typing import Callable

# Budget context manager (budget() returns a Budget instance, so the type is inferred)
b = budget(max_usd=1.00)

# Decorator
@with_budget(max_usd=0.50)
def my_func() -> str:
    ...

# Callbacks
def warn_callback(spent: float, limit: float) -> None:
    ...

def fallback_callback(spent: float, limit: float, fallback: str) -> None:
    ...

Next Steps