Spend Velocity¶

Cap how fast you burn money, not just how much. One parameter.

A max_usd=50.00 cap won't save you if your agent blows $40 in the first two minutes. Spend velocity adds a burn-rate circuit breaker alongside the total cap.

The problem¶

Total spend caps and velocity caps protect against different failure modes:

Failure mode	Protected by
Agent runs for 8 hours, accumulates $48	`max_usd` cap
Agent bursts $40 in 2 minutes, then idles	Velocity cap
Agent loops on a cheap tool, slow drain	Loop guard

A misconfigured agent with a fast LLM can spend at $10/min or faster. By the time a max_usd=50 cap fires, $50 is already gone. A velocity cap of "$1/min" stops it after the first dollar.

Quick Start¶

from shekel import budget, SpendVelocityExceededError

try:
    with budget(max_velocity="$0.50/min") as b:
        run_my_agent()
except SpendVelocityExceededError as e:
    print(f"Burn rate too high: ${e.velocity_per_min:.4f}/min (limit: ${e.limit_per_min:.4f}/min)")

No changes to your agent code. Works with all auto-patched providers: OpenAI, Anthropic, Gemini, LiteLLM.

Spec format¶

The velocity spec uses the same string DSL as temporal budgets:

$<amount>/<count><unit>

Example	Meaning
`"$0.50/min"`	$0.50 per minute
`"$5/hr"`	$5 per hour
`"$0.01/sec"`	$0.01 per second
`"$100/day"`	$100 per day

All specs are internally normalized to USD per minute for comparison and error reporting. $5/hr becomes $0.0833/min; $0.01/sec becomes $0.60/min.

Supported time units: sec, s, min, m, hr, h, hour, day, d.

Velocity-only guard (no `max_usd`)¶

You can use velocity without a total cap — useful when you want to throttle spend rate but don't have a hard total budget:

# Allow at most $1 per minute — no total ceiling
with budget(max_velocity="$1/min") as b:
    run_streaming_agent()

print(f"Spent: ${b.spent:.4f}")

This is useful for API tiers, rate-limiting proxies, or services where total spend is metered externally but you want a local burst guard.

Compound guardrails¶

Combine max_usd and max_velocity to protect against both slow drain and fast burn:

try:
    with budget(
        max_usd=50.00,            # never spend more than $50 total
        max_velocity="$1/min",   # never burn faster than $1/min
    ) as b:
        run_agent()
except SpendVelocityExceededError as e:
    print(f"Velocity exceeded: ${e.velocity_per_min:.4f}/min")
except BudgetExceededError as e:
    print(f"Total cap hit: ${e.spent:.4f}")

Both guards are checked on every LLM call. Whichever fires first wins.

Velocity warning¶

Add warn_velocity to get a soft warning before the hard stop. It must be less than max_velocity:

def on_velocity_warn(current_rate, limit_rate):
    print(f"Velocity warning: ${current_rate:.4f}/min (limit: ${limit_rate:.4f}/min)")

with budget(
    max_velocity="$1/min",
    warn_velocity="$0.75/min",    # warn at 75% of the velocity limit
    on_warn=on_velocity_warn,
) as b:
    run_agent()

The warn_velocity callback fires once per budget context when the threshold is crossed. The hard stop at max_velocity fires independently when the limit is actually exceeded.

`warn_only` mode¶

Use warn_only=True in staging to observe velocity patterns without blocking:

with budget(max_velocity="$0.50/min", warn_only=True) as b:
    run_agent()
# Never raises — logs a warning when velocity exceeds $0.50/min

print(f"Peak velocity observed: see logs")

Use this to calibrate max_velocity before enabling enforcement in production.

Reading the exception¶

SpendVelocityExceededError carries the measured velocity and the configured limit:

from shekel import budget, SpendVelocityExceededError

try:
    with budget(max_velocity="$0.50/min") as b:
        run_agent()
except SpendVelocityExceededError as e:
    print(f"Velocity:       ${e.velocity_per_min:.4f}/min")
    print(f"Limit:          ${e.limit_per_min:.4f}/min")
    print(f"Window (s):     {e.window_seconds:.1f}")
    print(f"USD spent:      ${e.usd_spent:.4f}")
    print(f"Elapsed (s):    {e.elapsed_seconds:.1f}")

All velocity values are normalized to per minute regardless of the spec unit you used. e.velocity_per_min is always in USD/min.

Attribute	Type	Description
`velocity_per_min`	`float`	Measured spend velocity in USD/min at the time of blocking
`limit_per_min`	`float`	The configured velocity limit in USD/min
`window_seconds`	`float`	Rolling window over which velocity was measured
`usd_spent`	`float`	Total USD spent when blocked
`elapsed_seconds`	`float`	Seconds elapsed since the budget context opened

SpendVelocityExceededError subclasses BudgetExceededError, so existing except BudgetExceededError blocks catch it automatically.

Nested budgets¶

Velocity is measured independently per budget context. A child context has its own velocity clock:

with budget(max_usd=20.00, max_velocity="$2/min", name="outer") as outer:
    with budget(max_usd=5.00, max_velocity="$0.50/min", name="inner") as inner:
        run_fast_stage()  # inner velocity cap fires if inner burns > $0.50/min
    # inner spend rolls up to outer automatically

The inner's velocity cap is tighter — it fires if the inner stage alone burns faster than $0.50/min, regardless of the outer budget's velocity. Both are enforced independently.

What it doesn't cover¶

TemporalBudget interaction — Spend velocity measures the rate within a single budget() context (from __enter__ to __exit__). It does not interact with TemporalBudget rolling-window counters. For long-running services where you want per-hour or per-day rate limits, use budget("$5/hr", name="api") instead.

Distributed velocity — max_velocity is local to a single process. If you have multiple processes spending concurrently, their individual velocities are independent. For distributed velocity enforcement, use RedisBackend with a TemporalBudget.

Next Steps¶

Loop Guard — detect repeated tool calls
Temporal Budgets — rolling-window rate limits for multi-process deployments
API Reference