Basic Usage

This guide covers the fundamentals of using shekel to enforce LLM API budgets and track spend.

Track-Only Mode

Use budget() without max_usd to measure spend without enforcing a hard cap:

from shekel import budget
import openai

client = openai.OpenAI()

with budget() as b:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

print(f"Cost: ${b.spent:.4f}")
print(f"Limit: {b.limit}")  # None in track-only mode

Track-Only Mode

Without max_usd, shekel records spend but never raises BudgetExceededError. Use this to measure baseline costs before setting a hard cap, or in production when you want visibility without the risk of interrupting service.
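The only difference between the two modes is whether spend is checked against a limit. The sketch below is a hypothetical, pure-Python model of that behavior. The `SketchBudget` and `SketchBudgetExceeded` names are invented for illustration and are not shekel's internals. It assumes spend is compared against the cap each time a call's cost is recorded.

```python
class SketchBudgetExceeded(Exception):
    """Hypothetical stand-in for shekel's BudgetExceededError."""


class SketchBudget:
    """Hypothetical budget: always records spend, enforces a cap only if one is set."""

    def __init__(self, max_usd=None):
        self.limit = max_usd  # None means track-only mode
        self.spent = 0.0

    def record(self, cost_usd):
        self.spent += cost_usd
        # Track-only mode (limit is None) never raises.
        if self.limit is not None and self.spent > self.limit:
            raise SketchBudgetExceeded(f"spent ${self.spent:.4f} > ${self.limit:.2f}")


tracker = SketchBudget()              # track-only: no cap
tracker.record(5.00)                  # recorded, never raises
print(tracker.spent, tracker.limit)   # 5.0 None

capped = SketchBudget(max_usd=1.00)
try:
    capped.record(0.60)
    capped.record(0.60)               # pushes spend over the cap
except SketchBudgetExceeded as exc:
    print("capped:", exc)
```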

Accessing Budget Information

The budget context manager provides several properties:

from shekel import budget

with budget(max_usd=1.00) as b:
    run_agent()  # placeholder for your own code that makes LLM calls

# After execution
print(f"Spent: ${b.spent:.4f}")           # Total USD spent
print(f"Remaining: ${b.remaining:.4f}")   # USD remaining (None in track-only mode)
print(f"Limit: ${b.limit:.2f}")           # Configured max_usd (None in track-only mode)
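Because `remaining` and `limit` are `None` in track-only mode, formatting them with a spec like `:.4f` raises a `TypeError` there. A small helper keeps the same print statements working in both modes. `fmt_usd` is a hypothetical name and is not part of shekel:

```python
def fmt_usd(value, places=4):
    """Format a USD amount, or return 'n/a' when the value is None (track-only mode)."""
    if value is None:
        return "n/a"
    return f"${value:.{places}f}"


# Works whether or not max_usd was set:
print(fmt_usd(0.0123))   # $0.0123
print(fmt_usd(None))     # n/a
print(fmt_usd(1.0, 2))   # $1.00
```

In real code you would pass `b.remaining` or `b.limit` instead of the literal values.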

Multiple API Calls

Shekel automatically tracks all API calls within the context:

with budget(max_usd=0.50) as b:
    # Multiple calls accumulate
    response1 = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "First question"}],
    )

    response2 = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Second question"}],
    )

    response3 = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Third question"}],
    )

print(f"Total for 3 calls: ${b.spent:.4f}")
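Conceptually, `b.spent` behaves like a running total: each call's cost is added to what came before. The sketch below uses made-up per-call costs (not real prices) to show how the total grows across three calls. `itertools.accumulate` yields the value the running total would hold after each call.

```python
from itertools import accumulate

# Hypothetical per-call costs in USD, for illustration only.
call_costs = [0.00012, 0.00009, 0.00015]

# Running total after each call, as a budget would accumulate it.
running = list(accumulate(call_costs))
for i, total in enumerate(running, start=1):
    print(f"after call {i}: ${total:.5f}")

print(f"Total for {len(call_costs)} calls: ${running[-1]:.5f}")
```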

Mixing OpenAI and Anthropic

Shekel tracks both OpenAI and Anthropic calls in the same budget:

import openai
import anthropic
from shekel import budget

openai_client = openai.OpenAI()
anthropic_client = anthropic.Anthropic()

with budget(max_usd=1.00) as b:
    # OpenAI call
    openai_response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello from OpenAI"}],
    )

    # Anthropic call
    anthropic_response = anthropic_client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=100,
        messages=[{"role": "user", "content": "Hello from Anthropic"}],
    )

print(f"Combined cost: ${b.spent:.4f}")
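The two SDKs report token usage under different field names. OpenAI chat completions expose `usage.prompt_tokens` and `usage.completion_tokens`, while Anthropic messages expose `usage.input_tokens` and `usage.output_tokens`. A combined budget therefore has to normalize both into one shape. The sketch below shows one way to do that. `normalize_usage` is a hypothetical helper, not shekel's actual code, and `SimpleNamespace` objects stand in for real responses so it runs offline.

```python
from types import SimpleNamespace


def normalize_usage(response):
    """Return {"input": N, "output": N} for an OpenAI or Anthropic response."""
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):  # OpenAI chat completions
        return {"input": usage.prompt_tokens, "output": usage.completion_tokens}
    # Anthropic messages
    return {"input": usage.input_tokens, "output": usage.output_tokens}


# Stand-ins shaped like the real SDK responses.
openai_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=9))
anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=14, output_tokens=20))

print(normalize_usage(openai_like))     # {'input': 12, 'output': 9}
print(normalize_usage(anthropic_like))  # {'input': 14, 'output': 20}
```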

Error Handling

When a budget is exceeded, shekel raises BudgetExceededError:

from shekel import budget, BudgetExceededError

try:
    with budget(max_usd=0.01) as b:  # Very low limit
        response = client.chat.completions.create(
            model="gpt-4o",  # Expensive model
            messages=[{"role": "user", "content": "Tell me a story"}],
        )
except BudgetExceededError as e:
    print("Budget exceeded!")
    print(f"Spent: ${e.spent:.4f}")
    print(f"Limit: ${e.limit:.2f}")
    print(f"Model: {e.model}")
    print(f"Tokens: {e.tokens}")

The exception provides rich information:

| Attribute | Description |
|---|---|
| `e.spent` | Total USD spent when the limit was hit |
| `e.limit` | The configured `max_usd` |
| `e.model` | Model that triggered the error |
| `e.tokens` | Token counts, `{"input": N, "output": N}` |
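These attributes are enough to log a compact record when a run is cut short. The hypothetical `describe_overrun` helper below reads only the four attributes listed in the table. A `SimpleNamespace` with invented values stands in for a caught `BudgetExceededError`, so the sketch runs without an API call.

```python
from types import SimpleNamespace


def describe_overrun(err):
    """Build a one-line log message from a BudgetExceededError-like object."""
    tokens = err.tokens or {}
    return (
        f"{err.model}: spent ${err.spent:.4f} of ${err.limit:.2f} "
        f"(input={tokens.get('input', 0)}, output={tokens.get('output', 0)})"
    )


# Invented values standing in for a caught exception.
fake_err = SimpleNamespace(spent=0.0123, limit=0.01, model="gpt-4o",
                           tokens={"input": 12, "output": 1200})
print(describe_overrun(fake_err))
# gpt-4o: spent $0.0123 of $0.01 (input=12, output=1200)
```

In practice you would call `describe_overrun(e)` inside the `except BudgetExceededError as e:` block.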

Nested Contexts

Budget contexts are isolated, so nested contexts don't interfere with each other:

# Outer budget
with budget(max_usd=5.00) as outer:
    response1 = client.chat.completions.create(...)

    # Inner budget (separate tracking)
    with budget(max_usd=1.00) as inner:
        response2 = client.chat.completions.create(...)
        print(f"Inner: ${inner.spent:.4f}")

    response3 = client.chat.completions.create(...)
    print(f"Outer: ${outer.spent:.4f}")

Context Isolation

Each with budget() block creates an independent tracking context using Python's ContextVar. Concurrent agents and nested contexts never interfere with each other.

Custom Model Pricing

For models not in shekel's built-in table, provide custom pricing:

# Custom model or unlisted provider
with budget(
    max_usd=1.00,
    price_per_1k_tokens={"input": 0.002, "output": 0.006}
) as b:
    response = client.chat.completions.create(
        model="my-custom-model",
        messages=[{"role": "user", "content": "Hello"}],
    )

print(f"Cost with custom pricing: ${b.spent:.4f}")
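With per-1k-token pricing, the cost of a call is presumably `(input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price`. The sketch below computes that under this assumption, using the prices from the example above. `cost_from_price_per_1k` is a hypothetical helper, not a shekel function.

```python
def cost_from_price_per_1k(input_tokens, output_tokens, price_per_1k_tokens):
    """USD cost of one call, given prices per 1,000 input and output tokens."""
    return (
        input_tokens / 1000 * price_per_1k_tokens["input"]
        + output_tokens / 1000 * price_per_1k_tokens["output"]
    )


pricing = {"input": 0.002, "output": 0.006}
cost = cost_from_price_per_1k(1500, 500, pricing)
print(f"${cost:.4f}")  # $0.0060
```

Here 1,500 input tokens cost 1.5 × $0.002 = $0.003, and 500 output tokens cost 0.5 × $0.006 = $0.003, for $0.006 in total.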

Custom Pricing

Custom pricing overrides shekel's built-in table. Use this for:

Private/proprietary models
Fine-tuned models with different pricing
Models not yet in shekel's database
Testing with mock pricing

Versioned Model Names

Shekel automatically resolves versioned model names to the correct pricing:

with budget() as b:
    # All of these resolve to "gpt-4o" pricing
    client.chat.completions.create(
        model="gpt-4o",              # Base name
        messages=[{"role": "user", "content": "Hello"}],
    )

    client.chat.completions.create(
        model="gpt-4o-2024-08-06",   # Versioned name
        messages=[{"role": "user", "content": "Hello"}],
    )

    client.chat.completions.create(
        model="gpt-4o-2024-05-13",   # Different version
        messages=[{"role": "user", "content": "Hello"}],
    )

print(f"All tracked under gpt-4o pricing: ${b.spent:.4f}")
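One way to implement this kind of resolution is to strip a trailing date suffix and then fall back to the longest known model-name prefix. Longest-prefix matching ensures that `gpt-4o-mini-2024-07-18` resolves to `gpt-4o-mini` rather than `gpt-4o`. The sketch below is a hypothetical illustration of that approach. `resolve_model` and the `KNOWN_MODELS` set are assumptions and do not reflect shekel's actual lookup or pricing table.

```python
import re

# Hypothetical set of base names that have pricing entries.
KNOWN_MODELS = {"gpt-4o", "gpt-4o-mini", "claude-3-haiku"}

# Matches "-2024-08-06" style and "-20240307" style date suffixes.
_DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$|-\d{8}$")


def resolve_model(name):
    """Map a versioned model name to a known base name, or None if unknown."""
    base = _DATE_SUFFIX.sub("", name)  # "gpt-4o-2024-08-06" -> "gpt-4o"
    if base in KNOWN_MODELS:
        return base
    # Fall back to the longest known prefix, so "gpt-4o-mini..." beats "gpt-4o".
    matches = [m for m in KNOWN_MODELS if base.startswith(m)]
    return max(matches, key=len) if matches else None


for name in ["gpt-4o", "gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18",
             "claude-3-haiku-20240307", "my-custom-model"]:
    print(name, "->", resolve_model(name))
```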

Batch Processing

Track costs for batch operations with early termination:

from shekel import budget, BudgetExceededError

items = ["apple", "banana", "cherry", "date", "elderberry"]
results = []

try:
    with budget(max_usd=0.10) as b:
        for item in items:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": f"Fact about {item}"}],
                max_tokens=30,
            )
            results.append(response.choices[0].message.content)
except BudgetExceededError:
    print(f"Budget hit after {len(results)}/{len(items)} items")

print(f"Processed {len(results)} items for ${b.spent:.4f}")
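To see why `results` can end up shorter than `items`, the following offline simulation runs the same loop. It assumes, as the `e.spent` attribute suggests, that the limit is checked after each call's cost is recorded. Under that assumption, the call that crosses the limit raises and its result is never appended. `run_batch`, `fake_call`, the fixed per-call cost, and `BudgetHit` are invented for illustration.

```python
class BudgetHit(Exception):
    """Hypothetical stand-in for BudgetExceededError."""


def run_batch(items, cost_per_call, max_usd):
    spent = 0.0
    results = []

    def fake_call(item):
        nonlocal spent
        spent += cost_per_call      # cost is recorded first...
        if spent > max_usd:         # ...then checked against the cap
            raise BudgetHit(item)
        return f"Fact about {item}"

    try:
        for item in items:
            results.append(fake_call(item))
    except BudgetHit:
        print(f"Budget hit after {len(results)}/{len(items)} items")
    return results, spent


items = ["apple", "banana", "cherry", "date", "elderberry"]
results, spent = run_batch(items, cost_per_call=0.03, max_usd=0.10)
print(f"Processed {len(results)} items for ${spent:.4f}")
```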

Spend Summary

Get a detailed breakdown of all calls:

with budget(max_usd=2.00) as b:
    # Make various API calls
    for i in range(10):
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )

# Print formatted summary
print(b.summary())

Example output (illustrative; exact figures depend on token counts and current pricing):

┌─ Shekel Budget Summary ────────────────────────────────────┐
│ Total: $0.0045  Limit: $2.00  Calls: 10  Status: OK
├────────────────────────────────────────────────────────────┤
│  #    Model                        Input  Output      Cost
│  ────────────────────────────────────────────────────────
│  1    gpt-4o-mini                    120      30  $0.0000
│  2    gpt-4o-mini                    115      28  $0.0000
│  3    gpt-4o-mini                    118      31  $0.0000
│  ...
├────────────────────────────────────────────────────────────┤
│  gpt-4o-mini: 10 calls  $0.0045
└────────────────────────────────────────────────────────────┘

Or get structured data:

data = b.summary_data()
print(f"Total calls: {data['total_calls']}")
print(f"Total spent: ${data['total_spent']:.4f}")
print(f"Models used: {list(data['by_model'].keys())}")
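The `total_calls`, `total_spent`, and `by_model` keys can be thought of as a fold over per-call records. The sketch below builds a dictionary with those three keys from a list of invented call records. `summarize`, the `(model, cost)` record format, and the shape of each `by_model` entry (`calls` and `spent`) are assumptions, so check what `summary_data()` actually returns before relying on it.

```python
from collections import defaultdict


def summarize(calls):
    """Aggregate (model, cost_usd) records into summary-style totals."""
    by_model = defaultdict(lambda: {"calls": 0, "spent": 0.0})
    for model, cost in calls:
        by_model[model]["calls"] += 1
        by_model[model]["spent"] += cost
    return {
        "total_calls": len(calls),
        "total_spent": sum(cost for _, cost in calls),
        "by_model": dict(by_model),
    }


# Invented records for illustration.
calls = [("gpt-4o-mini", 0.0004), ("gpt-4o-mini", 0.0005), ("gpt-4o", 0.0030)]
data = summarize(calls)
print(f"Total calls: {data['total_calls']}")
print(f"Total spent: ${data['total_spent']:.4f}")
print(f"Models used: {list(data['by_model'].keys())}")
```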

Next Steps

Budget Enforcement - Learn about hard caps and warnings
Fallback Models - Automatic model switching
Accumulating Budgets - Multi-session tracking
Streaming - Budget tracking for streams
API Reference - Complete API documentation