# async-batch-llm
Process thousands of LLM requests in parallel with automatic retries, rate limiting, and flexible error handling.
Works with any LLM provider or framework (OpenAI, Anthropic, Google, LangChain, or a custom backend) through a simple strategy pattern. Built on asyncio for efficient I/O-bound processing.
## Why async-batch-llm?
- ✅ Universal - Works with any LLM provider through a simple strategy interface
- ✅ Reliable - Built-in retry logic, timeout handling, and coordinated rate limiting
- ✅ Fast - Parallel async processing with configurable concurrency
- ✅ Observable - Token tracking, metrics collection, and event hooks
- ✅ Cost-Effective - Shared caching strategies can dramatically reduce repeated prompt costs
- ✅ Type-Safe - Full generic type support with Pydantic validation
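The strategy pattern above means batch code never branches on the provider: every backend sits behind one async interface. Here is a minimal, self-contained sketch of the idea — the `CompletionStrategy` protocol and `EchoStrategy` names are illustrative, not the library's actual API:

```python
import asyncio
from typing import Protocol


class CompletionStrategy(Protocol):
    """Hypothetical shape of a provider strategy: one async call per prompt."""

    async def complete(self, prompt: str) -> str: ...


class EchoStrategy:
    """Toy strategy standing in for a real provider client (OpenAI, Anthropic, ...)."""

    async def complete(self, prompt: str) -> str:
        await asyncio.sleep(0)  # simulate network I/O
        return f"echo: {prompt}"


async def run_batch(strategy: CompletionStrategy, prompts: list[str]) -> list[str]:
    # Because every provider implements the same interface, the batch
    # code needs no provider-specific branches.
    return await asyncio.gather(*(strategy.complete(p) for p in prompts))


results = asyncio.run(run_batch(EchoStrategy(), ["a", "b"]))
print(results)  # → ['echo: a', 'echo: b']
```

Swapping in a real provider only requires another class with the same `complete` signature.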
## Installation

```bash
# Basic installation
pip install async-batch-llm

# With PydanticAI support (recommended for structured output)
pip install 'async-batch-llm[pydantic-ai]'

# With Google Gemini support
pip install 'async-batch-llm[gemini]'

# With everything
pip install 'async-batch-llm[all]'
```
## Quick Example

```python
import asyncio

from async_batch_llm import (
    ParallelBatchProcessor,
    LLMWorkItem,
    ProcessorConfig,
    PydanticAIStrategy,
)
from pydantic import BaseModel
from pydantic_ai import Agent


class Summary(BaseModel):
    title: str
    key_points: list[str]


async def main():
    # Create agent and wrap in strategy
    agent = Agent("gemini-2.5-flash", result_type=Summary)
    strategy = PydanticAIStrategy(agent=agent)

    # Configure processor
    config = ProcessorConfig(max_workers=5, timeout_per_item=30.0)

    # Process items with automatic resource cleanup
    async with ParallelBatchProcessor[str, Summary, None](config=config) as processor:
        # Add work items
        for doc in ["Document 1 text...", "Document 2 text..."]:
            await processor.add_work(
                LLMWorkItem(
                    item_id=f"doc_{hash(doc)}",
                    strategy=strategy,
                    prompt=f"Summarize: {doc}",
                )
            )

        # Process all in parallel
        result = await processor.process_all()

        print(f"Succeeded: {result.succeeded}/{result.total_items}")
        print(f"Tokens used: {result.total_input_tokens + result.total_output_tokens}")


asyncio.run(main())
```
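`max_workers`-style concurrency limits and automatic retries are standard asyncio patterns. As a rough mental model of what the processor does (a self-contained sketch under assumed semantics, not the library's internal code — `with_retries` and `process_all` here are illustrative names):

```python
import asyncio


async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 0.01):
    """Retry an async call with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            await asyncio.sleep(base_delay * 2**attempt)


async def process_all(items, worker, max_workers: int = 5):
    """Run `worker` over all items with at most `max_workers` in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(item):
        async with semaphore:  # bound concurrency
            return await with_retries(lambda: worker(item))

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(i) for i in items))


async def worker(item: int) -> int:
    await asyncio.sleep(0)  # simulate an LLM call
    return item * 2


results = asyncio.run(process_all(range(4), worker))
print(results)  # → [0, 2, 4, 6]
```

The real processor layers rate limiting, timeouts, and token accounting on top of this same semaphore-and-retry core.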
## Next Steps
- Getting Started Guide - Learn the basics
- Examples - See more examples
- API Reference - Full API documentation
- Contributing - Help improve async-batch-llm
## License
MIT License - See LICENSE for details.