Performance Guide
aiogzip is designed to be a high-performance, asynchronous alternative to Python's gzip module. This guide details its performance characteristics and provides tips for optimization.
Benchmark Summary
All benchmarks were conducted on standard hardware using Python 3.12+.
Text Operations (Winner: aiogzip)
aiogzip is significantly optimized for text processing, often outperforming the standard gzip module due to efficient buffering and async handling.
| Operation | aiogzip | gzip (sync) | Speedup |
|---|---|---|---|
| Bulk Text Read/Write | ~35 MB/s | ~14 MB/s | 2.5x Faster |
| JSONL Processing | - | - | 1.8x Faster |
| Line Iteration | 1.2M lines/sec | - | - |
Why? aiogzip uses optimized UTF-8 decoding strategies (using codecs.getincrementaldecoder) and manages buffers efficiently to minimize encoding/decoding overhead.
Binary Operations (Tie)
For bulk binary I/O, aiogzip matches the throughput of standard gzip.
| Operation | aiogzip | gzip (sync) | Speedup |
|---|---|---|---|
| Bulk Binary I/O | ~52 MB/s | ~53 MB/s | Equivalent |
| Small Chunks | 1.7M ops/sec | 1.3M ops/sec | 1.3x Faster |
Concurrency (Winner: aiogzip)
When processing multiple files, especially where I/O latency (disk/network) is involved, aiogzip shines by not blocking the event loop.
- Concurrent Processing: 1.5x Faster (simulated I/O latency).
- Allows the main thread to remain responsive (e.g., for a web server) while processing heavy compression tasks.
Optimization Tips
1. Choose the Right Chunk Size
The default chunk_size is 64KB, and there is intentionally no upper bound.
- Increase it (e.g.,
128*1024or1024*1024) for large file throughput if you have memory to spare. - Decrease it if you are memory constrained and processing massive files.
- If you push chunk sizes into the multi-megabyte range, budget the extra memory per open file to avoid accidental OOMs.
# Example: Using a larger chunk size for speed
async with AsyncGzipBinaryFile("large.gz", "rb", chunk_size=1024*1024) as f:
...
2. Use read(-1) Carefully
Reading the entire file into memory (read(-1)) is the fastest way to process data if it fits in RAM. aiogzip optimizes this by reading chunks and joining them at the end.
However, for multi-gigabyte files, always prefer streaming (line-by-line or fixed-size reads) to avoid OOM (Out of Memory) crashes.
3. Text vs. Binary
- If you need text, use
AsyncGzipTextFile(ormode="rt"/"wt"). It handles decoding more efficiently than you can typically do manually in Python loop. - If you just need to move bytes (e.g., upload to S3), use
AsyncGzipBinaryFile.
4. Buffer Management
aiogzip maintains an internal buffer.
- Binary Mode: Uses an efficient offset-pointer strategy to avoid expensive memory copies (
del buffer[:n]) when reading small chunks. - Text Mode: Buffers decoded text to handle split multi-byte characters and split newlines correctly.