Performance Guide

aiogzip is designed to be a high-performance, asynchronous alternative to Python's gzip module. This guide details its performance characteristics and provides tips for optimization.

Benchmark Summary

All benchmarks were conducted on standard hardware using Python 3.12+.

Text Operations (Winner: aiogzip)

aiogzip is significantly optimized for text processing, often outperforming the standard gzip module due to efficient buffering and async handling.

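| Operation            | aiogzip        | gzip (sync) | Speedup     |
|----------------------|----------------|-------------|-------------|
| Bulk Text Read/Write | ~35 MB/s       | ~14 MB/s    | 2.5x Faster |
| JSONL Processing     | -              | -           | 1.8x Faster |
| Line Iteration       | 1.2M lines/sec | -           | -           |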

Why? aiogzip uses optimized UTF-8 decoding strategies (using codecs.getincrementaldecoder) and manages buffers efficiently to minimize encoding/decoding overhead.
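To illustrate why an incremental decoder matters for chunked reads, here is a minimal sketch using only the standard library. It is not aiogzip's internal code; the helper name decode_chunks and the 3-byte chunk size are assumptions chosen so that a multi-byte character straddles a chunk boundary.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks whose boundaries may fall inside a character."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = []
    for chunk in chunks:
        # An incomplete trailing byte sequence is held back inside the
        # decoder until the next chunk completes it.
        parts.append(decoder.decode(chunk))
    # final=True flushes the decoder and raises if bytes are left dangling.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = "naïve café ☕".encode("utf-8")
# 3-byte chunks: the first chunk ends in the middle of "ï".
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]
text = decode_chunks(chunks)
```

Calling chunk.decode("utf-8") on each chunk independently would raise UnicodeDecodeError on the first chunk here, which is the failure the incremental decoder avoids.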

Binary Operations (Tie)

For bulk binary I/O, aiogzip matches the throughput of standard gzip.

| Operation       | aiogzip      | gzip (sync)  | Speedup     |
|-----------------|--------------|--------------|-------------|
| Bulk Binary I/O | ~52 MB/s     | ~53 MB/s     | Equivalent  |
| Small Chunks    | 1.7M ops/sec | 1.3M ops/sec | 1.3x Faster |

Concurrency (Winner: aiogzip)

When processing multiple files, especially where I/O latency (disk/network) is involved, aiogzip shines by not blocking the event loop.

  • Concurrent Processing: 1.5x Faster (simulated I/O latency).
  • Non-blocking I/O lets the main thread remain responsive (e.g., in a web server) while heavy compression tasks are in progress.
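The effect of overlapping I/O waits can be sketched with plain asyncio. This is a simulation, not the benchmark behind the numbers above: process_file is a hypothetical stand-in that uses asyncio.sleep in place of real disk or network latency, and the file names are placeholders.

```python
import asyncio
import time


async def process_file(name, latency=0.05):
    # Hypothetical workload: the sleep stands in for awaiting slow I/O.
    await asyncio.sleep(latency)
    return name.upper()


async def run_sequential(names):
    # Each file waits for the previous one to finish.
    return [await process_file(n) for n in names]


async def run_concurrent(names):
    # All waits overlap; gather preserves the input order of results.
    return list(await asyncio.gather(*(process_file(n) for n in names)))


def timed(coro_fn, names):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(names))
    return result, time.perf_counter() - start


names = ["a.gz", "b.gz", "c.gz", "d.gz"]
seq_result, seq_time = timed(run_sequential, names)
con_result, con_time = timed(run_concurrent, names)
```

With four files, the sequential version waits roughly four latencies while the concurrent version waits roughly one. Real gains depend on how much of the work is I/O wait rather than CPU-bound compression.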

Optimization Tips

1. Choose the Right Chunk Size

The default chunk_size is 64KB, and there is intentionally no upper bound.

  • Increase it (e.g., 128*1024 or 1024*1024) for large file throughput if you have memory to spare.
  • Decrease it if you are memory constrained and processing massive files.
  • If you push chunk sizes into the multi-megabyte range, budget the extra memory per open file to avoid accidental OOMs.
# Example: using a larger chunk size for speed (run inside a coroutine)
from aiogzip import AsyncGzipBinaryFile

async with AsyncGzipBinaryFile("large.gz", "rb", chunk_size=1024 * 1024) as f:
    ...

2. Use read(-1) Carefully

Reading the entire file into memory (read(-1)) is the fastest way to process data if it fits in RAM. aiogzip optimizes this by reading chunks and joining them at the end.

However, for multi-gigabyte files, always prefer streaming (line-by-line or fixed-size reads) to avoid OOM (Out of Memory) crashes.
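A fixed-size streaming loop might look like the sketch below. It assumes the opened file exposes an awaitable read(n) that returns empty bytes at end of file, as standard file objects do; stream_chunks, total_size, and FakeAsyncReader are hypothetical names, and the fake reader exists only so the example runs without a real .gz file.

```python
import asyncio


async def stream_chunks(f, chunk_size=64 * 1024):
    """Yield fixed-size chunks from any object with an awaitable read(n)."""
    while True:
        chunk = await f.read(chunk_size)
        if not chunk:  # empty bytes signals end of file
            break
        yield chunk


class FakeAsyncReader:
    """In-memory stand-in for an opened async binary file (demo only)."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    async def read(self, n):
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


async def total_size(f, chunk_size):
    total = 0
    largest = 0
    async for chunk in stream_chunks(f, chunk_size):
        total += len(chunk)
        largest = max(largest, len(chunk))
    return total, largest


total, largest = asyncio.run(total_size(FakeAsyncReader(b"x" * 1000), 256))
```

Peak memory is bounded by the chunk size rather than by the file size, which is the point of streaming over read(-1).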

3. Text vs. Binary

  • If you need text, use AsyncGzipTextFile (or mode="rt"/"wt"). It handles decoding more efficiently than you can typically do manually in a Python loop.
  • If you just need to move bytes (e.g., upload to S3), use AsyncGzipBinaryFile.

4. Buffer Management

aiogzip maintains an internal buffer.

  • Binary Mode: Uses an efficient offset-pointer strategy to avoid expensive memory copies (del buffer[:n]) when reading small chunks.
  • Text Mode: Buffers decoded text to handle split multi-byte characters and split newlines correctly.
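The offset-pointer idea from the Binary Mode bullet can be sketched as follows. This is an illustration of the general technique, not aiogzip's actual implementation; the class OffsetBuffer and its compact_threshold parameter are hypothetical.

```python
class OffsetBuffer:
    """Byte buffer consumed from the front by advancing an offset."""

    def __init__(self, compact_threshold=64 * 1024):
        self._buf = bytearray()
        self._offset = 0  # index of the first unread byte
        self._compact_threshold = compact_threshold  # hypothetical tuning knob

    def feed(self, data):
        self._buf += data

    def __len__(self):
        return len(self._buf) - self._offset

    def read(self, n):
        end = min(self._offset + n, len(self._buf))
        chunk = bytes(self._buf[self._offset:end])
        # Advance the pointer instead of shifting the remaining bytes.
        self._offset = end
        if self._offset == len(self._buf):
            # Fully drained: resetting is cheap because nothing is moved.
            self._buf.clear()
            self._offset = 0
        elif self._offset >= self._compact_threshold:
            # Pay for one front deletion only after much data was consumed.
            del self._buf[:self._offset]
            self._offset = 0
        return chunk


buf = OffsetBuffer()
buf.feed(b"hello world")
first = buf.read(5)
second = buf.read(6)
remaining = len(buf)
```

Deleting from the front of a bytearray on every small read shifts all remaining bytes each time. Tracking an offset defers that cost to an occasional compaction.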