aiogzip ⚡️
An asynchronous library for reading and writing gzip-compressed files.
aiogzip provides a fast, simple, and asyncio-native interface for handling .gz files, making it a useful complement to Python's built-in gzip module for asynchronous applications.
It is designed for high-performance I/O operations, especially for text-based data pipelines, and integrates seamlessly with other async libraries like aiocsv.
Features
- Truly Asynchronous: Built with asyncio and aiofiles for non-blocking file I/O.
- High-Performance Text Processing: Significantly faster than the standard gzip library for text and JSONL file operations.
- Simple API: Mimics the interface of gzip.open(), making it easy to adopt.
- Separate Binary and Text Modes: AsyncGzipBinaryFile and AsyncGzipTextFile provide clear, type-safe handling of data.
- Excellent Compression Quality: Achieves compression ratios nearly identical to the standard gzip module.
- aiocsv Integration: Read and write compressed CSV files effortlessly.
Quickstart
Using aiogzip is as simple as using the standard gzip module, but with async/await.
Writing to a Compressed File
import asyncio

from aiogzip import AsyncGzipFile


async def main():
    # Write binary data
    async with AsyncGzipFile("file.gz", "wb") as f:
        await f.write(b"Hello, async world!")

    # Write text data
    async with AsyncGzipFile("file.txt.gz", "wt") as f:
        await f.write("This is a text file.")


asyncio.run(main())
Reading from a Compressed File
import asyncio

from aiogzip import AsyncGzipFile


async def main():
    # Read the entire file
    async with AsyncGzipFile("file.gz", "rb") as f:
        content = await f.read()
        print(content)

    # Iterate over lines in a text file
    async with AsyncGzipFile("file.txt.gz", "rt") as f:
        async for line in f:
            print(line.strip())


asyncio.run(main())
Compatibility
aiogzip provides comprehensive compatibility with the standard gzip module's GzipFile API, including:
- ✅ seek() and tell() methods for stream navigation (with the same performance characteristics as gzip.GzipFile)
- ✅ peek() and readinto() for advanced reading patterns
- ✅ Reading and writing gzip headers and metadata (e.g., mtime, original_filename)
- ✅ Text and binary mode operations with proper encoding/decoding
- ✅ Full compatibility with tarfile for reading .tar.gz archives
- ✅ Seamless integration with aiocsv for CSV processing
For AsyncGzipTextFile, tell() returns an opaque cookie value for the current open stream. Use it only with seek(cookie) on the same open handle.
Backward seeks restart decompression from the beginning of the gzip stream. For non-seekable fileobj inputs, aiogzip keeps a bounded compressed-input replay cache so rewind can work without loading unbounded data; tune it with max_rewind_cache_size or set it to None for the previous unbounded behavior.
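The cost of a backward seek is easiest to see with the standard library's gzip.GzipFile, which the compatibility list names as the reference for seek() and tell() performance. The sketch below uses only the standard library, not aiogzip; the function name seek_demo is hypothetical. A forward seek decompresses and discards data up to the target offset, while a backward seek rewinds and decompresses again from the start of the stream.

```python
import gzip
import io


def seek_demo():
    # Build a small gzip stream in memory (standard library only).
    payload = b"0123456789" * 1000
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(payload)

    buf.seek(0)
    with gzip.GzipFile(fileobj=buf, mode="rb") as f:
        f.seek(5000)       # forward: decompress and discard 5000 bytes
        tail = f.read(10)
        pos = f.tell()     # offset in the uncompressed stream
        f.seek(0)          # backward: rewind and restart decompression
        head = f.read(10)
    return tail, pos, head


if __name__ == "__main__":
    print(seek_demo())
```

In practice this means sequential reads are cheap, and repeated backward seeks on large files are worth avoiding.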
Note: aiogzip focuses on file-based operations and does not currently support in-memory compression/decompression (e.g., gzip.compress/gzip.decompress).
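For in-memory payloads, one possible workaround is to run the standard library's gzip.compress and gzip.decompress in a worker thread so the event loop is not blocked. This is a sketch that uses only the standard library (asyncio.to_thread requires Python 3.9+); the helpers compress_bytes, decompress_bytes, and demo are hypothetical and are not part of aiogzip.

```python
import asyncio
import gzip


async def compress_bytes(data: bytes, level: int = 9) -> bytes:
    # gzip.compress is synchronous and CPU-bound, so run it off the event loop.
    return await asyncio.to_thread(gzip.compress, data, level)


async def decompress_bytes(blob: bytes) -> bytes:
    # Same approach for decompression.
    return await asyncio.to_thread(gzip.decompress, blob)


async def demo() -> bytes:
    blob = await compress_bytes(b"Hello, async world!")
    return await decompress_bytes(blob)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```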