mach.io.utils
General utilities for downloading and caching files with integrity verification.
This module provides:
- Efficient file hashing compatible with Python 3.9+ (backport of hashlib.file_digest)
- Robust file downloading with progress bars and integrity verification
- Smart caching with automatic re-download on corruption
- Support for multiple hash algorithms (SHA1, SHA256, MD5, etc.)
Key Functions:
- file_digest(): File hashing for Python 3.9+ that follows the Python 3.11+ hashlib.file_digest() API
- cached_download(): Download and cache files with integrity verification
- verify_file_integrity(): Verify file integrity using various hash algorithms
Optional dependencies:
- tqdm: For progress bars during downloads
Functions
| cached_download | Download a file and cache it with optional integrity verification. |
| download_file | Download a file from a URL with optional progress bar and integrity verification. |
| file_digest | Return a digest object that has been updated with contents of file object. |
| verify_file_integrity | Verify file integrity using specified hash algorithm. |
- mach.io.utils.cached_download(
    url: str,
    cache_dir: str | Path = PosixPath('/home/runner/.cache/mach'),
    filename: str | Path | None = None,
    timeout: int = 30,
    *,
    overwrite: bool = False,
    expected_size: int | None = None,
    expected_hash: str | None = None,
    digest: None | str | Callable[[], hashlib._Hash] = None,
    show_progress: bool = True,
  )
Download a file and cache it with optional integrity verification.
- Parameters:
url – URL to download
cache_dir – Directory to cache the file in
filename – Name to save the file as (default: derived from the URL). If an absolute path is provided, it is used as-is and cache_dir is ignored
timeout – Connection timeout in seconds
overwrite – Whether to overwrite existing files
expected_hash – Expected hash value for integrity verification
digest – Hash algorithm to use, given as a name or a hash constructor (default: “sha1” when None)
expected_size – Expected file size in bytes
show_progress – Whether to show progress bar (requires tqdm)
- Returns:
Path to the cached file
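The cache-then-verify behaviour described above can be illustrated with a minimal standard-library sketch. This is not the mach implementation: the names cached_fetch_sketch and sha1_of, and the fetch callback that stands in for the real network download, are hypothetical, and the hash is fixed to SHA1 for brevity.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def sha1_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    hasher = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def cached_fetch_sketch(
    fetch: Callable[[Path], None],  # hypothetical: writes the payload to the given path
    cache_dir: Path,
    filename: str,
    expected_hash: Optional[str] = None,
    overwrite: bool = False,
) -> Path:
    # An absolute filename bypasses the cache directory, as the docs describe.
    target = Path(filename)
    if not target.is_absolute():
        target = Path(cache_dir) / target

    if target.exists() and not overwrite:
        if expected_hash is None or sha1_of(target) == expected_hash.lower():
            return target  # cache hit: nothing to download
        # Otherwise the cached copy is corrupt; fall through and fetch again.

    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(target)

    if expected_hash is not None and sha1_of(target) != expected_hash.lower():
        target.unlink()  # do not leave a bad file in the cache
        raise RuntimeError(f"hash mismatch for {target}")
    return target
```

Passing the download step in as a callback keeps the cache logic testable without network access; a second call with the same arguments returns the cached path without invoking fetch again.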
- mach.io.utils.download_file(
    url: str,
    output_path: str | Path,
    timeout: int = 30,
    chunk_size: int = 1048576,
    *,
    overwrite: bool = False,
    expected_hash: str | None = None,
    digest: None | str | Callable[[], hashlib._Hash] = None,
    expected_size: int | None = None,
    show_progress: bool = True,
  )
Download a file from a URL with optional progress bar and integrity verification.
- Parameters:
url – URL to download
output_path – Path where the file will be saved
timeout – Connection timeout in seconds
chunk_size – Size of each downloaded chunk in bytes (default: 1048576, i.e. 1 MiB)
overwrite – Whether to overwrite existing files
expected_hash – Expected hash value for integrity verification
digest – Hash algorithm to use, given as a name or a hash constructor (default: “sha1” when None)
expected_size – Expected file size in bytes
show_progress – Whether to show progress bar (requires tqdm)
- Returns:
Path to the downloaded file
- Raises:
RuntimeError – If download fails or integrity check fails
ImportError – If show_progress=True but tqdm is not installed
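As a rough illustration of chunked downloading with on-the-fly verification, the sketch below hashes each chunk while writing it, then checks size and digest at the end. It assumes only the standard library; save_stream and download_sketch are hypothetical names, the progress bar is omitted, and the actual mach code may be organised differently.

```python
import hashlib
import io
import urllib.request
from pathlib import Path
from typing import BinaryIO, Optional, Union


def save_stream(
    src: BinaryIO,
    output_path: Union[str, Path],
    chunk_size: int = 1048576,
    expected_hash: Optional[str] = None,
    digest: str = "sha1",
    expected_size: Optional[int] = None,
) -> Path:
    """Copy a binary stream to disk chunk by chunk, hashing as it goes."""
    output_path = Path(output_path)
    hasher = hashlib.new(digest)
    written = 0
    with open(output_path, "wb") as out:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            hasher.update(chunk)
            written += len(chunk)

    # Collect every failed check so the error message is complete.
    problems = []
    if expected_size is not None and written != expected_size:
        problems.append(f"size {written} != expected {expected_size}")
    if expected_hash is not None and hasher.hexdigest() != expected_hash.lower():
        problems.append(f"{digest} digest mismatch")
    if problems:
        output_path.unlink()  # remove the partial or corrupt file
        raise RuntimeError("integrity check failed: " + "; ".join(problems))
    return output_path


def download_sketch(url: str, output_path: Union[str, Path], timeout: int = 30, **checks) -> Path:
    """Open the URL and hand the response stream to save_stream."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return save_stream(response, output_path, **checks)
```

Hashing during the write avoids a second pass over the file, which matters for large downloads. Separating the stream copy from the URL handling also lets the verification logic be exercised with an in-memory io.BytesIO object.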
- mach.io.utils.file_digest(fileobj, digest) → hashlib._Hash
Return a digest object that has been updated with contents of file object.
This is a backport-compatible implementation of hashlib.file_digest() that works with Python 3.9+ and follows the same API as Python 3.11+.
- Parameters:
fileobj – File-like object opened for reading in binary mode
digest – Hash algorithm name as str, hash constructor, or callable that returns hash object
- Returns:
Hash object updated with the file contents
Example
    from mach.io.utils import file_digest

    # Open in binary mode so the raw bytes are hashed.
    with open("file.bin", "rb") as f:
        hash_obj = file_digest(f, "sha256")
        print(hash_obj.hexdigest())
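For readers curious how such a backport can work on Python 3.9, here is a minimal sketch of the idea: read the file into a reusable buffer and feed each filled slice to the hash object. The name file_digest_sketch and the chunk_size parameter are assumptions for illustration; the real function may differ in buffer size and in how it handles special file objects.

```python
import hashlib
import io


def file_digest_sketch(fileobj, digest, chunk_size=1 << 18):
    """Return a hash object updated with everything readable from fileobj."""
    # Accept either an algorithm name or a zero-argument hash constructor.
    hasher = hashlib.new(digest) if isinstance(digest, str) else digest()

    # Reuse one buffer for every read to avoid allocating a new bytes
    # object per chunk.
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    while True:
        n = fileobj.readinto(view)
        if not n:  # 0 at end of file
            break
        hasher.update(view[:n])
    return hasher


# Usage with an in-memory stream; a file opened with "rb" works the same way.
payload = io.BytesIO(b"example payload")
print(file_digest_sketch(payload, "sha256").hexdigest())
```

The result should match hashing the same bytes in one call, for example hashlib.sha256(data).hexdigest(), while keeping memory use bounded by chunk_size.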