mach.io.utils
General utilities for downloading and caching files with integrity verification.
This module provides:
- Efficient file hashing compatible with Python 3.9+ (backport of hashlib.file_digest)
- Robust file downloading with progress bars and integrity verification
- Smart caching with automatic re-download on corruption
- Support for multiple hash algorithms (SHA1, SHA256, MD5, etc.)
Key Functions:
- file_digest(): Python 3.11+ compatible file hashing for older Python versions
- cached_download(): Download and cache files with integrity verification
- verify_file_integrity(): Verify file integrity using various hash algorithms
Optional dependencies:
- tqdm: For progress bars during downloads
Functions
| cached_download | Download a file and cache it with optional integrity verification. |
| download_file | Download a file from a URL with optional progress bar and integrity verification. |
| file_digest | Return a digest object that has been updated with contents of file object. |
| verify_file_integrity | Verify file integrity using specified hash algorithm. |
mach.io.utils.cached_download(
    url: str,
    cache_dir: str | Path = PosixPath('/home/runner/.cache/mach'),
    filename: str | Path | None = None,
    timeout: int = 30,
    *,
    overwrite: bool = False,
    expected_size: int | None = None,
    expected_hash: str | None = None,
    digest: None | str | Callable[[], hashlib._Hash] = None,
    show_progress: bool = True,
)
Download a file and cache it with optional integrity verification.
Parameters:
- url – URL to download
- cache_dir – Directory to cache the file in
- filename – Name to save the file as (default: derived from the URL). If an absolute path is provided, it is used as-is and cache_dir is ignored.
- timeout – Connection timeout in seconds
- overwrite – Whether to overwrite existing files
- expected_hash – Expected hash value for integrity verification
- digest – Hash algorithm to use (default: "sha1")
- expected_size – Expected file size in bytes
- show_progress – Whether to show progress bar (requires tqdm)
Returns:
- Path to the cached file
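The caching rules described for cached_download can be pictured with a short standard-library sketch. This is not the module's implementation: `resolve_cache_path` and `is_cache_hit` are hypothetical helper names, and taking the last segment of the URL path as the default filename is an assumption about what "derived from URL" means.

```python
import hashlib
from pathlib import Path
from typing import Optional, Union
from urllib.parse import urlparse


def resolve_cache_path(
    url: str,
    cache_dir: Union[str, Path],
    filename: Union[str, Path, None] = None,
) -> Path:
    """Hypothetical helper: decide where a download would be cached."""
    if filename is None:
        # Assumption: the default name is the last segment of the URL path.
        filename = Path(urlparse(url).path).name
    filename = Path(filename)
    # An absolute filename is used as-is, bypassing cache_dir.
    if filename.is_absolute():
        return filename
    return Path(cache_dir) / filename


def is_cache_hit(
    path: Path,
    expected_size: Optional[int] = None,
    expected_hash: Optional[str] = None,
    digest: str = "sha1",
) -> bool:
    """Hypothetical helper: True if a cached file exists and passes all checks."""
    if not path.is_file():
        return False
    if expected_size is not None and path.stat().st_size != expected_size:
        return False
    if expected_hash is not None:
        actual = hashlib.new(digest, path.read_bytes()).hexdigest()
        return actual == expected_hash.lower()
    return True
```

Under this sketch, a caller would download again whenever `is_cache_hit` returns False, which is one way to get the "automatic re-download on corruption" behaviour listed in the module overview.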
 
mach.io.utils.download_file(
    url: str,
    output_path: str | Path,
    timeout: int = 30,
    chunk_size: int = 1048576,
    *,
    overwrite: bool = False,
    expected_hash: str | None = None,
    digest: None | str | Callable[[], hashlib._Hash] = None,
    expected_size: int | None = None,
    show_progress: bool = True,
)
Download a file from a URL with optional progress bar and integrity verification.
Parameters:
- url – URL to download
- output_path – Path where the file will be saved
- timeout – Connection timeout in seconds
- chunk_size – Size in bytes of the chunks to download (default: 1048576, i.e. 1 MiB)
- overwrite – Whether to overwrite existing files
- expected_hash – Expected hash value for integrity verification
- digest – Hash algorithm to use (default: "sha1")
- expected_size – Expected file size in bytes
- show_progress – Whether to show progress bar (requires tqdm)
Returns:
- Path to the downloaded file
Raises:
- RuntimeError – If download fails or integrity check fails
- ImportError – If show_progress=True but tqdm is not installed
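To make the chunked download and integrity check concrete, here is a minimal sketch using only the standard library. It is an illustration under stated assumptions, not the code behind download_file: `fetch_with_check` is a hypothetical name, `urllib.request` stands in for whatever HTTP client the module actually uses, and the progress bar, overwrite handling, and size check are omitted.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional, Union


def fetch_with_check(
    url: str,
    output_path: Union[str, Path],
    timeout: int = 30,
    chunk_size: int = 1024 * 1024,
    expected_hash: Optional[str] = None,
    digest: str = "sha1",
) -> Path:
    """Hypothetical sketch: stream url to output_path, hashing each chunk."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    hasher = hashlib.new(digest)
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(output_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            hasher.update(chunk)  # hash while writing, so the file is read only once
    if expected_hash is not None and hasher.hexdigest() != expected_hash.lower():
        output_path.unlink()  # do not leave a corrupt file behind
        raise RuntimeError(f"Hash mismatch for {url}")
    return output_path
```

Hashing while streaming avoids a second pass over the file; removing the file on a mismatch is a design choice made for this sketch, and the documented function may behave differently.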
 
 
mach.io.utils.file_digest(fileobj, digest) → hashlib._Hash
Return a digest object that has been updated with the contents of a file object.
This is a backport-compatible implementation of hashlib.file_digest() that works with Python 3.9+ and follows the same API as Python 3.11+.
Parameters:
- fileobj – File-like object opened for reading in binary mode
- digest – Hash algorithm name as str, hash constructor, or callable that returns hash object
Returns:
- Hash object updated with the file contents
Example:
    from mach.io.utils import file_digest

    # Hash a file opened in binary mode and print its hex digest.
    with open("file.bin", "rb") as f:
        hash_obj = file_digest(f, "sha256")
        print(hash_obj.hexdigest())
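For readers curious how such a backport can work, the following is a minimal sketch of chunked hashing for Python 3.9+. The name `file_digest_sketch` and the `chunk_size` parameter are hypothetical, and the module's real implementation may differ (for example in buffer handling).

```python
import hashlib
import io
from typing import BinaryIO, Callable, Union


def file_digest_sketch(
    fileobj: BinaryIO,
    digest: Union[str, Callable[[], "hashlib._Hash"]],
    chunk_size: int = 256 * 1024,
):
    """Hypothetical backport-style helper: hash a binary file object in chunks."""
    # Accept either an algorithm name or a constructor/callable returning a hash object.
    hash_obj = hashlib.new(digest) if isinstance(digest, str) else digest()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        hash_obj.update(chunk)
    return hash_obj


# The chunked result matches hashing the same bytes in one call.
data = io.BytesIO(b"hello world")
print(file_digest_sketch(data, "sha256").hexdigest()
      == hashlib.sha256(b"hello world").hexdigest())  # prints True
```

Reading in fixed-size chunks keeps memory use bounded regardless of file size, which is the main reason to prefer this over `fileobj.read()` followed by a single hash call.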