mach.io.utils

General utilities for downloading and caching files with integrity verification.

This module provides:
  • Efficient file hashing compatible with Python 3.9+ (backport of hashlib.file_digest)

  • Robust file downloading with progress bars and integrity verification

  • Smart caching with automatic re-download on corruption

  • Support for multiple hash algorithms (SHA1, SHA256, MD5, etc.)

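Key functions:
  • file_digest(): File hashing for older Python versions, following the API of hashlib.file_digest() from Python 3.11+

  • cached_download(): Download and cache files with integrity verification

  • download_file(): Download a file from a URL with optional progress bar and integrity verification

  • verify_file_integrity(): Verify file integrity using various hash algorithms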

Optional dependencies:
  • tqdm: For progress bars during downloads

Functions

cached_download(url[, cache_dir, filename, ...])

Download a file and cache it with optional integrity verification.

download_file(url, output_path[, timeout, ...])

Download a file from a URL with optional progress bar and integrity verification.

file_digest(fileobj, digest)

Return a digest object that has been updated with contents of file object.

verify_file_integrity(file_path, ...)

Verify file integrity using specified hash algorithm.

mach.io.utils.cached_download(
url: str,
cache_dir: str | Path = PosixPath('/home/runner/.cache/mach'),
filename: str | Path | None = None,
timeout: int = 30,
*,
overwrite: bool = False,
expected_size: int | None = None,
expected_hash: str | None = None,
digest: None | str | Callable[[], hashlib._Hash] = None,
show_progress: bool = True,
) → Path

Download a file and cache it with optional integrity verification.

Parameters:
  • url – URL to download

  • cache_dir – Directory to cache the file in

  • filename – Name to save the file as (default: derived from the URL). If an absolute path is provided, it is used as-is and cache_dir is ignored

  • timeout – Connection timeout in seconds

  • overwrite – Whether to overwrite existing files

  • expected_hash – Expected hash value for integrity verification

  • digest – Hash algorithm to use (default: "sha1")

  • expected_size – Expected file size in bytes

  • show_progress – Whether to show progress bar (requires tqdm)

Returns:

Path to the cached file
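The cache-or-download behaviour described above can be illustrated with a small standalone sketch. This is not the mach implementation: cached_fetch_sketch, _hexdigest, and the fetch callable (a stand-in for the real network download) are hypothetical names, and deriving the file name from the last URL path component is an assumption.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def _hexdigest(path, digest="sha1"):
    """Hash a file in 1 MiB chunks and return the hex digest."""
    hasher = hashlib.new(digest)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def cached_fetch_sketch(url, cache_dir, fetch, expected_hash=None, digest="sha1"):
    """Hypothetical cache logic; `fetch(url) -> bytes` stands in for the download."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the file name is the last component of the URL path.
    target = cache_dir / (Path(urlparse(url).path).name or "download")

    def is_valid():
        # Without an expected hash, any existing file is accepted.
        return expected_hash is None or _hexdigest(target, digest) == expected_hash.lower()

    # Cache hit: reuse the file only if it passes the integrity check.
    if target.exists() and is_valid():
        return target

    # Cache miss or corrupted file: fetch again and re-verify.
    target.write_bytes(fetch(url))
    if not is_valid():
        target.unlink()
        raise RuntimeError(f"Integrity check failed for {url}")
    return target
```

Passing the download step in as a callable keeps the sketch runnable without network access; a real implementation would stream the response to disk instead of holding the whole payload in memory.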

mach.io.utils.download_file(
url: str,
output_path: str | Path,
timeout: int = 30,
chunk_size: int = 1048576,
*,
overwrite: bool = False,
expected_hash: str | None = None,
digest: None | str | Callable[[], hashlib._Hash] = None,
expected_size: int | None = None,
show_progress: bool = True,
) → Path

Download a file from a URL with optional progress bar and integrity verification.

Parameters:
  • url – URL to download

  • output_path – Path where the file will be saved

  • timeout – Connection timeout in seconds

  • chunk_size – Size of chunks to download

  • overwrite – Whether to overwrite existing files

  • expected_hash – Expected hash value for integrity verification

  • digest – Hash algorithm to use (default: "sha1")

  • expected_size – Expected file size in bytes

  • show_progress – Whether to show progress bar (requires tqdm)

Returns:

Path to the downloaded file

Raises:
  • RuntimeError – If download fails or integrity check fails

  • ImportError – If show_progress=True but tqdm is not installed
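To make the roles of timeout, chunk_size, expected_hash, and digest concrete, here is a minimal sketch of a chunked download that hashes the data as it is written. It is an illustration only: download_sketch is a hypothetical name, the use of urllib.request is an assumption (the library may use a different HTTP client), and the progress bar and expected_size check are omitted.

```python
import hashlib
import urllib.request
from pathlib import Path


def download_sketch(url, output_path, timeout=30, chunk_size=1048576,
                    expected_hash=None, digest="sha1"):
    """Stream `url` to `output_path`, hashing each chunk as it is written."""
    output_path = Path(output_path)
    hasher = hashlib.new(digest)
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(output_path, "wb") as out:
        while True:
            # Read at most chunk_size bytes so memory use stays bounded.
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            hasher.update(chunk)
    # Remove the file and fail loudly if the digest does not match.
    if expected_hash is not None and hasher.hexdigest() != expected_hash.lower():
        output_path.unlink()
        raise RuntimeError(f"Hash mismatch for {url}")
    return output_path
```

Hashing while streaming avoids a second pass over the file. Whether the library hashes during or after the download is not stated in this reference.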

mach.io.utils.file_digest(
fileobj,
digest: str | Callable[[], hashlib._Hash],
) → hashlib._Hash

Return a digest object that has been updated with contents of file object.

This is a backport-compatible implementation of hashlib.file_digest() that works with Python 3.9+ and follows the same API as Python 3.11+.

Parameters:
  • fileobj – File-like object opened for reading in binary mode

  • digest – Hash algorithm name as str, hash constructor, or callable that returns hash object

Returns:

Hash object updated with the file contents

Example

    with open("file.bin", "rb") as f:
        hash_obj = file_digest(f, "sha256")
        print(hash_obj.hexdigest())
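For readers curious how such a backport might work, the following is a minimal sketch under stated assumptions, not the library's actual code. The name file_digest_sketch and the 1 MiB default chunk size are hypothetical. It accepts either an algorithm name or a hash constructor, matching the two forms of digest described above.

```python
import hashlib
import io


def file_digest_sketch(fileobj, digest, chunk_size=1 << 20):
    """Feed a binary file object into a hash object chunk by chunk."""
    # A string names the algorithm; anything else is treated as a constructor.
    hasher = hashlib.new(digest) if isinstance(digest, str) else digest()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        hasher.update(chunk)
    return hasher


# Both call styles produce the same digest.
by_name = file_digest_sketch(io.BytesIO(b"abc"), "sha256").hexdigest()
by_ctor = file_digest_sketch(io.BytesIO(b"abc"), hashlib.sha256).hexdigest()
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which is the main reason to prefer this over hashing the result of a single read().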

mach.io.utils.verify_file_integrity(
file_path: Path,
expected_hash: str,
digest: str | Callable[[], hashlib._Hash],
) → bool

Verify file integrity using specified hash algorithm.

Parameters:
  • file_path – Path to the file to verify

  • expected_hash – Expected hash value

  • digest – Hash algorithm to use, e.g. "sha256" or hashlib.sha256

Returns:

True if hash matches, False otherwise
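The check can be sketched as hashing the file and comparing hex digests. This is an illustrative sketch rather than the library's implementation: verify_sketch is a hypothetical name, and comparing case-insensitively is an assumption, since the reference does not say how the real function treats upper-case hashes.

```python
import hashlib
from pathlib import Path


def verify_sketch(file_path, expected_hash, digest):
    """Return True if the file's hex digest equals expected_hash."""
    # Accept either an algorithm name or a hash constructor.
    hasher = hashlib.new(digest) if isinstance(digest, str) else digest()
    with open(file_path, "rb") as f:
        # Read in 1 MiB chunks to keep memory use bounded.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    # Assumption: hex digests are compared case-insensitively.
    return hasher.hexdigest() == expected_hash.strip().lower()
```

A Boolean return value lets the caller decide whether a mismatch should trigger a re-download or an error.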