mach#

An ultrafast CUDA-accelerated ultrasound beamformer for Python users. Developed at Forest Neurotech.

Benchmark Results

Benchmark: Beamforming PyMUST’s rotating-disk Doppler dataset at 1.1 trillion points per second (6.5x the speed of sound).
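To make the throughput figure concrete, the sketch below converts a points-per-second rate into an expected run time. It assumes that a "point" is one (voxel, receive element, frame) combination, which is an assumption here rather than a definition taken from this page. The problem size is hypothetical and is not the PyMUST dataset's dimensions.

```python
def beamformed_points(n_voxels: int, n_elements: int, n_frames: int) -> int:
    """Count points, assuming one point per (voxel, receive element, frame)."""
    return n_voxels * n_elements * n_frames


def expected_seconds(points: int, points_per_second: float) -> float:
    """Time needed to process `points` at a given throughput."""
    return points / points_per_second


# Hypothetical problem size, for illustration only.
points = beamformed_points(n_voxels=100_000, n_elements=128, n_frames=32)
runtime = expected_seconds(points, points_per_second=1.1e12)
print(f"{points:,} points -> {runtime * 1e3:.2f} ms")  # 409,600,000 points -> 0.37 ms
```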

Highlights#

⚡ Ultra-fast beamforming: ~10x faster than prior state-of-the-art
🚀 GPU-accelerated: Leverages CUDA for maximum performance on NVIDIA GPUs
🎯 Optimized for research: Designed for functional ultrasound imaging (fUSI) and other ultrafast, high-channel-count, or volumetric-ensemble imaging
🐍 Python bindings: Zero-copy integration with CuPy and JAX arrays via nanobind. NumPy support included.
🔬 Validated: Matches vbeam and PyMUST outputs
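As an illustration of what "zero-copy" means in the list above, the snippet below shares one buffer between two array objects through the DLPack protocol. It does not call mach and is not a description of how mach's nanobind bindings work internally. It only uses NumPy (`np.from_dlpack`, available since NumPy 1.22) so that it runs without a GPU; CuPy and JAX arrays implement the same protocol.

```python
import numpy as np

source = np.arange(6, dtype=np.float32)

# from_dlpack wraps the existing buffer instead of copying it.
view = np.from_dlpack(source)

# Both objects refer to the same memory, so a write through one
# is visible through the other.
source[0] = 42.0
print(np.shares_memory(source, view))  # True
print(float(view[0]))                  # 42.0
```

Avoiding the copy matters most for large channel-data arrays that already live on the GPU, where a round trip through host memory could cost more than the beamforming itself.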

Installation#

Install from PyPI (recommended):#

pip install mach-beamform

Or, to include all optional dependencies (including those needed to run the examples):

pip install mach-beamform[all]

Wheel prerequisites:

Linux
CUDA-enabled GPU (compute capability >= 7.5) with a driver that supports CUDA >= 12.3
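One way to check the compute-capability requirement before installing is to ask `nvidia-smi`. The helper names below are hypothetical and not part of mach. The sketch also assumes an `nvidia-smi` recent enough to support the `compute_cap` query field; older releases do not have it.

```python
import subprocess


def parse_compute_caps(nvidia_smi_output: str) -> list[tuple[int, int]]:
    """Parse lines such as '8.6' into (major, minor) tuples, one per GPU."""
    caps = []
    for line in nvidia_smi_output.splitlines():
        line = line.strip()
        if line:
            major, minor = line.split(".")
            caps.append((int(major), int(minor)))
    return caps


def query_compute_caps() -> list[tuple[int, int]]:
    """Ask nvidia-smi for the compute capability of each visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)


def meets_wheel_requirement(cap: tuple[int, int], minimum=(7, 5)) -> bool:
    """Tuple comparison orders (major, minor) pairs correctly."""
    return cap >= minimum


# Parsing a sample output; call query_compute_caps() on a machine with a GPU.
sample_caps = parse_compute_caps("8.6\n7.0\n")
print([meets_wheel_requirement(cap) for cap in sample_caps])  # [True, False]
```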

Build from source#

make compile

Build prerequisites:

Linux
make
uv >= 0.9.7
gcc >= 8
nvcc >= 11.0
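When scripting a check against these minimums, compare versions as integer tuples rather than as strings: a plain string comparison would rank `0.9.10` below `0.9.7`. The helpers below are a hypothetical sketch, not part of mach's build system. They expect a short string such as `"uv 0.9.10"` or `"11.4.0"` and take the first dotted number they find, so multi-line tool output should be trimmed to the version line first.

```python
import re


def version_tuple(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from a short string."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare two versions numerically, component by component."""
    return version_tuple(found) >= version_tuple(minimum)


print(meets_minimum("uv 0.9.10", "0.9.7"))  # True
print("0.9.10" >= "0.9.7")                  # False: string comparison is misleading
print(meets_minimum("10.2", "11.0"))        # False
```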

Docker Development#

Compile and test without installing the CUDA toolkit using our Docker development environment.

Prerequisites:

Docker Engine with nvidia-container-toolkit
CUDA-capable GPU with a driver that supports CUDA >= 12.3

Quick start:

# Build and start development container
docker compose run --rm dev

# Or use make shortcuts
make docker-build  # Build image (first time: ~2-3 min, rebuilds: ~30s)
make docker-dev    # Run container

Inside the container:

make compile  # Compile CUDA extension
make test     # Run tests

Your source code is mounted from the host, so you can edit files locally and compile in the container. Build artifacts (.venv/ and build/) are stored in anonymous volumes to avoid permission issues. Dependencies are pre-installed in the image and cached, so rebuilds are fast when only source code changes.

Examples#

Try our examples:

If you don’t have a CUDA-enabled GPU, you can download the notebook from the docs and open it in Google Colab (select a GPU instance).

Contributing#

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Roadmap#

Beta release (v0.Y.0)#

✅ Single-wave transmissions (plane wave, focused, diverging)
✅ Linear interpolation beamforming
✅ Allow NumPy/CuPy/JAX/PyTorch inputs through Array API
✅ Comprehensive error handling
✅ PyPI packaging and distribution
✅ Interpolation options: nearest, linear, and quadratic
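The interpolation options in the list above decide how a channel signal is read at a delay that falls between two samples. The NumPy sketch below shows one common formulation of each option. It is not mach's CUDA kernel or API: the function name `sample_at`, the boundary clamping, and the use of three-point Lagrange interpolation for the quadratic case are assumptions made for illustration.

```python
import numpy as np


def sample_at(signal: np.ndarray, t: float, method: str = "linear"):
    """Read `signal` at fractional sample index `t` (real or complex data)."""
    n = len(signal)
    if method == "nearest":
        i = int(np.clip(np.floor(t + 0.5), 0, n - 1))
        return signal[i]
    if method == "linear":
        i = int(np.clip(np.floor(t), 0, n - 2))
        frac = t - i
        return (1 - frac) * signal[i] + frac * signal[i + 1]
    if method == "quadratic":
        # Three-point Lagrange interpolation centred on the nearest sample.
        i = int(np.clip(np.floor(t + 0.5), 1, n - 2))
        d = t - i
        return (
            0.5 * d * (d - 1) * signal[i - 1]
            + (1 - d * d) * signal[i]
            + 0.5 * d * (d + 1) * signal[i + 1]
        )
    raise ValueError(f"unknown method: {method!r}")


# A parabola is reproduced exactly only by the quadratic option.
samples = np.arange(8.0) ** 2
for method in ("nearest", "linear", "quadratic"):
    print(method, sample_at(samples, 2.3, method))
```

On the parabola, the true value at index 2.3 is 5.29: nearest returns the neighbouring sample (4.0), linear overshoots to 5.5, and quadratic recovers the curve up to rounding. Higher-order options trade extra arithmetic per point for lower interpolation error.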

Numerically validated, but looking for feedback on API#

✅ Coherent compounding
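Coherent compounding sums the complex beamformed images from several transmits before taking the magnitude, so phase is preserved during the sum. The sketch below contrasts it with incoherent (magnitude-first) compounding on a toy array. The function names and the `(transmit, voxel)` axis order are assumptions for illustration and do not describe mach's API.

```python
import numpy as np


def coherent_compound(per_transmit: np.ndarray) -> np.ndarray:
    """Sum complex images over the transmit axis (axis 0), keeping phase."""
    return per_transmit.sum(axis=0)


def incoherent_compound(per_transmit: np.ndarray) -> np.ndarray:
    """Sum magnitudes over the transmit axis, discarding phase."""
    return np.abs(per_transmit).sum(axis=0)


# Two transmits, two voxels. Voxel 0 is in phase across transmits;
# voxel 1 has opposite phase, so it cancels only in the coherent sum.
images = np.array([[1 + 0j, 0 + 1j],
                   [1 + 0j, 0 - 1j]])

print(np.abs(coherent_compound(images)))  # [2. 0.]
print(incoherent_compound(images))        # [2. 2.]
```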

See the project page for our up-to-date roadmap. We welcome feature requests!

Acknowledgments#

mach builds upon the excellent work of the ultrasound imaging community:

vbeam - For educational examples and validation benchmarks
PyMUST / PICMUS - For standardized evaluation datasets
Community contributors - Gev and Qi for CUDA optimization guidance

This package was developed by the team at Forest Neurotech, a Focused Research Organization supported by Convergent Research and generous philanthropic funders.

Citation#

If you use mach in your research, you can cite:

@inproceedings{mach,
  title={{Mach: Beamforming one trillion points per second on a consumer GPU}},
  author={Guan, Charles and Rockhill, Alex and Pinton, Gianmarco},
  booktitle={Medical Imaging 2026: Ultrasonic Imaging and Tomography},
  year={2026},
  organization={International Society for Optics and Photonics},
  publisher={SPIE},
  url={https://github.com/Forest-Neurotech/mach}
}
