mach#

An ultrafast CUDA-accelerated ultrasound beamformer for Python users. Developed at Forest Neurotech.

Benchmark: Beamforming PyMUST’s rotating-disk Doppler dataset at 1.1 trillion points per second (6.5x the speed of sound).

Highlights#

  • ⚡ Ultra-fast beamforming: ~10x faster than prior state-of-the-art

  • 🚀 GPU-accelerated: Leverages CUDA for maximum performance on NVIDIA GPUs

  • 🎯 Optimized for research: Designed for functional ultrasound imaging (fUSI) and other ultrafast, high-channel-count, or volumetric-ensemble imaging

  • 🐍 Python bindings: Zero-copy integration with CuPy and JAX arrays via nanobind. NumPy support included.

  • 🔬 Validated: Matches vbeam and PyMUST outputs
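For readers new to beamforming, the operation being accelerated can be sketched in a few lines of plain Python. This is a minimal delay-and-sum illustration for a single pixel and a 0-degree plane-wave transmit, not mach's API: the function name, the argument names (rf, element_x, fs, c), the data layout, and the nearest-sample lookup are all assumptions made for the example.

```python
import math


def delay_and_sum(rf, element_x, fs, c, x, z):
    """Beamform one pixel at (x, z) from a 0-degree plane-wave transmit.

    rf[e][n] is sample n of receive element e, element_x[e] is that
    element's lateral position in metres, fs is the sampling rate in Hz,
    and c is the speed of sound in m/s. Sampling is assumed to start at
    the moment of transmission (a simplification for this sketch).
    """
    total = 0.0
    for trace, xe in zip(rf, element_x):
        tx_path = z                      # plane wave travels straight down to depth z
        rx_path = math.hypot(x - xe, z)  # echo travels from the pixel back to the element
        n = round((tx_path + rx_path) / c * fs)  # nearest-sample lookup
        if 0 <= n < len(trace):          # delays outside the recording contribute nothing
            total += trace[n]
    return total
```

A full beamformer repeats this sum for every pixel (and every frame), which is why the workload parallelizes well on a GPU.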

Installation#

Build from source#

make compile

Build prerequisites:

  • Linux

  • make

  • uv >= 0.9.7

  • gcc >= 8

  • nvcc >= 11.0
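Before running make compile, it can help to confirm that the tools listed above are on your PATH. The following is a small standalone sketch using only the Python standard library; it is not part of mach, the helper names are hypothetical, and it does not check the Linux requirement or the minimum versions.

```python
import shutil

# Tools named in the prerequisites list above. Version checks are left out
# because each tool reports its version in a different format.
REQUIRED_TOOLS = ("make", "uv", "gcc", "nvcc")


def find_build_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_build_tools(tools=REQUIRED_TOOLS):
    """Return the tools that could not be found on PATH, in the order given."""
    found = find_build_tools(tools)
    return [tool for tool in tools if found[tool] is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found on PATH.")
```

If nvcc is the only missing tool, the Docker development environment described below may be the easier route.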

Docker Development#

Compile and test without installing the CUDA toolkit using our Docker development environment.

Prerequisites: at minimum, Docker with Docker Compose, since the commands below use docker compose.

Quick start:

# Build and start development container
docker compose run --rm dev

# Or use make shortcuts
make docker-build  # Build image (first time: ~2-3 min, rebuilds: ~30s)
make docker-dev    # Run container

Inside the container:

make compile  # Compile CUDA extension
make test     # Run tests

Your source code is mounted from the host, so you can edit files locally and compile in the container. Build artifacts (.venv/ and build/) are stored in anonymous volumes to avoid permission issues. Dependencies are pre-installed in the image and cached, so rebuilds are fast when only source code changes.

Examples#

Try the example notebooks in the docs. If you don’t have a CUDA-enabled GPU, you can download a notebook from the docs and open it in Google Colab (select a GPU instance).

Contributing#

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Roadmap#

Beta release (v0.Y.0)#

  • ✅ Single-wave transmissions (plane wave, focused, diverging)

  • ✅ Linear interpolation beamforming

  • ✅ Allow NumPy/CuPy/JAX/PyTorch inputs through Array API

  • ✅ Comprehensive error handling

  • ✅ PyPI packaging and distribution

  • ✅ Interpolation options: nearest, linear, and quadratic
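To make the three interpolation options concrete, here is a minimal pure-Python sketch of reading a uniformly sampled signal at a fractional sample index. It illustrates the general techniques only: the function name and signature are hypothetical, the quadratic variant shown is a three-point Lagrange polynomial (mach's exact scheme may differ), and treating out-of-range samples as zero is an assumption.

```python
import math


def interpolate(samples, t, method="linear"):
    """Evaluate a uniformly sampled signal at fractional index t.

    Indices outside the sampled range contribute zero, a common convention
    in delay-and-sum code (an assumption here, not taken from mach).
    """
    n = len(samples)

    def at(i):
        return samples[i] if 0 <= i < n else 0.0

    if method == "nearest":
        return at(math.floor(t + 0.5))
    if method == "linear":
        i = math.floor(t)
        frac = t - i
        return (1.0 - frac) * at(i) + frac * at(i + 1)
    if method == "quadratic":
        # Three-point Lagrange polynomial centred on the nearest sample.
        i = math.floor(t + 0.5)
        d = t - i  # offset from the centre sample, in [-0.5, 0.5)
        return (
            0.5 * d * (d - 1.0) * at(i - 1)
            + (1.0 - d * d) * at(i)
            + 0.5 * d * (d + 1.0) * at(i + 1)
        )
    raise ValueError(f"unknown interpolation method: {method!r}")
```

For the samples 0, 1, 4, 9, 16, 25 (the squares of the index) read at t = 2.3, nearest returns 4.0, linear returns about 5.5, and quadratic recovers the underlying parabola at about 5.29. Higher-order options cost more arithmetic per point in exchange for lower interpolation error.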

Numerically validated, but looking for feedback on API#

  • ✅ Coherent compounding
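As a reminder of what coherent compounding means, the sketch below combines complex (IQ) images from several transmits before any magnitude is taken, so that phase is preserved. It is an illustration only and does not reflect mach's API: the function names and the flat-list image layout are hypothetical, and averaging rather than summing is an arbitrary scaling choice.

```python
def coherent_compound(frames):
    """Average complex pixel values across transmits, preserving phase.

    `frames` is a sequence of equally sized flat lists of complex pixels,
    one list per transmit (hypothetical layout for illustration).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_pixels = len(frames[0])
    if any(len(f) != n_pixels for f in frames):
        raise ValueError("all frames must have the same number of pixels")
    return [sum(pixel) / len(frames) for pixel in zip(*frames)]


def incoherent_compound(frames):
    """Average pixel magnitudes instead, discarding phase (for contrast)."""
    return [sum(abs(p) for p in pixel) / len(frames) for pixel in zip(*frames)]
```

With two transmits whose first pixel is in phase (1 and 1) and whose second pixel is in antiphase (1j and -1j), the coherent result keeps the first pixel at 1 and cancels the second to 0, while the incoherent result is 1.0 for both.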

See the project page for our up-to-date roadmap. We welcome feature requests!

Acknowledgments#

mach builds upon the excellent work of the ultrasound imaging community:

  • vbeam - For educational examples and validation benchmarks

  • PyMUST / PICMUS - For standardized evaluation datasets

  • Community contributors - Gev and Qi for CUDA optimization guidance

This package was developed by the Forest Neurotech team, a Focused Research Organization supported by Convergent Research and generous philanthropic funders.

Citation#

If you use mach in your research, you can cite:

@inproceedings{mach,
  title = {{Mach: Beamforming one trillion points per second on a consumer GPU}},
  author = {Guan, Charles and Rockhill, Alex and Pinton, Gianmarco},
  booktitle = {Medical Imaging 2026: Ultrasonic Imaging and Tomography},
  year = {2026},
  organization = {International Society for Optics and Photonics},
  publisher = {SPIE},
  url = {https://github.com/Forest-Neurotech/mach}
}
