GPU-Accelerated Tractography with DIPY and cuslines


Published: June 07, 2026

GPU-Accelerated Tractography

For whole-brain tractography, the bottleneck is usually the inner tracking loop: at each step you evaluate a model, sample a direction, and move the streamline. DIPY supports offloading this loop to the GPU via the cuslines library, which provides CUDA, Metal, and WebGPU backends alongside a CPU fallback that uses Numba.
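To make the three steps of the inner loop concrete, here is a minimal CPU sketch in plain NumPy. It is not the cuslines implementation: `track_one` is a hypothetical helper, it uses a nearest-voxel lookup into a precomputed peak field instead of evaluating a model, and it follows the peak deterministically rather than sampling a direction. It works in voxel coordinates with voxel centers at integer positions.

```python
import numpy as np


def track_one(seed, peak_field, step_size=0.5, max_steps=1000):
    """Follow a peak-direction field from one seed (hypothetical helper).

    peak_field has shape (X, Y, Z, 3) and holds one direction per voxel.
    """
    shape = np.array(peak_field.shape[:3])
    point = np.asarray(seed, dtype=float)
    prev_dir = None
    points = [point.copy()]
    for _ in range(max_steps):
        voxel = np.round(point).astype(int)
        if np.any(voxel < 0) or np.any(voxel >= shape):
            break  # the current point has left the volume
        # Step 1: "evaluate the model" (here just a table lookup).
        direction = peak_field[tuple(voxel)]
        norm = np.linalg.norm(direction)
        if norm == 0:
            break  # no direction available in this voxel
        direction = direction / norm
        # Step 2: choose a direction. Peaks have no sign, so keep the
        # heading consistent with the previous step.
        if prev_dir is not None and np.dot(direction, prev_dir) < 0:
            direction = -direction
        # Step 3: move the streamline.
        point = point + step_size * direction
        points.append(point.copy())
        prev_dir = direction
    return np.array(points)


# Toy field in which every voxel points along +x.
field = np.zeros((10, 4, 4, 3))
field[..., 0] = 1.0
streamline = track_one(seed=(0.0, 1.0, 1.0), peak_field=field)
```

Each seed runs this loop independently of the others, which is why the work maps well onto one GPU thread (or thread group) per seed.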

Supported backends

Backend	Hardware
`gpu` (CUDA)	NVIDIA GPU
`metal`	Apple Silicon / AMD GPU on macOS
`webgpu`	Cross-platform via wgpu
`cpu`	CPU with numba JIT

The backend is selected at runtime via --device, so the same script runs on any hardware.
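The script's own argument parser is not shown here, so the following is only a sketch of how a `--device` flag restricted to the four backend names in the table could be declared with `argparse`. The `build_parser` helper and the `cpu` default are assumptions made for illustration.

```python
import argparse

# Backend names taken from the table above.
BACKENDS = ("gpu", "metal", "webgpu", "cpu")


def build_parser():
    """Build a parser with a --device flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Tractography backend selection")
    parser.add_argument(
        "--device",
        choices=BACKENDS,
        default="cpu",  # assumed default; the real script may differ
        help="backend that runs the tracking loop",
    )
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(["--device", "metal"])
print(args.device)
```

Restricting the flag with `choices` means an unsupported backend name fails at argument parsing rather than deep inside the tracking code.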

Usage

The script accepts a DWI image, b-values, b-vectors, a brain mask, and an ROI seed mask:

python run_gpu_streamlines.py dwi.nii.gz bvals bvecs mask.nii.gz roi.nii.gz \
  --device gpu \
  --dg prob \
  --output-prefix results/tractogram

If no files are provided it defaults to the Stanford HARDI dataset for easy testing.
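A fallback like this could be written with DIPY's `get_fnames`, which returns the paths of the Stanford HARDI DWI, b-value, and b-vector files (downloading them on first use). The `resolve_inputs` helper below is hypothetical and only covers those three files; how the script obtains a default mask and ROI is not shown here.

```python
def resolve_inputs(dwi=None, bvals=None, bvecs=None):
    """Return user-supplied paths, or the Stanford HARDI files if any is missing.

    Hypothetical helper; the real script may handle defaults differently.
    """
    if dwi and bvals and bvecs:
        return dwi, bvals, bvecs
    # Imported lazily so DIPY is only required when the fallback is used.
    from dipy.data import get_fnames

    # Returns (dwi_path, bval_path, bvec_path); fetches the data if needed.
    return tuple(get_fnames(name="stanford_hardi"))


paths = resolve_inputs("dwi.nii.gz", "bvals", "bvecs")
```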

How it works

Internally, cuslines takes the CSD or DTI peaks array and the seeds, and runs the tracking loop in parallel across seeds on the device. The Python side only needs to:

1. Fit the ODF model on CPU (or GPU if supported)
2. Pass the peaks array and seed coordinates to cuslines
3. Receive the streamline array back in host memory

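import cuslines

# peaks_array, seed_coords, and affine are assumed to have been prepared
# by the model-fitting and seeding steps described above.
gpu_tracker = cuslines.GPUTracker(
    data=peaks_array,    # peaks from the fitted CSD or DTI model
    seeds=seed_coords,   # seed coordinates derived from the ROI mask
    affine=affine,       # affine of the DWI image
    step_size=0.5,
    max_length=300,
    device="gpu",        # or "metal", "webgpu", "cpu"
)
streamlines = gpu_tracker.generate()  # streamlines come back in host memory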
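The seed coordinates have to come from the ROI mask. As a rough sketch of one way to build them, the NumPy function below places a regular grid of seeds inside every nonzero ROI voxel and maps them through the image affine. `seeds_from_roi` is a hypothetical helper, and whether cuslines expects seeds in world or voxel space is an assumption to check. DIPY ships a comparable utility, `dipy.tracking.utils.seeds_from_mask`.

```python
import numpy as np


def seeds_from_roi(roi_mask, affine, density=1):
    """Place density**3 seeds in each nonzero voxel (hypothetical helper).

    Voxel centers are assumed to sit at integer voxel coordinates.
    """
    voxels = np.argwhere(roi_mask > 0).astype(float)
    # Sub-voxel offsets in [-0.5, 0.5), evenly spaced and centered on 0.
    offsets_1d = (np.arange(density) + 0.5) / density - 0.5
    grid = np.stack(
        np.meshgrid(offsets_1d, offsets_1d, offsets_1d, indexing="ij"),
        axis=-1,
    ).reshape(-1, 3)
    seeds_vox = (voxels[:, None, :] + grid[None, :, :]).reshape(-1, 3)
    # Map voxel coordinates to world coordinates with the 4x4 affine.
    return seeds_vox @ affine[:3, :3].T + affine[:3, 3]


# Toy example: one ROI voxel, 2 mm isotropic voxels, origin shifted in x.
roi = np.zeros((3, 3, 3))
roi[1, 1, 1] = 1
toy_affine = np.diag([2.0, 2.0, 2.0, 1.0])
toy_affine[0, 3] = 10.0
seeds = seeds_from_roi(roi, toy_affine, density=1)
dense_seeds = seeds_from_roi(roi, toy_affine, density=2)
```

Raising `density` multiplies the seed count by its cube, which is how a small ROI can still yield a very large number of seeds for the device to process in parallel.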

On a modern GPU this can be 10-100x faster than single-threaded CPU tracking, making whole-brain probabilistic tractography with millions of seeds practical in minutes rather than hours.


Shreya Kapoor
