Skip to content

Comments

Add backends support for minimize_vectors#5235

Open
PardhavMaradani wants to merge 11 commits intoMDAnalysis:developfrom
PardhavMaradani:minimize-vectors-backend-support
Open

Add backends support for minimize_vectors#5235
PardhavMaradani wants to merge 11 commits intoMDAnalysis:developfrom
PardhavMaradani:minimize-vectors-backend-support

Conversation

@PardhavMaradani
Copy link
Contributor

@PardhavMaradani PardhavMaradani commented Feb 18, 2026

This is one of two PRs needed for #5234

Changes made in this Pull Request:

This PR moves the current cython implementation to c++ to support openmp (and have template support). Support for invoking the corresponding distopia methods (when supported in distopia) is also added.

There is already a considerable speedup moving away from cython (for default serial) and this gets better with openmp and distopia backends.

The YiiP equilibrium dataset is used for these performance tests. This trajectory has around 111k atoms/coordinates. The coordinates are duplicated by a multiple (1 - 10, using np.tile) to calculate times.

Here is an example of the timeit call:

import MDAnalysis as mda
from MDAnalysisData import yiip_equilibrium
from MDAnalysis.lib.distances import check_box, minimize_vectors
import timeit

data = yiip_equilibrium.fetch_yiip_equilibrium_short()

u = mda.Universe(data["topology"], data["trajectory"])

ts = u.trajectory[0]
box = ts.dimensions
coords = ts.positions
boxtype, tric_vecs = check_box(box)
center = tric_vecs.sum(axis=0) * 0.5

backend = "serial"
secs = timeit.timeit(lambda: minimize_vectors(coords - center, box, backend=backend) + center, number=100)

Here are the results with current cython version and with the new backends support as part of this PR:

minimize_vectors

minimize_vectors can also be used to achieve compact wrapping support for visualization. Here are two methods as on-the-fly transformations for this.

The first is from PR #2982, with minor modifications to make it a transformation that can be applied at runtime:

Compact wrapping with distance_array
# Compact PBC wrapping transformation
# From: https://github.com/MDAnalysis/mdanalysis/pull/2982
def compact_w_distance_array(ts):
    backend = "serial"
    box = ts.dimensions
    coords = ts.positions
    boxtype, tric_vecs = check_box(box)
    pbc_img_coeffs = np.array(list(itertools.product([0, -1, 1], repeat=3)))
    pbc_img_vecs = np.matmul(pbc_img_coeffs, tric_vecs)
    min_compact_norm = np.linalg.norm(pbc_img_vecs[1:], axis=1).min() * 0.5
    center = tric_vecs.sum(axis=0) * 0.5
    pbc_imgs = pbc_img_vecs + center
    # Also improvable as distance^2
    center_dists = distance_array(center, coords)[0]
    outside_ats = np.where(center_dists > min_compact_norm)[0]
    # Let's make sure we're dealing with everyone in the primary cell
    outside_pos = coords[outside_ats] = apply_PBC(
        coords[outside_ats], box
    )
    # Potentially save cycles by re-filtering after PBC. It makes sense for
    # GROMACS trajectories, that are saved as the rectangular cell.
    center_dists = distance_array(center, outside_pos)[0]
    outside_ats = outside_ats[center_dists > min_compact_norm]
    outside_pos = coords[outside_ats]
    # This works for the general case by checking the distance to all 27
    # unit cell centers (primary + 1st neighbors). Will break for extremely
    # skewed cases where the closest center is in a 2nd neighbor unit cell.
    # The distance check to all 26 1st neighbors can also probably be
    # optimized, both in general and with boxtype-specific shortcuts.
    # Also improvable as distance^2
    closest_img_ndx = distance_array(outside_pos, pbc_imgs, backend=backend).argmin(axis=1)
    to_move = closest_img_ndx.nonzero()[0]
    closest_img_ndx = closest_img_ndx[to_move]
    outside_ats = outside_ats[to_move]
    outside_pos = coords[outside_ats] - pbc_img_vecs[closest_img_ndx]
    coords[outside_ats] = outside_pos
    return ts

Here is a version using minimize_vectors:

Compact wrapping with minimize_vectors
def compact_w_minimize_vectors(ts):
    backend = "serial"
    box = ts.dimensions
    coords = ts.positions
    boxtype, tric_vecs = check_box(box)
    pbc_img_coeffs = np.array(list(itertools.product([0, -1, 1], repeat=3)))
    pbc_img_vecs = np.matmul(pbc_img_coeffs, tric_vecs)
    min_compact_norm = np.linalg.norm(pbc_img_vecs[1:], axis=1).min() * 0.5
    center = tric_vecs.sum(axis=0) * 0.5
    center_dists = distance_array(center, coords)[0]
    outside_ats = np.where(center_dists > min_compact_norm)[0]
    coords[outside_ats] = minimize_vectors(coords[outside_ats] - center, box, backend=backend) + center
    return ts

Here is a performance comparison of the above transformations (with slight changes to allow different backends and a multiple of the coordinates):

compact - distance_array (da) vs minimize_vectors (mv)

The tests currently use only the serial and openmp backends as distopia support doesn't exist yet. The above performance tests used a modified version of distopia that has this support (I will create a separate PR in distopia for that).

LLM / AI generated code disclosure

LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)
  • LLM/AI disclosure was updated.

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5235.org.readthedocs.build/en/5235/

@PardhavMaradani PardhavMaradani marked this pull request as draft February 18, 2026 06:12
@BradyAJohnston
Copy link
Member

It seems that tests are currently failing across the project so don't worry too much about them atm.

@PardhavMaradani
Copy link
Contributor Author

PardhavMaradani commented Feb 18, 2026

It seems that tests are currently failing across the project so don't worry too much about them atm.

Hi Brady, thanks for the pointer, yes, just noticed. There are a couple (I think macOS_14 and sdist_check_and_build), which are related to this PR. I'll take a look at them. Thanks

update: I created Issue #5236 with the reason for most of the other test failures.
update: Fixed the macOS and sdist failures due to this PR. All remaining test failures are due to above.

@PardhavMaradani PardhavMaradani marked this pull request as ready for review February 18, 2026 14:34
Copy link
Member

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments. For benchmarks, using the asv format is usually preferable I think--standardized and easier to assess changes between commits.

def test_minimize_vectors(box, shift, dtype):
@pytest.mark.parametrize("backend", ("serial", "openmp"))
# @pytest.mark.parametrize("backend", distopia_conditional_backend())
def test_minimize_vectors(box, shift, dtype, backend):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I follow what is going on here--backend isn't actually passed to the minimize_vectors call under test below, right?

So the new functionality isn't really being tested? A test that requires OpenMP may also fail in a decent number of scenarios--I know sklearn has one such test in their suite that carries a special error message because it is so easy to build without OpenMP support (that test always fails when I run it on Mac).

I don't see anything in the PR description explaining this I don't think. I do see this indication that the two backends should be under test:

The tests currently use only the serial and openmp backends as distopia support doesn't exist yet.

maybe just an oversight

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an oversight, thanks for catching. I added it now and the tests pass locally for all the three backends. (since I have distopia with that support installed locally). The tests still use only serial and openmp in the current code. Thanks

}
}

#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ comes with a few burdens that may be worth discussing:

  1. Can increase binary size with templating.
  2. Can increase compile time with templating.
  3. Can increase maintenance burden (how many active C++ devs do we have?)

I'm not trying to discourage the effort, just making sure it gets a quick discussion. Richard and Hugo are pretty strong C++ devs I think? (though also both quite busy I think?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minimize_vectors API allows both float32 and float64. Because all the coords are float32's currently and the most common cases are to use the coords to calculate vectors, I wasn't sure where float64's were being used. But since that existed in the API (and supported in distopia), I didn't want to disrupt anything. I also didn't want to downcast them to float's to get it working. The only alternative is to duplicate all the four methods for floats and doubles. I can change this if needed based on the discussion. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a data point about sizes (for the relevant files) for the discussion:

Before

(mdanalysis-dev) mdanalysis/package$ ll MDAnalysis/lib/ | grep c_distances
-rw-rw-r--  1 pardhav pardhav   2460 Feb 20 12:35 c_distances.pxd
-rw-rw-r--  1 pardhav pardhav  14470 Feb 20 12:35 c_distances.pyx
-rw-rw-r--  1 pardhav pardhav  10539 Feb 20 12:35 c_distances_openmp.pyx
-rwxrwxr-x  1 pardhav pardhav 371464 Feb 20 12:36 c_distances.cpython-311-x86_64-linux-gnu.so
-rwxrwxr-x  1 pardhav pardhav 126184 Feb 20 12:36 c_distances_openmp.cpython-311-x86_64-linux-gnu.so
(mdanalysis-dev) mdanalysis/package$ 

After

(mdanalysis-dev) mdanalysis/package$ ll MDAnalysis/lib/ | grep c_distances
-rw-rw-r--  1 pardhav pardhav   2691 Feb 20 12:40 c_distances.pxd
-rw-rw-r--  1 pardhav pardhav  10554 Feb 20 12:40 c_distances.pyx
-rw-rw-r--  1 pardhav pardhav  11435 Feb 20 12:40 c_distances_openmp.pyx
-rwxrwxr-x  1 pardhav pardhav 364024 Feb 20 12:41 c_distances.cpython-311-x86_64-linux-gnu.so
-rwxrwxr-x  1 pardhav pardhav 361576 Feb 20 12:41 c_distances_openmp.cpython-311-x86_64-linux-gnu.so
(mdanalysis-dev) mdanalysis/package$ 

# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#
#
# distutils: language = c++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need these designators? distutils kind of got ripped out of the ecosystem, though I suppose setuptools vendors it now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these earlier before I knew that the language could be changed in the setup.py file as done for some other c++ extensions. I've now removed them and verified that it builds fine. Thanks

triclinic and must be provided in the same format as returned by
:attr:`MDAnalysis.coordinates.timestep.Timestep.dimensions`:
``[lx, ly, lz, alpha, beta, gamma]``.
backend : {'serial', 'OpenMP'}, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed it, but how is the user to control the number of threads? OMP_NUM_THREADS? joblib context?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many threads were used in your benchmarks?

Copy link
Contributor Author

@PardhavMaradani PardhavMaradani Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to check this. I followed what was being done for other methods, but didn't quite look into the details yet - I will now. A very crude way I checked that that it was using multiple cores was by checking top and unlike the serial case (one core), I saw it using 4 8 cores in the openmp case - though I'll now have to check why only 4, because the machine has 8 cores. I learnt about the asv benchmarks for a different PR, but have been struggling to get it running so far, but will add those benchmarks after I can verify. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how is the user to control the number of threads? OMP_NUM_THREADS? joblib context?

I am not quite sure how to answer this. The only references I found in MDA documentation to control the number of threads is for the transfomation classes citing the use of threadpoolctl internally and also suggesting the use of the OMP_NUM_THREADS env variable to control the thread limit. I was however able to use both of them (the OMP_NUM_THREADS env var (at runtime) and threadpoolctl.threadpool_limits (in code) and verify that this controls the number of threads for these functions that support openmp backend.

Here is a sample top output for OMP_NUM_THREADS=4 when running the above compact tests:

Details
top - 11:29:01 up 1 day, 16:19,  1 user,  load average: 2.70, 2.03, 1.49
Tasks: 356 total,   2 running, 353 sleeping,   0 stopped,   1 zombie
%Cpu0  : 57.6 us,  0.5 sy,  0.0 ni, 41.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  7.1 us,  0.0 sy,  0.0 ni, 92.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 60.3 us,  0.0 sy,  0.0 ni, 39.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  1.6 us,  0.5 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.6 us,  0.5 sy,  0.0 ni, 95.7 id,  2.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 62.0 us,  0.0 sy,  0.0 ni, 38.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.5 us,  0.5 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 87.0 us, 13.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7770.6 total,   1025.2 free,   3988.1 used,   2757.3 buff/cache
MiB Swap:   2048.0 total,      0.2 free,   2047.8 used.   2909.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                           
  83009 pardhav   20   0 1136224 287580  78976 R 277.3   3.6   0:28.82 python                                            
.....

Here is a similar one for OMP_NUM_THREADS=8:

Details
top - 11:30:53 up 1 day, 16:20,  1 user,  load average: 2.92, 2.10, 1.56
Tasks: 356 total,   2 running, 353 sleeping,   0 stopped,   1 zombie
%Cpu0  : 53.8 us,  1.9 sy,  0.0 ni, 43.3 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu1  : 89.7 us, 10.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 45.3 us,  0.5 sy,  0.0 ni, 54.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 50.7 us,  0.5 sy,  0.0 ni, 48.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 47.6 us,  3.8 sy,  0.0 ni, 48.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 48.6 us,  0.5 sy,  0.0 ni, 50.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 48.4 us,  0.0 sy,  0.0 ni, 51.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 51.7 us,  0.0 sy,  0.0 ni, 48.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7770.6 total,    967.5 free,   4045.2 used,   2758.0 buff/cache
MiB Swap:   2048.0 total,      0.2 free,   2047.8 used.   2852.2 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                           
  83031 pardhav   20   0 1823332 383052  78464 R 436.0   4.8   1:47.77 python                                            
.....

I believe this should be the same with all other distance methods that support accelerated backends. Thanks

@PardhavMaradani
Copy link
Contributor Author

PardhavMaradani commented Feb 21, 2026

Added asv benchmarks. Filed PR #5240 separately to resolve general asv run issues irrespective of whether this PR gets merged.

Here is the output of a run on a local setup:

(mdanalysis-dev) mdanalysis/benchmarks$ asv run --python=same --bench="lib.distances.DistancesMinimizeVectorsBench"
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_pardhav_miniforge3_envs_mdanalysis-dev_bin_python3.11
[50.00%] ··· Running (lib.distances.DistancesMinimizeVectorsBench.time_minimize_vectors--).
[100.00%] ··· lib.distances.DistancesMinimizeVectorsBench.time_minimize_vectors                                                                                                      ok
[100.00%] ··· =========== ================ ================ ================== ==================== ==================== ======================
              --                                                              box / backend                                                    
              ----------- ---------------------------------------------------------------------------------------------------------------------
               num_atoms   ortho / serial   ortho / openmp   ortho / distopia   triclinic / serial   triclinic / openmp   triclinic / distopia 
              =========== ================ ================ ================== ==================== ==================== ======================
                   10         12.3±1μs        18.9±0.1μs        15.3±0.3μs          28.6±0.3μs            42.2±2μs             31.6±0.3μs      
                  100        13.1±0.1μs       20.0±0.3μs       16.2±0.08μs          33.8±0.3μs            43.5±1μs             33.3±0.2μs      
                  1000       21.9±0.1μs       23.5±0.6μs        19.0±0.2μs          80.7±0.3μs           57.6±0.6μs            41.9±0.1μs      
                 10000        103±3μs          57.5±2μs          37.5±1μs            535±1μs              167±1μs               120±5μs        
                 100000       926±6μs          414±40μs          291±10μs          5.12±0.03ms          1.31±0.05ms           1.04±0.01ms      
                1000000     12.4±0.04ms       6.24±0.2ms        6.67±0.1ms          59.8±0.4ms           14.6±0.1ms           13.6±0.08ms      
              =========== ================ ================ ================== ==================== ==================== ======================

(mdanalysis-dev) mdanalysis/benchmarks$ 

Here are separate graphs for ortho and triclinic with a lot more samples from asv preview:
Ortho:
Screenshot from 2026-02-21 13-17-02

Triclinic:
Screenshot from 2026-02-21 13-16-43

On older commits or when certain backends are unavailable, the results will show up as n/a. Thanks

ps. This might be slightly orthogonal to the original issue, but here are the benchmark results that show significant gains between the cython and the c++ version with just the serial backend (similar to the one in the first graph of the initial comment):

cython vs c++ with serial backend
(mdanalysis-dev) mdanalysis/benchmarks$ asv compare beaece1b be2b4db5

All benchmarks:

| Change   | Before [beaece1b] | After [be2b4db5] | Ratio   | Benchmark (Parameter)                                  |
|----------|-------------------|------------------|---------|--------------------------------------------------------|
| -        | 16.1±0.5μs        | 12.4±0.09μs      | 0.77    | time_minimize_vectors_no_backend(10, 'ortho')          |
| -        | 34.2±1μs          | 27.0±0.4μs       | 0.79    | time_minimize_vectors_no_backend(10, 'triclinic')      |
| -        | 24.5±2μs          | 13.9±0.08μs      | 0.57    | time_minimize_vectors_no_backend(100, 'ortho')         |
| -        | 47.9±1μs          | 31.7±0.5μs       | 0.66    | time_minimize_vectors_no_backend(100, 'triclinic')     |
| -        | 109±6μs           | 23.6±0.3μs       | 0.22    | time_minimize_vectors_no_backend(1000, 'ortho')        |
| -        | 169±9μs           | 84.1±1μs         | 0.50    | time_minimize_vectors_no_backend(1000, 'triclinic')    |
| -        | 931±50μs          | 114±0.5μs        | 0.12    | time_minimize_vectors_no_backend(10000, 'ortho')       |
| -        | 1.42±0.02ms       | 570±20μs         | 0.40    | time_minimize_vectors_no_backend(10000, 'triclinic')   |
| -        | 8.98±0.3ms        | 1.01±0.01ms      | 0.11    | time_minimize_vectors_no_backend(100000, 'ortho')      |
| -        | 12.6±0.09ms       | 5.66±0.06ms      | 0.45    | time_minimize_vectors_no_backend(100000, 'triclinic')  |
| -        | 93.4±4ms          | 12.4±0.04ms      | 0.13    | time_minimize_vectors_no_backend(1000000, 'ortho')     |
| -        | 136±6ms           | 59.6±0.4ms       | 0.44    | time_minimize_vectors_no_backend(1000000, 'triclinic') |
| -        | 17.9±0.1ms        | 2.38±0.04ms      | 0.13    | time_minimize_vectors_no_backend(200000, 'ortho')      |
| -        | 26.4±0.3ms        | 12.0±0.1ms       | 0.46    | time_minimize_vectors_no_backend(200000, 'triclinic')  |
| -        | 26.9±0.1ms        | 3.70±0.07ms      | 0.14    | time_minimize_vectors_no_backend(300000, 'ortho')      |
| -        | 38.5±0.3ms        | 18.2±0.07ms      | 0.47    | time_minimize_vectors_no_backend(300000, 'triclinic')  |
| -        | 35.7±0.09ms       | 4.97±0.1ms       | 0.14    | time_minimize_vectors_no_backend(400000, 'ortho')      |
| -        | 51.4±0.2ms        | 24.3±0.1ms       | 0.47    | time_minimize_vectors_no_backend(400000, 'triclinic')  |
| -        | 44.4±0.2ms        | 6.15±0.04ms      | 0.14    | time_minimize_vectors_no_backend(500000, 'ortho')      |
| -        | 64.6±0.5ms        | 30.3±0.2ms       | 0.47    | time_minimize_vectors_no_backend(500000, 'triclinic')  |
| -        | 53.4±0.2ms        | 7.35±0.06ms      | 0.14    | time_minimize_vectors_no_backend(600000, 'ortho')      |
| -        | 77.0±0.2ms        | 36.3±0.1ms       | 0.47    | time_minimize_vectors_no_backend(600000, 'triclinic')  |
| -        | 64.8±3ms          | 8.64±0.08ms      | 0.13    | time_minimize_vectors_no_backend(700000, 'ortho')      |
| -        | 92.1±2ms          | 42.1±0.2ms       | 0.46    | time_minimize_vectors_no_backend(700000, 'triclinic')  |
| -        | 71.0±0.3ms        | 9.94±0.08ms      | 0.14    | time_minimize_vectors_no_backend(800000, 'ortho')      |
| -        | 103±0.4ms         | 47.9±0.1ms       | 0.46    | time_minimize_vectors_no_backend(800000, 'triclinic')  |
| -        | 80.3±0.4ms        | 11.2±0.07ms      | 0.14    | time_minimize_vectors_no_backend(900000, 'ortho')      |
| -        | 122±3ms           | 53.9±0.2ms       | 0.44    | time_minimize_vectors_no_backend(900000, 'triclinic')  |

(mdanalysis-dev) mdanalysis/benchmarks$

The table above was reformatted to be more readable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants