Add backends support for minimize_vectors #5235

PardhavMaradani wants to merge 11 commits into MDAnalysis:develop
Conversation
It seems that tests are currently failing across the project, so don't worry too much about them atm.

Hi Brady, thanks for the pointer, yes, just noticed. There are a couple (I think macOS_14 and sdist_check_and_build) which are related to this PR. I'll take a look at them. Thanks.

Update: I created Issue #5236 with the reason for most of the other test failures.
tylerjereddy
left a comment
I added a few comments. For benchmarks, using the asv format is usually preferable I think--standardized and easier to assess changes between commits.
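To make that suggestion concrete, an asv benchmark for this function might look roughly like the following (a sketch only; the class name, array sizes, and the `backend` keyword on `minimize_vectors` are assumptions based on this PR, not existing benchmark code):

```python
import numpy as np


class MinimizeVectorsBench:
    """Sketch of an asv benchmark parametrized over backends (names assumed)."""

    params = (["serial", "openmp"],)
    param_names = ["backend"]

    def setup(self, backend):
        rng = np.random.default_rng(0)
        # Random displacement vectors larger than the box, so minimization matters
        self.vecs = rng.uniform(-20, 20, (100_000, 3)).astype(np.float32)
        # [lx, ly, lz, alpha, beta, gamma], as used by MDAnalysis box arguments
        self.box = np.array([10, 10, 10, 90, 90, 90], dtype=np.float32)

    def time_minimize_vectors(self, backend):
        # Assumes this PR's backend keyword on minimize_vectors
        from MDAnalysis.lib import distances

        distances.minimize_vectors(self.vecs, self.box, backend=backend)
```

asv discovers `time_*` methods automatically and reports timings per parameter combination, which makes backend-to-backend and commit-to-commit comparisons straightforward.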
```diff
-def test_minimize_vectors(box, shift, dtype):
+@pytest.mark.parametrize("backend", ("serial", "openmp"))
+# @pytest.mark.parametrize("backend", distopia_conditional_backend())
+def test_minimize_vectors(box, shift, dtype, backend):
```
I don't think I follow what is going on here--backend isn't actually passed to the minimize_vectors call under test below, right?
So the new functionality isn't really being tested? A test that requires OpenMP may also fail in a decent number of scenarios--I know sklearn has one such test in their suite that carries a special error message because it is so easy to build without OpenMP support (that test always fails when I run it on Mac).
I don't see anything in the PR description explaining this, I don't think. I do see this indication that the two backends should be under test:

> The tests currently use only the serial and openmp backends as distopia support doesn't exist yet.

Maybe just an oversight?
This was an oversight, thanks for catching it. I added it now and the tests pass locally for all three backends (since I have distopia with that support installed locally). The tests still use only serial and openmp in the current code. Thanks
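For intuition about what the parametrized test checks across backends, the minimum-image operation can be written in a few lines of NumPy for the orthorhombic case (a reference sketch only; not the MDAnalysis implementation, which also handles triclinic boxes):

```python
import numpy as np


def minimize_vectors_ref(vectors, box_lengths):
    """Orthorhombic minimum-image: map each component into [-L/2, L/2]."""
    v = np.asarray(vectors)
    # Cast box lengths to the input dtype so float32 in -> float32 out
    L = np.asarray(box_lengths, dtype=v.dtype)
    return v - L * np.round(v / L)


# A 9 Angstrom x-component in a 10 Angstrom box minimizes to -1 Angstrom
out = minimize_vectors_ref(np.array([[9.0, 0.0, 0.0]]), [10.0, 10.0, 10.0])
print(out)
```

A reference like this makes it easy to assert that all backends agree on identical inputs, which is effectively what the parametrized test does.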
```cpp
}
}

#endif
```
C++ comes with a few burdens that may be worth discussing:
- Can increase binary size with templating.
- Can increase compile time with templating.
- Can increase maintenance burden (how many active C++ devs do we have?)
I'm not trying to discourage the effort, just making sure it gets a quick discussion. Richard and Hugo are pretty strong C++ devs I think? (though also both quite busy I think?).
The minimize_vectors API allows both float32 and float64. Because all the coords are currently float32 and the most common case is to use the coords to calculate vectors, I wasn't sure where float64 was being used. But since that existed in the API (and is supported in distopia), I didn't want to disrupt anything. I also didn't want to downcast to float32 just to get it working. The only alternative is to duplicate all four methods for floats and doubles. I can change this if needed based on the discussion. Thanks
Adding a data point about sizes (for the relevant files) for the discussion:
Before
```console
(mdanalysis-dev) mdanalysis/package$ ll MDAnalysis/lib/ | grep c_distances
-rw-rw-r-- 1 pardhav pardhav   2460 Feb 20 12:35 c_distances.pxd
-rw-rw-r-- 1 pardhav pardhav  14470 Feb 20 12:35 c_distances.pyx
-rw-rw-r-- 1 pardhav pardhav  10539 Feb 20 12:35 c_distances_openmp.pyx
-rwxrwxr-x 1 pardhav pardhav 371464 Feb 20 12:36 c_distances.cpython-311-x86_64-linux-gnu.so
-rwxrwxr-x 1 pardhav pardhav 126184 Feb 20 12:36 c_distances_openmp.cpython-311-x86_64-linux-gnu.so
(mdanalysis-dev) mdanalysis/package$
```
After
```console
(mdanalysis-dev) mdanalysis/package$ ll MDAnalysis/lib/ | grep c_distances
-rw-rw-r-- 1 pardhav pardhav   2691 Feb 20 12:40 c_distances.pxd
-rw-rw-r-- 1 pardhav pardhav  10554 Feb 20 12:40 c_distances.pyx
-rw-rw-r-- 1 pardhav pardhav  11435 Feb 20 12:40 c_distances_openmp.pyx
-rwxrwxr-x 1 pardhav pardhav 364024 Feb 20 12:41 c_distances.cpython-311-x86_64-linux-gnu.so
-rwxrwxr-x 1 pardhav pardhav 361576 Feb 20 12:41 c_distances_openmp.cpython-311-x86_64-linux-gnu.so
(mdanalysis-dev) mdanalysis/package$
```
```python
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#
#
# distutils: language = c++
```
Do we need these designators? distutils kind of got ripped out of the ecosystem, though I suppose setuptools vendors it now.
I added these earlier before I knew that the language could be changed in the setup.py file as done for some other c++ extensions. I've now removed them and verified that it builds fine. Thanks
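For reference, a minimal sketch of what moving the language choice into `setup.py` looks like (the module name and source path here are illustrative, not the exact entries in MDAnalysis' build configuration):

```python
from setuptools import Extension

# The language= keyword on the Extension replaces the per-file
# "# distutils: language = c++" directive (paths below are illustrative)
ext = Extension(
    "MDAnalysis.lib.c_distances",
    sources=["MDAnalysis/lib/c_distances.pyx"],
    language="c++",
)
print(ext.language)  # "c++"
```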
```rst
triclinic and must be provided in the same format as returned by
:attr:`MDAnalysis.coordinates.timestep.Timestep.dimensions`:
``[lx, ly, lz, alpha, beta, gamma]``.
backend : {'serial', 'OpenMP'}, optional
```
Maybe I missed it, but how is the user to control the number of threads? OMP_NUM_THREADS? joblib context?
how many threads were used in your benchmarks?
I will have to check this. I followed what was being done for other methods, but didn't quite look into the details yet; I will now. A very crude way I checked that it was using multiple cores was by watching top: unlike the serial case (one core), I saw it using 4 cores in the openmp case, though I'll now have to check why only 4, because the machine has 8 cores. I learnt about the asv benchmarks from a different PR, but have been struggling to get them running so far; I will add those benchmarks once I can verify. Thanks
> but how is the user to control the number of threads? OMP_NUM_THREADS? joblib context?
I am not quite sure how to answer this. The only references I found in the MDA documentation to controlling the number of threads are for the transformation classes, citing the use of threadpoolctl internally and also suggesting the OMP_NUM_THREADS env variable to control the thread limit. I was however able to use both of them (the OMP_NUM_THREADS env var at runtime, and threadpoolctl.threadpool_limits in code) and verify that this controls the number of threads for these functions that support the openmp backend.
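A minimal sketch of the two mechanisms described above (the commented `minimize_vectors` call assumes this PR's `backend` keyword; everything else is generic OpenMP/threadpoolctl usage):

```python
import os

# Option 1: cap OpenMP threads via the environment; must be set before the
# OpenMP runtime in the extension module is initialised
os.environ["OMP_NUM_THREADS"] = "4"

# Option 2: limit threads at runtime with threadpoolctl, if it is installed
try:
    from threadpoolctl import threadpool_limits

    with threadpool_limits(limits=4):
        # e.g. distances.minimize_vectors(vecs, box, backend="openmp")
        pass
except ImportError:
    pass
```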
Here is a sample top output for OMP_NUM_THREADS=4 when running the above compact tests:
```text
top - 11:29:01 up 1 day, 16:19, 1 user, load average: 2.70, 2.03, 1.49
Tasks: 356 total, 2 running, 353 sleeping, 0 stopped, 1 zombie
%Cpu0 : 57.6 us, 0.5 sy, 0.0 ni, 41.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :  7.1 us, 0.0 sy, 0.0 ni, 92.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 60.3 us, 0.0 sy, 0.0 ni, 39.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 :  1.6 us, 0.5 sy, 0.0 ni, 97.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 :  1.6 us, 0.5 sy, 0.0 ni, 95.7 id, 2.2 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 62.0 us, 0.0 sy, 0.0 ni, 38.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 :  0.5 us, 0.5 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 87.0 us, 13.0 sy, 0.0 ni,  0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem :  7770.6 total, 1025.2 free, 3988.1 used, 2757.3 buff/cache
MiB Swap:  2048.0 total,    0.2 free, 2047.8 used. 2909.3 avail Mem

  PID USER     PR NI    VIRT    RES   SHR S  %CPU %MEM   TIME+ COMMAND
83009 pardhav  20  0 1136224 287580 78976 R 277.3  3.6 0:28.82 python
.....
```
Here is a similar one for OMP_NUM_THREADS=8:
```text
top - 11:30:53 up 1 day, 16:20, 1 user, load average: 2.92, 2.10, 1.56
Tasks: 356 total, 2 running, 353 sleeping, 0 stopped, 1 zombie
%Cpu0 : 53.8 us,  1.9 sy, 0.0 ni, 43.3 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu1 : 89.7 us, 10.3 sy, 0.0 ni,  0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 45.3 us,  0.5 sy, 0.0 ni, 54.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 50.7 us,  0.5 sy, 0.0 ni, 48.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 47.6 us,  3.8 sy, 0.0 ni, 48.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 48.6 us,  0.5 sy, 0.0 ni, 50.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 48.4 us,  0.0 sy, 0.0 ni, 51.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 51.7 us,  0.0 sy, 0.0 ni, 48.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem :  7770.6 total,  967.5 free, 4045.2 used, 2758.0 buff/cache
MiB Swap:  2048.0 total,    0.2 free, 2047.8 used. 2852.2 avail Mem

  PID USER     PR NI    VIRT    RES   SHR S  %CPU %MEM   TIME+ COMMAND
83031 pardhav  20  0 1823332 383052 78464 R 436.0  4.8 1:47.77 python
.....
```
I believe this should be the same with all other distance methods that support accelerated backends. Thanks
* Remove distutils designator as it is not needed (after changing lang to c++)
Added the asv benchmarks. Here is the output of a run on a local setup: (graphs not shown). Here are separate graphs: (not shown). On older commits or when certain backends are unavailable, the results will show up as failed.

p.s. This might be slightly orthogonal to the original issue, but here are the benchmark results that show significant gains between the cython and the c++ version with just the serial backend (similar to the one in the first graph of the initial comment):

cython vs c++ with serial backend: (table not shown)

The table above was reformatted to be more readable.


This is one of two PRs needed for #5234
Changes made in this Pull Request:
This PR moves the current `cython` implementation to `c++` to support `openmp` (and have template support). Support for invoking the corresponding `distopia` methods (when supported in `distopia`) is also added.

There is already a considerable speedup moving away from `cython` (for the default `serial` backend) and this gets better with the `openmp` and `distopia` backends.

The YiiP equilibrium dataset is used for these performance tests. This trajectory has around 111k atoms/coordinates. The coordinates are duplicated by a multiple (1 - 10, using `np.tile`) to calculate times.

Here is an example of the `timeit` call: (snippet not shown)

Here are the results with the current `cython` version and with the new backends support as part of this PR: (results not shown)

`minimize_vectors` can also be used to achieve `compact` wrapping support for visualization. Here are two methods as on-the-fly transformations for this.

The first is from PR #2982, with minor modifications to make it a transformation that can be applied at runtime:

Compact wrapping with distance_array (details not shown)
Here is a version using `minimize_vectors`:

Compact wrapping with minimize_vectors (details not shown)
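For readers without access to the collapsed snippet, the core idea of compact wrapping can be sketched in plain NumPy for an orthorhombic box (a simplification for illustration; the actual transformation uses `minimize_vectors` and MDAnalysis machinery, and handles triclinic boxes):

```python
import numpy as np


def compact_wrap(positions, center, box_lengths):
    # Shift each position to its minimum-image location relative to `center`,
    # producing a "compact" view around that point (orthorhombic boxes only)
    p = np.asarray(positions, dtype=float)
    L = np.asarray(box_lengths, dtype=float)
    vecs = p - np.asarray(center, dtype=float)
    vecs -= L * np.round(vecs / L)
    return center + vecs


# A particle at x = 9 in a 10 Angstrom box, viewed around the origin,
# appears at x = -1 (its nearest periodic image)
print(compact_wrap([[9.0, 0.0, 0.0]], [0.0, 0.0, 0.0], [10.0, 10.0, 10.0]))
```

Applied per frame as an on-the-fly transformation, this keeps the system visually clustered around the chosen center instead of split across box boundaries.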
Here is a performance comparison of the above transformations (with slight changes to allow different backends and a multiple of the coordinates):
The tests currently use only the `serial` and `openmp` backends as `distopia` support doesn't exist yet. The above performance tests used a modified version of `distopia` that has this support (I will create a separate PR in distopia for that).

LLM / AI generated code disclosure
LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no
PR Checklist
- `package/CHANGELOG` file updated?
- Is your name in `package/AUTHORS`? (If it is not, add it!)

Developers Certificate of Origin
I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.
📚 Documentation preview 📚: https://mdanalysis--5235.org.readthedocs.build/en/5235/