Open
Description
@lgarrison To guide what needs to/could happen for Corrfunc 3.0, here is my list:
Essential (?)
- Remove python2 support completely
  - Remove python2 from setup.py
  - Remove python2 related code from C extensions
  - Remove `from __future__` type constructs from python code
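One possible way to locate the `from __future__` imports before deleting them is a small `ast` scan. This is only a sketch: `find_future_imports` is a hypothetical helper, not something in the repository, and it assumes each file already parses under Python 3.

```python
import ast


def find_future_imports(source):
    """Return (line number, imported names) for each `from __future__ import ...`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # `from __future__ import x, y` is an ImportFrom node with module "__future__"
        if isinstance(node, ast.ImportFrom) and node.module == "__future__":
            hits.append((node.lineno, [alias.name for alias in node.names]))
    return sorted(hits)


# The example string is only parsed, never executed.
example = (
    "from __future__ import print_function, division\n"
    "import os\n"
)
print(find_future_imports(example))
```

Running this over every `.py` file in the package would give a checklist of lines to remove.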
- Add modern packaging with `pyproject.toml`/`meson`/whatever-else-we-should-be-using
  - how?
- Solve the multiple OpenMP runtime library issue
  - While I don't know if there is any way to remove the duplicate OMP runtime library linking with C code, we might be able to detect that there is an issue by running some simple OMP reduction routines as a C extension and checking if the result matches the correct answer.
  - We might also want to switch to creating shared libraries rather than static libraries
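The detection idea above could look roughly like this on the Python side. Everything here is an assumption for illustration: `parallel_sum` stands in for a hypothetical C extension entry point that sums 1..n with an OpenMP `reduction(+:total)` loop, and the two pure-Python functions only mimic a healthy and a faulty result. Whether a duplicate-runtime problem would actually show up as a wrong sum (rather than a hang or crash) is untested.

```python
def omp_reduction_is_consistent(parallel_sum, n=1_000_000):
    """Compare a (supposedly parallel) sum of 1..n against the closed form n(n+1)/2."""
    expected = n * (n + 1) // 2
    return parallel_sum(n) == expected


# Pure-Python stand-ins for the hypothetical C extension routine.
def healthy_sum(n):
    return sum(range(1, n + 1))


def broken_sum(n):
    # Mimics lost updates by skipping every other term.
    return sum(range(1, n + 1, 2))


print(omp_reduction_is_consistent(healthy_sum, 1000))
print(omp_reduction_is_consistent(broken_sum, 1000))
```

If the check fails at import time, the package could emit a warning pointing at the OpenMP runtime conflict instead of silently returning wrong pair counts.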
Possibly (?)
- Add `numbins` optimisation that only uses the number of bins necessary (as determined by the min. and max. distance possible between the two cells of a cell-pair)
  - Add a function that takes a pair of cells and the histogram bins and returns the min. and max. bin-indices needed
  - These bin indices should be stored as part of the cell-pair struct and used to call the SIMD kernels (the bin indices need to be initialised to 0 and `nbins-1`)
  - Ideally, there is a specialised kernel for a single histogram bin that uses a `simd reduction` clause, or simply a `+=`. This could be in conjunction with a bit-setting routine that sets bits and shifts to count up to 64 pairs before calling `popcnt`. However, I don't see how to do that without a LOT OF code duplication.
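To make the first sub-item concrete, here is a minimal sketch of the bin-range function. It assumes axis-aligned cells given as `(lower_corner, upper_corner)`, no periodic wrapping, and half-open bins `[rbins[i], rbins[i+1])`; the names `cell_pair_distance_bounds` and `bin_index_range` are hypothetical, and the real version would live in C next to the cell-pair struct.

```python
import math
from bisect import bisect_right


def cell_pair_distance_bounds(cell_a, cell_b):
    """Min. and max. possible separation between points in two axis-aligned cells."""
    (lo_a, hi_a), (lo_b, hi_b) = cell_a, cell_b
    dmin_sq = dmax_sq = 0.0
    for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b):
        gap = max(0.0, lb - ha, la - hb)   # 0 if the cells overlap along this axis
        span = max(hb - la, ha - lb)       # farthest corners along this axis
        dmin_sq += gap * gap
        dmax_sq += span * span
    return math.sqrt(dmin_sq), math.sqrt(dmax_sq)


def bin_index_range(cell_a, cell_b, rbins):
    """Return (min_bin, max_bin) the cell-pair can contribute to, or None if it cannot."""
    nbins = len(rbins) - 1
    dmin, dmax = cell_pair_distance_bounds(cell_a, cell_b)
    if dmax < rbins[0] or dmin >= rbins[-1]:
        return None  # every possible separation lies outside the histogram
    min_bin = max(bisect_right(rbins, dmin) - 1, 0)
    max_bin = min(bisect_right(rbins, dmax) - 1, nbins - 1)
    return min_bin, max_bin


rbins = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cell_a = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
cell_b = ((3.0, 0.0, 0.0), (4.0, 1.0, 1.0))
print(bin_index_range(cell_a, cell_b, rbins))
```

A cell-pair whose range collapses to a single bin (`min_bin == max_bin`) would be the candidate for the specialised single-bin kernel.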
- Change the OpenMP parallelization to go over cell-pairs (improves cache utilisation, reduces memory requirement -> we can increase the max-bin-ref factors)
  - Create a new `generate_cell_pairs` function that returns the potential neighbouring cell-pairs for any given primary cell
  - Change the OpenMP parallelization to go over these cell-pairs (improves cache re-use since the primary cell is always one of the cells)
  - Test that the code gives identical results but (hopefully) faster
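As a rough illustration of what `generate_cell_pairs` might return, here is a sketch under simplifying assumptions: a regular 3D grid, no periodic wrapping, and a hypothetical `bin_refine` parameter giving how many cells to look at on each side. The signature and the `all_cell_pairs` helper are made up for this example.

```python
from itertools import product


def generate_cell_pairs(primary, grid_shape, bin_refine=1):
    """Return (primary_index, neighbour_index) pairs for one primary cell."""
    nx, ny, nz = grid_shape
    ix, iy, iz = primary

    def flat(x, y, z):
        # Row-major flattening of the 3D cell coordinate.
        return (x * ny + y) * nz + z

    pairs = []
    offsets = range(-bin_refine, bin_refine + 1)
    for dx, dy, dz in product(offsets, repeat=3):
        jx, jy, jz = ix + dx, iy + dy, iz + dz
        # Skip neighbours that fall outside the (non-periodic) grid.
        if 0 <= jx < nx and 0 <= jy < ny and 0 <= jz < nz:
            pairs.append((flat(ix, iy, iz), flat(jx, jy, jz)))
    return pairs


def all_cell_pairs(grid_shape, bin_refine=1):
    """Flat work list of cell-pairs, grouped by primary cell."""
    nx, ny, nz = grid_shape
    work = []
    for primary in product(range(nx), range(ny), range(nz)):
        work.extend(generate_cell_pairs(primary, grid_shape, bin_refine))
    return work


print(len(generate_cell_pairs((1, 1, 1), (3, 3, 3))))
print(len(all_cell_pairs((2, 2, 2))))
```

The flat list is what the OpenMP loop would iterate over. Because consecutive entries share the same primary cell, a thread working through a contiguous chunk keeps that cell in cache.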
May be (?)
- Add ARM64 kernels
  - Add kernels to all pair-counters
  - Run the INTEGRATION_TESTS on laptop
  - Add `-march=armv8-a` (or `-mcpu=apple-m1`) to CFLAGS
  - Make sure that the OSX tests run for both ARM64 and Intel cpus
- Rename package to `corrfunc` and release conda wheels
  - Add `target_clones` to all functions
  - cross-compile with recent gcc (possibly on `linux + x86_64`)
  - Add deprecation option so that `import Corrfunc` still works (how?)
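On the "(how?)" for keeping `import Corrfunc` working: one option might be a thin alias module that forwards attribute access to the new package and emits a `DeprecationWarning`. The sketch below registers such an alias in `sys.modules` at runtime; `install_deprecated_alias` and the `*_demo` module names are hypothetical. In a real release the shim would more likely be a small `Corrfunc/__init__.py`, and on case-insensitive filesystems (default macOS, Windows) a `Corrfunc` and a `corrfunc` directory may not be able to coexist in the same `site-packages`, so that would need checking.

```python
import sys
import types
import warnings


def install_deprecated_alias(old_name, new_module):
    """Register `old_name` as a module that forwards to `new_module` with a warning."""
    alias = types.ModuleType(old_name)
    alias.__doc__ = f"Deprecated alias for {new_module.__name__}."

    def _forward(attr):
        # Leave dunder lookups alone so introspection does not trigger warnings.
        if attr.startswith("__") and attr.endswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"'{old_name}' is deprecated; import '{new_module.__name__}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    # Module-level __getattr__ (PEP 562) is called only for missing attributes.
    alias.__getattr__ = _forward
    sys.modules[old_name] = alias
    return alias


# Demo with a stand-in for the renamed package.
demo_new = types.ModuleType("corrfunc_demo")
demo_new.version = "3.0"
install_deprecated_alias("Corrfunc_demo", demo_new)

import Corrfunc_demo  # resolved from sys.modules

print(Corrfunc_demo.version)
```

Submodule imports such as `from Corrfunc.theory import DD` would need additional aliases per submodule; this sketch only covers top-level attribute access.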
This is also open for community discussion. If anyone has opinions on what should go in, please do add a comment.