Description
Hi,
This is not a bug report per se; it is more of a question, or just an FYI.
This is a difficult issue to report because it is quite hard to describe.
Note that this has only been tested/observed with the `stumpy.stump` function;
however, it is relevant in many other places too via `core.preprocess_non_normalized`.
It appears that the use of `parallel=True` in `core._parallel_rolling_func` does
not have the desired effect in all circumstances.
Although there is no facility to pass `cache=True` to the `njit` decorators in
stumpy, there are use cases where `cache=True` is an absolute requirement (I
shall update #699). If modifications are made to allow numba caching, the
following is revealed: on the first run in a process/thread, `stumpy.stump`
compiles and runs fine (FYI, with `len(T_A) == 1004`), with the `njit`
compilation taking around 13 seconds. The numba cache files are created, and
each subsequent call of `stumpy.stump` in the same process/thread with the same
`T_A` runs super fast with no issues, in about 0.009974628977943212 seconds.
All good.
The problem arises when another process is started, imports stumpy, and loads
the cached numba jit files. At this point stumpy imports fine, but when the
`stumpy.stump` function is called, the kernel dies in Jupyter, and in a Python
terminal a `Segmentation fault (core dumped)` is encountered.
As we are all aware debugging with numba can be quite difficult and laborious :)
However, if `parallel=True` is simply commented out of the
`core._parallel_rolling_func` `njit` decorator, everything works fine. I went
through the decorators one by one (on a stripped-down version), disabling
`parallel` and `fastmath` individually and testing each, which eventually
revealed `core._parallel_rolling_func` to be the culprit.
```python
@njit(
    # parallel=True,
    fastmath={"nsz", "arcp", "contract", "afn", "reassoc"},
    cache=config.STUMPY_NUMBA_CACHE,
)
def _parallel_rolling_func(a, w, func):
    """
    Compute the (embarrassingly parallel) rolling metric by applying a user
    defined function on a 1-D array
    """
```
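For context, here is a minimal sketch of how such a cache toggle could be wired up. The names (`STUMPY_NUMBA_CACHE`, `njit_kwargs`) are illustrative assumptions, not stumpy's actual config module:

```python
import os

# Hypothetical config pattern: read an environment variable once at import
# time so that every @njit decorator sees the same, consistent value.
STUMPY_NUMBA_CACHE = os.environ.get("STUMPY_NUMBA_CACHE", "0") == "1"


def njit_kwargs(parallel=False):
    # Central place to build decorator kwargs, so cache (and, while
    # debugging, parallel) can be toggled globally rather than per function.
    return {"parallel": parallel, "cache": STUMPY_NUMBA_CACHE}
```

With something like this in place, a decorator could be written as `@njit(**njit_kwargs(parallel=True))`, making it easy to flip caching on for a whole run.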
I am not certain why this is the case. Perhaps, when caching is in play, the
`func` parameter is the problem: which `func` is numba compiling against, and
how does it know which `func` will be passed to it? However, my understanding
of what can and cannot be achieved with numba is not that deep.
That said, I did try modifying the function to dispatch on the function name
instead, and that had the same kernel-died/segmentation-fault result, so
perhaps not.
For example:
```python
def _parallel_rolling_func(a, w, func_name):
    ...
    l = a.shape[0] - w + 1
    out = np.empty(l)
    for i in prange(l):
        # out[i] = func(a[i : i + w])
        if func_name == 'np.ptp':
            out[i] = np.ptp(a[i : i + w])

# AND

def _rolling_isconstant(a, w):
    ...
    out = _parallel_rolling_func(a, w, 'np.ptp')
```
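Another pattern that might sidestep passing a function at call time is a factory that bakes the metric into a specialized rolling function. The sketch below is plain Python so it runs without numba; in stumpy the inner function would presumably carry the `@njit` decorator, so each specialization is compiled (and cached) against one concrete metric rather than a function-typed argument. All names here are illustrative:

```python
def make_rolling_func(func):
    # In the real library the inner function would carry the @njit decorator;
    # here it is plain Python so the pattern itself can be demonstrated.
    def rolling(a, w):
        l = len(a) - w + 1
        out = [0.0] * l
        for i in range(l):  # would be prange under numba
            out[i] = func(a[i : i + w])
        return out
    return rolling


def ptp(window):
    # Peak-to-peak (max - min), mirroring np.ptp on a window.
    return max(window) - min(window)


# One specialization per metric; `func` is fixed at definition time.
rolling_ptp = make_rolling_func(ptp)
```

Whether numba's caching machinery actually handles such closure-produced functions cleanly is exactly the kind of thing that would need testing, given the behavior described above.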
As I said, a difficult one to describe. Although it may run during the first
compilation, the fact that `parallel=True` in `core._parallel_rolling_func`
breaks once the function is compiled and cached does raise the question: is it
valid to use `parallel` in this function at all?
That is one of the advantages of `cache=True`: I find it really does validate
`njit` functions. Anything that is not correct will reliably break when the
cached version is loaded. I find it quite handy for development testing in that
way (for me at least) :)
I must further caveat this with the fact that I tested on a stripped-down
version, which I reduced in order to trace this bug. That version only has:
__init__.py - ONLY the imports required by core.py, stump.py and aamp.py
core.py - ONLY the functions required by stump.py and aamp.py
aamp.py
stump.py
Mainly because stump is all I currently need, but I must have `cache=True`;
otherwise I would have to consider stepping back to
matrixprofile-foundation/matrixprofile, though I would have to maintain it
myself seeing as they have binned it now :) That library did load and run in
under 1 second, and having to wait between 13 and 17 seconds to run
`stumpy.stump` is not an option, but 0.009974628977943212 seconds is awesome!
So I am not sure what to do with this, other than advise you of the findings.
I have also applied the change to a full v1.11.1 version, modified so that
`cache=True` can be set on all `core`, `stump` and `aamp` `njit` decorators,
and that works too, as long as `parallel` is not passed. Once again, this was
only tested with `stumpy.stump`.