-
Notifications
You must be signed in to change notification settings - Fork 97
Closes #5330: Improve ak.array Bigint array transfer performance #5334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Closes #5330: Improve ak.array Bigint array transfer performance #5334
Conversation
70a8d64 to
6581ba0
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5334 +/- ##
========================================
Coverage ? 100.00%
========================================
Files ? 5
Lines ? 109
Branches ? 0
========================================
Hits ? 109
Misses ? 0
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| dtype: Union[np.dtype, type, str, None] = None, | ||
| copy: bool = False, | ||
| max_bits: int = -1, | ||
| *, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the comment inside the signature. Can it be moved lower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this look better? Or should I just move it into notes?
ajpotts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
arkouda/numpy/pdarraycreation.py
Outdated
| # Match previous behavior: object array containing floats becomes float64 | ||
| # and uses the single-limb float path above. | ||
| a = a.astype(np.float64, copy=False) | ||
| flat = a.ravel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like flat is defined and not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
arkouda/numpy/pdarraycreation.py
Outdated
| flat = a.ravel() | ||
| if not np.any(a): | ||
| return zeros(size=a.shape, dtype=bigint, max_bits=max_bits) | ||
| ak_a = array(a.astype(np.float64, copy=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a.astype(np.float64, copy=False) is called twice, which is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to ak_a = array(a)
| "num_arrays": len(uint_arrays), | ||
| "signed": any_neg, | ||
| "signed": bool(any_neg), | ||
| "shape": flat.shape, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be out_shape instead of flat.shape? Then you wouldn't need the reshape.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Currently Chapel-side I have
@arkouda.instantiateAndRegister("big_int_creation_multi_limb")
proc bigIntCreationMultiLimbMsg(cmd: string,
msgArgs: borrowed MessageArgs,
st: borrowed SymTab,
type array_dtype,
param array_nd: int): MsgTuple throws
where (array_dtype == uint(64) && array_nd == 1)I'm not sure how difficult it would be to refactor the code for multiple dimensions, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, maybe later. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make an issue? Investigate making this a multidim function?
benchmarks/array_transfer.py
Outdated
| start = time.time() | ||
| aka = ak.array(npa, max_bits=max_bits, dtype=dtype) | ||
| aka = ak.array(npa, max_bits=max_bits, dtype=dtype, unsafe=True, num_bits=128, any_neg=False) | ||
| end = time.time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any changes to benchmarks should also be made to benchmarks_v2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed benchmarks_v2
benchmarks/array_transfer.py
Outdated
| npa = a.to_ndarray() | ||
| end = time.time() | ||
| to_ndarray_times.append(end - start) | ||
| start = time.time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should probably be something like this:
if dtype == ak.bigint.name:
ak.array(... unsafe hints ...)
else:
ak.array(npa, dtype=dtype)I think it's confusing to add num_bits field to non-bigint array calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Big improvement to (bigint) array transfer time.
Adds parameters
unsafe,num_bits, andany_negallowing the user to bypass a big loop over the entire array.unsafeis by defaultFalsewhich will slow things down, but I think things will not be terribly slow. Anyone usingbigintshould probably have an idea of what they're doing and will be able to achieve some easy performance gains.Closes #5330: Improve ak.array Bigint array transfer performance