
@1RyanK 1RyanK commented Jan 23, 2026

Big improvement to bigint array transfer time.
Adds the parameters unsafe, num_bits, and any_neg, which let the user bypass a big loop over the entire array. unsafe defaults to False, which keeps the slower path, but I don't think it will be terribly slow. Anyone using bigint should probably have an idea of what they're doing and will be able to get some easy performance gains.
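
For illustration only, here is a rough sketch of the kind of full pass that the hints would let a caller skip. It is not the code in this PR: the helper name scan_bigint_hints is hypothetical, and it assumes num_bits means the largest bit length of any magnitude in the array and any_neg means "at least one element is negative".

```python
import numpy as np


def scan_bigint_hints(a):
    """Hypothetical helper: derive (num_bits, any_neg) with one pass over the data.

    Assumption: num_bits is the largest bit length of any |x| in the array,
    and any_neg is True when at least one element is negative.
    """
    num_bits = 0
    any_neg = False
    for x in a.ravel():
        x = int(x)
        if x < 0:
            any_neg = True
        # bit_length() ignores the sign, so abs() is only here for clarity.
        num_bits = max(num_bits, abs(x).bit_length())
    return num_bits, any_neg


# Object array of Python ints that do not fit in 64 bits.
a = np.array([2**100, 5, 2**127 + 1], dtype=object)
num_bits, any_neg = scan_bigint_hints(a)
```

A caller who already knows these two values could pass them in directly instead of paying for the loop, which is the trade-off the unsafe flag exposes.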

Closes #5330: Improve ak.array Bigint array transfer performance

@1RyanK 1RyanK force-pushed the 5330-Improve_ak.array_Bigint_array_transfer_performance branch from 70a8d64 to 6581ba0 Compare January 23, 2026 15:46
@1RyanK 1RyanK changed the title Closes 5330: Improve ak.array Bigint array transfer performance Closes #5330: Improve ak.array Bigint array transfer performance Jan 23, 2026
codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@9b71949). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #5334   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?         5           
  Lines           ?       109           
  Branches        ?         0           
========================================
  Hits            ?       109           
  Misses          ?         0           
  Partials        ?         0           
Flag Coverage Δ
python-coverage 100.00% <ø> (?)


@1RyanK 1RyanK marked this pull request as ready for review January 23, 2026 17:20
dtype: Union[np.dtype, type, str, None] = None,
copy: bool = False,
max_bits: int = -1,
*,
Contributor

I don't like the comment inside the signature. Can it be moved lower?

Contributor Author

Does this look better? Or should I just move it into notes?

Contributor

@ajpotts ajpotts left a comment

Looks good.

# Match previous behavior: object array containing floats becomes float64
# and uses the single-limb float path above.
a = a.astype(np.float64, copy=False)
flat = a.ravel()
Contributor

It seems like flat is defined and not used.

Contributor Author

Removed

flat = a.ravel()
if not np.any(a):
    return zeros(size=a.shape, dtype=bigint, max_bits=max_bits)
ak_a = array(a.astype(np.float64, copy=False))
Contributor

It seems like a.astype(np.float64, copy=False) is called twice, which is unnecessary.

Contributor Author

Changed to ak_a = array(a)

  "num_arrays": len(uint_arrays),
- "signed": any_neg,
+ "signed": bool(any_neg),
  "shape": flat.shape,
Contributor

Could this be out_shape instead of flat.shape? Then you wouldn't need the reshape.

Contributor Author

I don't think so. Currently, on the Chapel side, the command is only instantiated for 1-D uint(64) arrays:

    @arkouda.instantiateAndRegister("big_int_creation_multi_limb")
    proc bigIntCreationMultiLimbMsg(cmd: string,
                                    msgArgs: borrowed MessageArgs,
                                    st: borrowed SymTab,
                                    type array_dtype,
                                    param array_nd: int): MsgTuple throws
    where (array_dtype == uint(64) && array_nd == 1)

I'm also not sure how difficult it would be to refactor the code for multiple dimensions.
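
To make the flatten-then-reshape flow concrete, here is a minimal NumPy-only sketch. It assumes the client splits each non-negative value into 64-bit limbs, least significant first, and ships one 1-D uint64 array per limb. The helper names to_uint64_limbs and from_uint64_limbs are hypothetical and do not come from the PR.

```python
import numpy as np

MASK64 = (1 << 64) - 1


def to_uint64_limbs(a, num_bits):
    """Hypothetical helper: split non-negative ints into 1-D uint64 limb arrays."""
    flat = a.ravel()  # a 1-D-only server command needs flat data
    num_limbs = max(1, (num_bits + 63) // 64)
    return [
        np.array([(int(x) >> (64 * i)) & MASK64 for x in flat], dtype=np.uint64)
        for i in range(num_limbs)
    ]


def from_uint64_limbs(limbs, out_shape):
    """Hypothetical helper: recombine the limbs, then restore the original shape."""
    n = len(limbs[0])
    out = np.empty(n, dtype=object)
    for j in range(n):
        out[j] = sum(int(limb[j]) << (64 * i) for i, limb in enumerate(limbs))
    return out.reshape(out_shape)


a = np.array([[1, 2**70], [2**127, 0]], dtype=object)
uint_arrays = to_uint64_limbs(a, num_bits=128)
restored = from_uint64_limbs(uint_arrays, a.shape)
```

In this sketch the reshape happens on the client after the 1-D round trip, which is the step that passing out_shape to the server would remove if the command supported more dimensions.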

Contributor

OK, maybe later. Thanks.

Contributor Author

Should we open an issue to investigate making this a multidimensional function?

  start = time.time()
- aka = ak.array(npa, max_bits=max_bits, dtype=dtype)
+ aka = ak.array(npa, max_bits=max_bits, dtype=dtype, unsafe=True, num_bits=128, any_neg=False)
  end = time.time()
Contributor

Any changes to benchmarks should also be made to benchmarks_v2.

Contributor Author

Changed benchmarks_v2

npa = a.to_ndarray()
end = time.time()
to_ndarray_times.append(end - start)
start = time.time()
Contributor

It should probably be something like this:

if dtype == ak.bigint.name:
    ak.array(... unsafe hints ...)
else:
    ak.array(npa, dtype=dtype)

I think it's confusing to pass the num_bits field in non-bigint array calls.
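
One possible way to keep the branching in a single place is to build the keyword arguments separately. This is only a sketch: the helper name array_kwargs is hypothetical, it assumes ak.bigint.name is the string "bigint", and the hint values are copied from the benchmark line under review.

```python
def array_kwargs(dtype_name, max_bits=-1):
    """Hypothetical helper: keyword arguments for the benchmark's ak.array call."""
    kwargs = {"dtype": dtype_name, "max_bits": max_bits}
    # Assumption: ak.bigint.name == "bigint".
    if dtype_name == "bigint":
        # Only bigint transfers receive the unsafe hints added by this PR.
        kwargs.update(unsafe=True, num_bits=128, any_neg=False)
    return kwargs


# In the benchmark this might be used as: aka = ak.array(npa, **array_kwargs(dtype))
bigint_kwargs = array_kwargs("bigint")
int_kwargs = array_kwargs("int64")
```

Unlike the else branch above, this sketch still forwards max_bits for every dtype, matching the original benchmark call.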

Contributor Author

Done
