Summary
Recent CI failures indicate dtype mismatches and unsupported dtype errors that appear to be triggered by changes in NumPy/Pandas defaults or dependency versions. Code that passed previously now fails due to stricter or different dtype inference, particularly for datetimes, strings, and pandas Index dtypes.
Observed Failures
- datetime64[us] vs datetime64[ns] mismatches
- Errors like:
    ValueError: dtype datetime64[us] is unsupported
    ValueError: dtype str is unsupported
- Pandas test failures where index dtypes differ: object vs StringDtype
- Groupby aggregation error:
    numeric_only accepts only Boolean values
Likely Cause
- CI environment pulled newer NumPy and/or Pandas versions (e.g. Pandas 3.x, as indicated by Pandas4Warning)
- Upstream defaults or inference behavior changed
- Arkouda currently assumes narrower dtype sets or exact dtype matches
Proposed Fix
- Normalize datetime and timedelta inputs
  - Cast all datetime64[*] → datetime64[ns]
  - Cast all timedelta64[*] → timedelta64[ns]
  - Use kind-based checks instead of exact dtype equality
- Broaden string dtype acceptance
  - Accept NumPy unicode (U), bytes (S), object-of-str, and pandas StringDtype
  - Convert consistently to Arkouda string
- Align pandas Index construction
  - Prefer pd.Index(data) and allow pandas to infer dtype
  - Avoid forcing StringDtype unless explicitly required
- Validate numeric_only arguments
  - Ensure only bool | None are accepted
  - Normalize numpy.bool_ to Python bool
- Optional stability improvement
  - Pin NumPy/Pandas versions in CI to avoid silent behavior changes
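
The normalization and validation steps proposed above could look roughly like the following. This is a minimal sketch, not Arkouda's actual code: the helper names normalize_temporal, is_string_like, and validate_numeric_only are hypothetical, and raising TypeError for bad numeric_only values is an assumption made for illustration. It relies only on NumPy's dtype.kind codes ("M" datetime, "m" timedelta, "U" unicode, "S" bytes, "O" object) and pd.StringDtype.

```python
import numpy as np
import pandas as pd


def normalize_temporal(values):
    """Cast datetime64/timedelta64 data of any unit to nanosecond resolution.

    Hypothetical helper; non-temporal input is returned as a NumPy array unchanged.
    """
    arr = np.asarray(values)
    if arr.dtype.kind == "M":  # datetime64 of any unit (s, ms, us, ns, ...)
        return arr.astype("datetime64[ns]")
    if arr.dtype.kind == "m":  # timedelta64 of any unit
        return arr.astype("timedelta64[ns]")
    return arr


def is_string_like(values):
    """Return True for NumPy U/S arrays, pandas StringDtype, or object arrays of str.

    Hypothetical helper; a kind-based check instead of exact dtype equality.
    """
    if isinstance(getattr(values, "dtype", None), pd.StringDtype):
        return True
    arr = np.asarray(values)
    if arr.dtype.kind in ("U", "S"):
        return True
    if arr.dtype.kind == "O":
        # Object arrays qualify only if every element is a Python str.
        return all(isinstance(v, str) for v in arr.ravel())
    return False


def validate_numeric_only(numeric_only):
    """Accept bool, numpy.bool_, or None; return a plain Python bool or None.

    Hypothetical helper; the exception type is an assumption.
    """
    if numeric_only is None:
        return None
    if isinstance(numeric_only, (bool, np.bool_)):
        return bool(numeric_only)
    raise TypeError("numeric_only accepts only Boolean values")
```

For example, normalize_temporal(np.array(["2024-01-01"], dtype="datetime64[us]")) returns an array with dtype datetime64[ns], and validate_numeric_only(np.bool_(True)) returns the Python value True. Values outside the range representable at nanosecond resolution are not handled by this sketch.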
Expected Outcome
- Restore test stability across CI
- Make dtype handling robust to upstream NumPy/Pandas changes
- Reduce future breakage from default or inference shifts