Skip to content

fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722

Open
FBumann wants to merge 10 commits into
masterfrom
fix/bounds-coords-broadcast
Open

fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722
FBumann wants to merge 10 commits into
masterfrom
fix/bounds-coords-broadcast

Conversation

@FBumann
Copy link
Copy Markdown
Collaborator

@FBumann FBumann commented May 23, 2026

Closes #706, #709. Partially addresses #723 (helper relocated to common.py; call-site audit deferred to follow-up — see issue comment for scope).

What changes

Principle. 0.7.0 made coords the source of truth for DataArray bounds in add_variables. This PR closes the two remaining gaps so the rule holds across every bound type and dimension order:

Implementation. _validate_dataarray_bounds and the downstream as_dataarray(arr, coords) call in add_variables collapse into a single helper, as_dataarray_in_coords (in linopy/common.py, alongside as_dataarray). It converts any input (pandas with named axes via to_xarray, otherwise via as_dataarray), validates the result against coords, expands missing dims, transposes to coords order, and reconstructs the coord variables in that order. expand_dims({}) and identity transpose are metadata-only no-ops, so scalar bounds and full-dim DataArray bounds already in coords order pay no extra cost.

Pandas conversion. A new _named_pandas_to_dataarray helper handles pandas with fully named axes (including MultiIndex on either axis) by stacking all column levels into the row index in one call: arr.stack(level=list(range(arr.columns.nlevels)), future_stack=True), followed by to_xarray(). Pandas with any unnamed axis falls through to as_dataarray.

Piecewise. Same family of fix: linopy.piecewise._broadcast_points built its expand_dims map from a set, giving a hash-randomized dim order across processes. Iterates expressions and dims in declaration order now.

Supersedes

Test plan

  • test/test_variable.py::TestAddVariablesBoundsWithCoords — 57/57. Includes:
    • test_bound_broadcast_missing_dim parametrized over DataArray, Series, DataFrame, Series-multiindex, DataFrame-multicolumns, DataFrame-multiindex.
    • test_dataarray_broadcast_missing_dim_order parametrized over scalar/DA bound combinations.
    • test_dataarray_coord_reorder for the same-values-different-order reindex.
    • test_pandas_bound_with_unnamed_axis_falls_through, test_unnamed_coords_short_circuit, test_dataarray_bound_with_multiindex_coord covering edge branches in the new helpers.
  • test/test_piecewise_constraints.py::test_broadcast_points_dim_order_follows_exprs.
  • Broader sweep: test_model.py, test_linear_expression.py, test_constraint.py, test_constraints.py, test_piecewise_constraints.py, test_variable_assignment.py, test_variables.py — 784 pass, no regressions.
  • Pre-commit (ruff/format/blackdoc/codespell) clean.

🤖 Generated with Claude Code

FBumann and others added 6 commits May 24, 2026 00:38
`add_variables` had two related bugs when `lower`/`upper` were arrays:

- pandas Series/DataFrame bounds missing a dimension in `coords` had
  the missing dimension silently dropped (#709), unlike DataArray
  bounds which were already broadcast.
- DataArray bounds missing a dimension were expanded with
  `DataArray.expand_dims`, which prepends new dimensions and produces
  a `coords`-mismatched dimension order in the resulting variable
  (#706). The order depended on the type of the bounds, so scalar
  bounds worked but two array bounds missing the same dimension did
  not.

Replace `_validate_dataarray_bounds` plus the downstream
`as_dataarray(..., coords)` call with a single helper
`_as_dataarray_in_coords`. It converts any input (pandas with named
axes via `to_xarray`, otherwise via `as_dataarray`), validates the
result against `coords`, expands missing dims, transposes to coords
order, and reconstructs the coord variables in that order.
`expand_dims` and `transpose` are no-ops when the array already
matches, so scalar / full-dim DataArray bounds keep their fast path.

Also fix `linopy.piecewise._broadcast_points`, which built the
`expand_dims` map from a `set`, producing a hash-randomized dimension
order across processes. Iterate expressions and dims in declaration
order instead.

Closes #706 and #709. Supersedes #710 and #719.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restate #706/#709's fix as a single principle in the docstring,
release note, and `_as_dataarray_in_coords` helper docstring:
when `coords` is provided to `add_variables`, it is the source of
truth for dimensions, dimension order, and coordinate values, and
`lower` / `upper` are broadcast and aligned to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0.7.0 already shipped "add_variables no longer ignores coords when
lower / upper are DataArrays". Recast the new bullet as extending
that fix to the remaining gaps (pandas bounds; dim order across
bound types) so the continuity is visible from the release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng gaps"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Parametrize test_bound_broadcast_missing_dim with three additional
  cases: Series with MultiIndex(time, colour), DataFrame with
  MultiIndex columns(space, colour), and DataFrame with MultiIndex
  index(time, space). Exercises the `while DataFrame: unstack()`
  loop and the MultiIndex branch of `_named_pandas_to_dataarray`.
- Add test_dataarray_coord_reorder for the same-values-different-order
  reindex branch (previously only the unequal-values raise was
  covered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Relocate `_as_dataarray_in_coords` and its helpers
(`_coords_to_dict`, `_named_pandas_to_dataarray`) from `model.py`
into `common.py`, alongside the existing `as_dataarray` they
parallel. Rename to `as_dataarray_in_coords` (no leading underscore)
since it is no longer file-local — other modules can import the
strict-coords variant when migrating call sites.

Pure relocation: no behavior change, no call-site changes beyond
`add_variables`'s import. Refs #723.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread linopy/model.py
self._check_valid_dim_names(data)

if mask is not None:
mask = as_dataarray(mask, coords=data.coords, dims=data.dims).astype(bool)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would make sense to give mask the same treatment as the bounds to avoid the same pandas issue. Then mask could have the same type hint as lower/upper bounds and the user could for example pass pd.DataFrame for all 3

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would! But this is somewhat breaking (we had a FutureWarning in place, but still...). I will split this into a separate PR!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed?

Comment thread linopy/model.py Outdated
FBumann and others added 2 commits May 24, 2026 13:15
…anches

Replace the unstack-while-loop / split named-check structure with a
single up-front "all axes named" check and a single
``DataFrame.stack(level=list(range(nlevels)), future_stack=True)``
call that collapses all column levels into the row MultiIndex in
one shot. Same observable behaviour, fewer moving parts, no
defensive unreachable branches.

Add tests covering the unnamed-axis fall-through path, the
empty-coords short-circuit in ``as_dataarray_in_coords``, and the
``MultiIndex``-on-a-dim ``continue`` in the validation loop.
Together with the restructure these bring the new helper code to
full patch coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pandas allows any hashable in ``pd.Index.names`` (tuples, ints,
etc.), but only strings map cleanly to xarray dim names. Reject
anything non-string up front so the pandas falls back to
``as_dataarray`` instead of producing a DataArray with an awkward
non-string dim name that downstream validation would reject with a
confusing "extra dimensions" error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add_variables: DataArray bounds with missing dimensions yield wrong dimension order

2 participants