
Reduce dask graph size for large raster reprojection #1065

Merged
brendancol merged 3 commits into master from optimize/dask-graph-reduction on Mar 24, 2026
Conversation

@brendancol
Contributor

Summary

For a 30TB raster with millions of chunks, the previous _reproject_dask and _merge_dask functions created one dask.delayed + one da.from_delayed node per output chunk, then stitched them with da.block. The graph metadata alone could exceed available memory.

This PR makes three changes:

  • Replace delayed/from_delayed/block with da.map_blocks over a template array. This produces a single blockwise layer in the HighLevelGraph with O(1) metadata, regardless of chunk count
  • Add empty-chunk skipping: precompute the source footprint in target CRS and skip chunks that don't overlap, avoiding pyproj init and source data fetching for nodata-only chunks
  • Remove the 1GB graph-size streaming threshold since graph metadata is now constant-size
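The map_blocks-over-template pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `fill_chunk` is a hypothetical placeholder for the per-chunk reprojection, and the shapes are arbitrary.

```python
import numpy as np
import dask.array as da

def fill_chunk(template_block):
    # Placeholder for reprojecting one output window; the real function
    # would fetch and warp the matching source window here.
    return np.full(template_block.shape, 1.0, dtype=template_block.dtype)

# The template defines the output shape, chunks, and dtype without
# holding any data; mapping over it yields one blockwise graph layer
# no matter how many chunks the output has.
template = da.empty((8, 8), chunks=(4, 4), dtype="float64")
out = template.map_blocks(fill_chunk, dtype="float64")
result = out.compute()
```

Because the template is never filled with real values, its allocation cost is graph metadata only; all actual work happens inside the chunk function.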

Test plan

  • 6 new tests in TestDaskGraphOptimization covering graph structure, numpy/dask parity, empty-chunk skipping, and helper functions
  • Full test_reproject.py suite passes (85/85)

Replace the O(N) delayed/from_delayed/block pattern in _reproject_dask
and _merge_dask with da.map_blocks over a template array. This produces
a single blockwise layer in the HighLevelGraph with O(1) metadata, so
graph construction no longer scales with chunk count.
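The constant-size-metadata claim can be checked directly on the HighLevelGraph: the number of layers stays the same as the chunk count grows, even though the task count does not. A minimal sketch (hypothetical `double` chunk function, arbitrary shapes):

```python
import dask.array as da

def double(block):
    return block * 2

# 100 chunks vs 10,000 chunks: the layer count is identical, because
# map_blocks contributes a single blockwise layer in both cases.
small = da.empty((100, 100), chunks=(10, 10)).map_blocks(double)
large = da.empty((1000, 1000), chunks=(10, 10)).map_blocks(double)
```

This is the kind of graph-structure property the new tests can assert without ever computing the arrays.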

Also adds empty-chunk skipping: precompute the source footprint in target
CRS and skip chunks that fall outside it entirely. This avoids pyproj
initialization and source data fetching for chunks that would just be
filled with nodata.
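The footprint test reduces to an axis-aligned bounding-box overlap check in the target CRS. The sketch below is illustrative only: `bbox_overlaps` and `reproject_chunk` are hypothetical names, and the overlapping branch is a placeholder for the real warp.

```python
import numpy as np

def bbox_overlaps(a, b):
    """Axis-aligned overlap test on (xmin, ymin, xmax, ymax) bounds."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return not (ax1 <= bx0 or ax0 >= bx1 or ay1 <= by0 or ay0 >= by1)

def reproject_chunk(template_block, chunk_bounds, footprint, nodata=-9999.0):
    if not bbox_overlaps(chunk_bounds, footprint):
        # No source data can land in this chunk: skip pyproj setup and
        # the source fetch, returning a nodata-filled block immediately.
        return np.full(template_block.shape, nodata, dtype=template_block.dtype)
    # Overlapping chunks would run the real reprojection; placeholder here.
    return np.zeros(template_block.shape, dtype=template_block.dtype)

footprint = (0.0, 0.0, 10.0, 10.0)  # source extent in target CRS
block = np.empty((4, 4), dtype="float64")
outside = reproject_chunk(block, (20.0, 20.0, 24.0, 24.0), footprint)
inside = reproject_chunk(block, (2.0, 2.0, 6.0, 6.0), footprint)
```

Since the footprint is computed once up front, the per-chunk cost of the skip path is a handful of float comparisons.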

The streaming fallback threshold (previously 1GB graph metadata) is
removed since graph metadata is now constant-size. Large in-memory
arrays always go through dask when available.
@github-actions bot added the `performance` label (PR touches performance-sensitive code) on Mar 24, 2026
da.map_blocks scans kwargs for dask Array instances and adds them as
whole-array positional dependencies. This meant every output block
depended on the full source array, causing MemoryError on distributed
schedulers when the source exceeded the worker memory limit.

Fix: bind the source dask array (and all other params) to the adapter
function via functools.partial before passing to map_blocks. Dask
doesn't scan partial internals for array dependencies, so each block
task only depends on the template block, not the entire source.

The chunk function still slices and computes just the needed source
window at execution time.
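The fix described above can be sketched like this. It is an assumption-laden illustration, not the PR's code: `warp_chunk` is a stand-in that merely doubles the matching source window, located via map_blocks' `block_info` metadata.

```python
import functools
import numpy as np
import dask.array as da

src = da.ones((40, 40), chunks=(20, 20))

def warp_chunk(template_block, source=None, block_info=None):
    # Slice only this block's window from the source at execution time.
    (i0, i1), (j0, j1) = block_info[0]["array-location"]
    return np.asarray(source[i0:i1, j0:j1]) * 2.0

template = da.empty_like(src)

# Passing source=src directly to map_blocks would register the whole
# source array as a dependency of every output task. Binding it with
# functools.partial hides it from dask's kwarg scan, so each task
# depends only on its own template block.
bound = functools.partial(warp_chunk, source=src)
out = template.map_blocks(bound, dtype=src.dtype)
result = out.compute(scheduler="synchronous")
```

The synchronous scheduler is used here because the chunk function computes a sliced dask array internally; on a distributed cluster the same binding keeps per-task dependencies small.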
The test creates many concurrent dask chunks that all call
pyproj.CRS.__init__ simultaneously via the threaded scheduler.
PROJ's C library is not fully thread-safe on macOS, causing SIGABRT.

Use dask synchronous scheduler for this test since we're verifying
empty-chunk correctness, not concurrent execution.
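As a general illustration of that scheduler choice (not the project's actual test), `scheduler="synchronous"` runs every task in the calling thread, so non-thread-safe native code such as PROJ's CRS construction is never entered concurrently:

```python
import dask.array as da

arr = da.ones((100,), chunks=10)

# scheduler="synchronous" executes all tasks serially in the calling
# thread; correctness is unchanged, only parallelism is given up.
total = float(arr.sum().compute(scheduler="synchronous"))
```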
@brendancol merged commit 7381cc2 into master on Mar 24, 2026
11 checks passed
