Add tiling QC metric for tile-boundary segmentation artifacts #1157
Cells segmented in tiles get cut at tile borders, producing fragments with artificially straight edges. This adds:
- `sq.experimental.tl.calculate_tiling_qc`: per-cell scoring via collinearity-based straight-edge detection (max_straight_edge_ratio, cardinal_alignment_score, cut_score). Scores stored in .obs of a QC AnnData table linked to the labels element via spatialdata_attrs. Algorithm parameters recorded in .uns["tiling_qc"].
- `sq.experimental.pl.tiling_qc`: diagnostic plot via spatialdata-plot (renders labels coloured by score; tile grid emerges from the data).
- Cell-aware tiling infrastructure (_tiling.py) for scalable labels-only tile extraction without materialising full arrays.
- Test fixture with 400x400 dask-backed ellipsoid cells cut by a 3x3 tile grid, with ground-truth cut/intact classification.
- 35 tests (unit, integration, visual regression).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
- Bump fixture from 40 cells on 400x400 to 120 cells on 600x600 for more visible tile-grid pattern in diagnostic plots
- Pin spatialdata-plot>=0.3.3 for correct continuous color rendering
- Regenerate visual reference images
- Use _IMAGE_SIZE constant in centroid bounds test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report ❌
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1157      +/-   ##
==========================================
- Coverage   73.56%   72.57%   -0.99%
==========================================
  Files          44       47       +3
  Lines        6929     7359     +430
  Branches     1174     1246      +72
==========================================
+ Hits         5097     5341     +244
- Misses       1347     1507     +160
- Partials      485      511      +26
- JIT-compile the two-pointer collinearity scan with @njit for ~10-50x speedup on the per-cell hot path
- Cap contour points at 500 via arc-length resampling to bound O(n²)
- Handle contour closure: scan 3 rotations so straight segments crossing the start/end junction are not split
- Vectorise _resample_contour with np.searchsorted (no Python loops)
- Replace _zero_non_owned loop with single np.isin pass
- Add tqdm progress bar that tracks cells (not tiles), updates on completion for correct parallel reporting
- Extract _SCORE_COLUMNS / _NAN_SCORES constants to deduplicate
- Precompute segment lengths once across rotations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
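The contour cap in this commit can be illustrated with a minimal sketch of arc-length resampling built on `np.searchsorted`. This is not the PR's `_resample_contour`; the function name, signature, and the choice of linear interpolation between vertices are assumptions made for illustration.

```python
import numpy as np


def resample_contour(points: np.ndarray, max_points: int = 500) -> np.ndarray:
    """Resample a closed contour to at most `max_points` evenly spaced points.

    Hypothetical helper: name and signature are illustrative only.
    `points` is an (n, 2) array of contour vertices in traversal order.
    """
    points = np.asarray(points, dtype=float)
    if len(points) <= max_points:
        return points

    # Close the loop so the edge from the last vertex back to the first counts.
    closed = np.vstack([points, points[:1]])
    seg_len = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each vertex

    # Evenly spaced target arc lengths around the perimeter.
    targets = np.linspace(0.0, cum[-1], max_points, endpoint=False)

    # Vectorised lookup of the segment that contains each target.
    idx = np.searchsorted(cum, targets, side="right") - 1
    idx = np.clip(idx, 0, len(seg_len) - 1)

    # Linear interpolation inside the segment; guard zero-length segments.
    denom = np.where(seg_len[idx] > 0, seg_len[idx], 1.0)
    t = (targets - cum[idx]) / denom
    return closed[idx] + t[:, None] * (closed[idx + 1] - closed[idx])
```

Capping the point count keeps the cost of any quadratic scan over the contour bounded regardless of how large a cell's outline is, at the price of slightly coarser edges on very large cells.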
Drop 29 tests that over-tested private internals. Keep 4 behavioural tests (output structure, metric discrimination, tiling invariant, error handling) and 3 visual regression tests (one per score column). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace joblib parallelisation with dask.delayed + dask.compute for native integration with dask-backed zarr data. Tiles are scheduled as delayed tasks; the dask scheduler handles chunk caching and worker management.
- Add n_jobs parameter (default -1 = all CPUs) using a threaded scheduler, and an optional dask.distributed.Client parameter for cluster execution. Warn via logger when both are specified.
- Add affinity-aware cpu_count() to squidpy/_utils.py that respects cgroup limits (SLURM, Docker, taskset) via os.sched_getaffinity, replacing multiprocessing.cpu_count throughout the codebase.
- Fix NameError in pl.tiling_qc (**kwargs referenced after removal), keep spatialdata_plot import lazy, remove unused typing imports.
- Replace assert with raise ValueError in verify_coverage.
- Add nogil=True to numba collinearity scan for thread parallelism.
- Use public API (sq.experimental.tl/pl) in tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
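A minimal sketch of what an affinity-aware CPU count might look like, using only the standard library. The fallback behaviour and the `resolve_n_jobs` helper are assumptions for illustration and may differ from what the PR adds to `squidpy/_utils.py`.

```python
import os


def cpu_count() -> int:
    """Number of CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux: reflects the affinity mask set by
        # taskset, SLURM, or a container runtime.
        return len(os.sched_getaffinity(0))
    # Platforms without sched_getaffinity (e.g. macOS, Windows):
    # fall back to the machine-wide count.
    return os.cpu_count() or 1


def resolve_n_jobs(n_jobs: int = -1) -> int:
    """Hypothetical helper: map n_jobs=-1 to 'all usable CPUs'."""
    if n_jobs == 0:
        raise ValueError("n_jobs must be non-zero.")
    return cpu_count() if n_jobs < 0 else n_jobs
```

On a shared node where a job is pinned to 4 of 64 cores, `multiprocessing.cpu_count()` would report 64 and oversubscribe the allocation, whereas the affinity mask reports 4.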
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compresses low scores toward zero so tile-boundary artifacts are more visually prominent without changing the stored metric values. Users can pass norm=Normalize() for a linear scale. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reference images from CI runner (ubuntu, py3.12). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reference images from CI runner (ubuntu, py3.12). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ier, nhood_outlier_fraction)
Two-stage spatial-context pipeline after per-tile scoring:
- smoothed_cut_score: cut_score x mean(k=10 neighbor cut_scores)
- is_outlier: MAD-based threshold on smoothed scores (nmads param, default 3)
- nhood_outlier_fraction: fraction of k-neighbors that are outliers
Also: update plot defaults to nhood_outlier_fraction/RdYlGn_r, add clean-dataset and few-cells edge case tests, generate visual references for new columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
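The three spatial-context columns described in this commit can be sketched with plain NumPy as below. This is an illustration under stated assumptions, not the PR's code: the function name is hypothetical, neighbours are found by brute-force pairwise distances rather than a spatial index, the 1.4826 MAD scaling is borrowed from a later revision of the PR, and the handling of fewer than two cells is a simplification.

```python
import numpy as np


def spatial_context(centroids, cut_scores, k=10, nmads=3.0):
    """Return (smoothed_cut_score, is_outlier, nhood_outlier_fraction).

    Hypothetical sketch: brute-force k-nearest neighbours, O(n^2) memory.
    """
    centroids = np.asarray(centroids, dtype=float)
    cut_scores = np.asarray(cut_scores, dtype=float)
    n = len(cut_scores)
    if n < 2:
        raise ValueError("Need at least two cells for a neighbourhood.")
    k = min(k, n - 1)

    # Pairwise distances; exclude each cell from its own neighbourhood.
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]

    # Stage 1: a cell scores high only if it AND its neighbours look cut.
    smoothed = cut_scores * cut_scores[nbrs].mean(axis=1)

    # Stage 2: robust threshold on the smoothed scores.
    med = np.median(smoothed)
    mad = np.median(np.abs(smoothed - med))
    if mad < 1e-12:
        is_outlier = np.zeros(n, dtype=bool)
    else:
        is_outlier = smoothed >= med + nmads * 1.4826 * mad

    # Fraction of each cell's neighbours that were flagged.
    nhood_fraction = is_outlier[nbrs].mean(axis=1)
    return smoothed, is_outlier, nhood_fraction
```

Multiplying by the neighbourhood mean suppresses isolated cells that happen to have one straight edge, while cells along a tile border reinforce each other.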
…cells) Replace rejection sampling with grid-based placement: deterministic, no collision checking needed, 5x more cells at the same memory footprint. Regenerate all visual references for the denser fixture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y plot titles
- is_outlier now requires both per-cell cut_score AND spatial smoothed score to exceed their respective MAD thresholds (AND when both enabled)
- Separate parameters: outlier_use_cut, outlier_use_smoothed, nmads_cut (default 1.5), nmads_smoothed (default 3)
- Validation: error on both gates disabled or non-positive nmads
- Plot titles mapped to human-friendly names per score column
- Regenerate visual references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
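The two-gate rule can be sketched as follows. The parameter names and defaults come from the commit message, and the zero-MAD guard and 1.4826 factor follow the diff fragment quoted in the review thread; the function name, the `_MAD_TO_SD` constant name, and the exact error messages are assumptions.

```python
import numpy as np

# Scales a MAD to a standard-deviation estimate for normally distributed data.
_MAD_TO_SD = 1.4826


def _mad_gate(values: np.ndarray, nmads: float) -> np.ndarray:
    """True where a value exceeds median + nmads scaled MADs."""
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    if mad < 1e-12:  # no spread: nothing can be called an outlier
        return np.zeros(len(values), dtype=bool)
    return values >= med + nmads * mad * _MAD_TO_SD


def flag_outliers(
    cut_scores,
    smoothed_scores,
    outlier_use_cut=True,
    outlier_use_smoothed=True,
    nmads_cut=1.5,
    nmads_smoothed=3.0,
):
    """Hypothetical sketch of the AND-combined outlier gates."""
    # Validate before doing any computation.
    if not outlier_use_cut and not outlier_use_smoothed:
        raise ValueError("At least one outlier gate must be enabled.")
    if nmads_cut <= 0 or nmads_smoothed <= 0:
        raise ValueError("nmads_cut and nmads_smoothed must be positive.")

    cut_scores = np.asarray(cut_scores, dtype=float)
    smoothed_scores = np.asarray(smoothed_scores, dtype=float)

    is_outlier = np.ones(len(cut_scores), dtype=bool)
    if outlier_use_cut:
        is_outlier &= _mad_gate(cut_scores, nmads_cut)
    if outlier_use_smoothed:
        is_outlier &= _mad_gate(smoothed_scores, nmads_smoothed)
    return is_outlier
```

With both gates enabled, a cell must look cut on its own and sit in a neighbourhood of cut-looking cells before it is flagged.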
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
specs: list[TileSpec] = []
for row in range(n_rows):
    for col in range(n_cols):
Why not loop over the element you are going to operate on?
for (row, col), owned in tile_to_cells.items():
    by0 = row * tile_size
    bx0 = col * tile_size
    by1 = min(by0 + tile_size, H)
    bx1 = min(bx0 + tile_size, W)
    cy0 = max(by0 - margin, 0)
    cx0 = max(bx0 - margin, 0)
    cy1 = min(by1 + margin, H)
    cx1 = min(bx1 + margin, W)
    specs.append(
        TileSpec(
            base=(by0, bx0, by1, bx1),
            crop=(cy0, cx0, cy1, cx1),
            owned_ids=frozenset(owned),
        )
    )
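For readers who want to run the suggestion in isolation, here is a self-contained sketch. The `TileSpec` dataclass below is a stand-in with only the three fields visible in the snippet, and `build_tile_specs` is a hypothetical wrapper; the real definitions in `_tiling.py` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileSpec:
    """Stand-in for the PR's TileSpec (assumed fields only)."""

    base: tuple[int, int, int, int]  # (y0, x0, y1, x1) of the tile itself
    crop: tuple[int, int, int, int]  # tile expanded by `margin`, clipped to the image
    owned_ids: frozenset[int]  # labels of the cells this tile is responsible for


def build_tile_specs(tile_to_cells, tile_size, margin, H, W):
    """Build one TileSpec per tile that owns at least one cell."""
    specs: list[TileSpec] = []
    # Iterating the mapping directly skips tiles that own no cells.
    for (row, col), owned in tile_to_cells.items():
        by0, bx0 = row * tile_size, col * tile_size
        by1, bx1 = min(by0 + tile_size, H), min(bx0 + tile_size, W)
        cy0, cx0 = max(by0 - margin, 0), max(bx0 - margin, 0)
        cy1, cx1 = min(by1 + margin, H), min(bx1 + margin, W)
        specs.append(
            TileSpec(
                base=(by0, bx0, by1, bx1),
                crop=(cy0, cx0, cy1, cx1),
                owned_ids=frozenset(owned),
            )
        )
    return specs
```

Compared with looping over every `(row, col)` pair, this form never visits empty tiles and removes the membership check inside the loop.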
raise RuntimeError(f"Duplicate cell IDs across tiles - tile ownership may be broken. Duplicates: {dups}")

# --- Validation ---
if not outlier_use_cut and not outlier_use_smoothed:
Can we move the validations up, before doing any computations?
# --- Spatial context post-processing ---
n_cells = len(combined)
k = 10
k = 10 because we're calculating the local reference context for a given cell here. In a perfect grid a cell would have 8 neighbours, but biology is fuzzy, so I gave it a little wiggle room. With fewer we might not capture all cut cells around it; with more we'd start to waste compute.
Can we have them at the top of the function then, and maybe not call it k but something that suggests it's a magic number that can be changed from there?
And also document the reasoning for hardcoding them. It is fine to hard-code at this stage when done intentionally, I guess.
I mean not only for this one: hardcoded values in general should sit at the top of the function to make them more explicit.
I wouldn't move it far away from this function because it's really only used here and nowhere else.
if mad_c < 1e-12:
    is_outlier[:] = False
else:
    is_outlier &= cut_scores >= median_c + nmads_cut * mad_c * 1.4826
What's 1.4826? Why magic numbers in general?
That "magic number" is basically the conversion factor from MAD to standard deviation. For normally distributed data you can use it as sd ≈ 1.4826 x MAD, a cheap and fast approximation of the real thing.
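As a quick check of where the constant comes from: for a normal distribution the MAD equals the 75th-percentile z-score times the standard deviation, so the factor is the reciprocal of that z-score. The sketch below derives it with the standard library and compares the scaled MAD with the sample standard deviation on simulated data; the variable names are illustrative.

```python
from statistics import NormalDist

import numpy as np

# For X ~ N(mu, sigma): MAD = sigma * z, where z is the 0.75 quantile of N(0, 1).
z_75 = NormalDist().inv_cdf(0.75)
mad_to_sd = 1.0 / z_75  # approximately 1.4826

# Empirical comparison on simulated normal data with a known sigma.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100_000)
mad = np.median(np.abs(sample - np.median(sample)))
sd_from_mad = mad_to_sd * mad
sd_direct = sample.std()
```

Unlike the sample standard deviation, the scaled MAD is barely affected by a handful of extreme scores, which is why it suits outlier thresholds.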
Summary
- `sq.experimental.tl.calculate_tiling_qc`: per-cell scoring that detects cells artificially cut by tile boundaries during segmentation, using collinearity-based straight-edge detection on contours
- `sq.experimental.pl.tiling_qc`: diagnostic plot via spatialdata-plot where the tile grid emerges from high-scoring cells without requiring tile-border metadata
- Cell-aware tiling infrastructure (`_tiling.py`) for scalable labels-only tile extraction; will be shared with [EXPERIMENTAL]: Integrate cp-measure #982 when merged
- Scores stored in `.obs` of a QC AnnData table (`{labels_key}_qc`) with proper `spatialdata_attrs` linking and algorithm params in `.uns["tiling_qc"]`
Metrics
- max_straight_edge_ratio
- cardinal_alignment_score