Skip to content

Add 3D mask visualization capabilities#200

Open
d33bs wants to merge 18 commits intocytomining:mainfrom
d33bs:3d-labels
Open

Add 3D mask visualization capabilities#200
d33bs wants to merge 18 commits intocytomining:mainfrom
d33bs:3d-labels

Conversation

@d33bs
Copy link
Member

@d33bs d33bs commented Mar 8, 2026

Description

This PR adds the ability for CytoDataFrame to visualize 3d masks.

Closes #185

What kind of change(s) are included?

  • Documentation (changes docs or other related content)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • These changes pass all pre-commit checks.
  • I have added comments to my code to help provide understanding
  • I have added a test which covers the code changes found within this PR
  • I have deleted all non-relevant text in this pull request template.

Summary by CodeRabbit

  • New Features

    • Render 3D segmentation masks as configurable overlays in 3D views and static snapshots, with a UI toggle; overlay color/opacity defaults and inline payload support included. Inline size checks now account for label data.
  • Documentation

    • Example updated to show supplying 3D mask context.
    • Minor docstring typo fixed.
  • Tests

    • Expanded tests for 3D label overlays, cropping, rendering modes, toggles, and snapshot outputs.
  • Chores

    • Updated pre-commit tool versions and added a pre-build step to create output/masks; coverage badge hook set to manual.

@coderabbitai
Copy link

coderabbitai bot commented Mar 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds discovery, preparation, and rendering of 3D segmentation/label overlays: finds segmentation files, prepares uint8 label volumes (optional bbox crop), propagates label_volume through PyVista/VTK HTML rendering and snapshots, updates example, pre-commit config, test fixtures, and adds tests with fake PyVista.

Changes

Cohort / File(s) Summary
Pre-commit Configuration
\.pre-commit-config.yaml
Bumped hook versions and added manual stage to the coverage-badge hook.
Core 3D overlay implementation
src/cytodataframe/frame.py
Added segmentation discovery and overlay helpers (_find_matching_segmentation_path, _prepare_3d_label_overlay, _get_3d_label_overlay_from_cell, _get_3d_bbox_crop_bounds), overlay rendering/UI helpers, and extended viewer/snapshot builders to accept and propagate label_volume with overlay toggle UI.
3D HTML/VTK pipeline
src/cytodataframe/volume.py
Extended build_3d_image_html_view and VTK/JS pipeline to accept, encode, and render label_volume (color/opacity attrs); adjusted inline-size checks to account for label payload.
Examples
docs/src/examples/cytodataframe_at_a_glance.py
Updated 3D example to optionally pass data_mask_context_dir so masks can be provided alongside 3D image volumes.
Test data / scripts
tests/data/CP_tutorial_3D_noise_nuclei_segmentation/..., tests/data/cytotable/.../create_image_data.py
Pipeline files updated to produce segmentation masks (16-bit) and run script ensures output/masks dir exists; minor docstring fix.
Tests
tests/test_frame.py, tests/test_volume.py
Added tests for segmentation matching, bbox cropping, uint8 label overlays, propagation into PyVista/VTK HTML snapshots, overlay toggle behavior; introduced fake PyVista fixtures to capture rendering interactions.

Sequence Diagram

sequenceDiagram
    participant User as CytoDataFrame User
    participant Frame as frame.py
    participant Finder as Segmentation Finder
    participant Prep as Overlay Prep
    participant Viewer as PyVista Viewer
    participant Volume as volume.py (VTK/JS)

    User->>Frame: request 3D cell view (row, column, shape)
    Frame->>Finder: _find_matching_segmentation_path(data_value,...)
    Finder-->>Frame: segmentation_path or None
    Frame->>Prep: _prepare_3d_label_overlay(segmentation_path, expected_shape, row)
    Prep-->>Frame: label_volume (uint8) or None
    Frame->>Viewer: _build_pyvista_viewer(volume, label_volume, ...)
    Viewer->>Volume: build_3d_image_html_view(volume, dims, ..., label_volume)
    Volume->>Volume: encode volume + label_volume (base64) + set data-label-* attrs
    Volume->>Viewer: return HTML/JS to render volume + overlay
    Viewer-->>User: rendered 3D view with optional mask overlay and toggle UI
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • jenna-tomkinson

Poem

🐰
I hopped through voxels, bright and neat,
Found masks that made the layers meet,
Green outlines traced each tiny cell,
A toggle click — the view will tell,
Hooray, the 3D sight is sweet!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add 3D mask visualization capabilities' directly summarizes the main change: introducing support for visualizing 3D segmentation masks in CytoDataFrame.
Linked Issues check ✅ Passed The PR implements 3D mask/outline visualization support for CytoDataFrame across frame.py, volume.py, and examples, directly addressing the requirement in issue #185.
Out of Scope Changes check ✅ Passed All changes are directly related to 3D mask visualization: core implementation (frame.py, volume.py), documentation/examples, test infrastructure, and dependency updates for tool compatibility.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@d33bs d33bs marked this pull request as ready for review March 8, 2026 22:57
@d33bs d33bs requested a review from jenna-tomkinson as a code owner March 8, 2026 22:57
@d33bs d33bs requested a review from Copilot March 8, 2026 22:58
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cytodataframe/volume.py (1)

114-157: ⚠️ Potential issue | 🟠 Major

Count label_volume in the inline-size limit.

The guard on Line 115 only budgets volume_uint8.nbytes. When label_volume is present, this path embeds a second full volume, so near-threshold 3D cells can still inline far more data than max_inline_volume_bytes is supposed to allow.

Proposed fix
     volume_uint8 = np.array(volume, dtype=np.uint8, copy=True)
-    if volume_uint8.nbytes > max_inline_volume_bytes:
+    label_binary: Optional[np.ndarray] = None
+    if label_volume is not None:
+        label_arr = np.asarray(label_volume)
+        if label_arr.shape == volume_uint8.shape:
+            label_binary = np.where(label_arr > 0, 255, 0).astype(
+                np.uint8, copy=False
+            )
+
+    inline_bytes = volume_uint8.nbytes
+    if label_binary is not None:
+        inline_bytes += label_binary.nbytes
+
+    if inline_bytes > max_inline_volume_bytes:
         return build_3d_image_html_stub(
             data_value=data_value,
             candidate_path=candidate_path,
             display_options=display_options,
             message=(
                 "3D image too large for inline rendering "
-                f"({volume_uint8.nbytes} bytes > {max_inline_volume_bytes} bytes)"
+                f"({inline_bytes} bytes > {max_inline_volume_bytes} bytes)"
             ),
         )
@@
-    if label_volume is not None:
-        label_arr = np.asarray(label_volume)
-        if label_arr.shape == volume_uint8.shape:
-            label_binary = np.where(label_arr > 0, 255, 0).astype(np.uint8, copy=False)
+    if label_binary is not None:
             label_b64 = base64.b64encode(label_binary.tobytes()).decode("utf-8")
             overlay_color = display_options.get("label_overlay_color", (0, 255, 0))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/volume.py` around lines 114 - 157, The inline-size check
only accounts for volume_uint8.nbytes and ignores label_volume, so when a label
is present the combined inline payload can exceed max_inline_volume_bytes;
update the logic around the size guard to also compute the label byte size
(e.g., convert label_volume -> np.asarray -> binary mask np.where(label_arr > 0,
255, 0).astype(np.uint8, copy=False) and get .nbytes) and compare
volume_uint8.nbytes + label_nbytes against max_inline_volume_bytes, returning
build_3d_image_html_stub with an updated message if the combined size is too
large; keep the existing conversion/code paths (label_arr, label_binary,
label_b64, label_color_attr, label_opacity_attr) but move or duplicate the
minimal label-size computation before the size check to enforce the budget.
🧹 Nitpick comments (4)
tests/test_volume.py (1)

71-90: Add a near-threshold label_volume case.

This covers the happy path, but the new overlay path also increases the inline payload substantially. A regression in max_inline_volume_bytes handling would still pass unless this suite includes a stub-fallback case with both volume and label_volume.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_volume.py` around lines 71 - 90, Add a second test variant in
tests/test_volume.py that builds both a volume and a label_volume sized near the
max_inline_volume_bytes threshold to exercise the inline-vs-fallback path; call
build_3d_image_html_view with both volume and label_volume (same pattern as
test_build_3d_image_html_view_includes_label_overlay_data) but set the arrays'
sizes so their combined inline payload is just below/at/just above
max_inline_volume_bytes, then assert the function correctly handles the
stub-fallback case by checking for the expected inline label overlay attributes
(data-label-volume, data-label-color, data-label-opacity) when inlined and/or
the fallback indicator (absence of data-label-volume or a reference URL) when it
falls back. Ensure the new test references build_3d_image_html_view and the
max_inline_volume_bytes limit so the regression in max_inline_volume_bytes
handling is covered.
src/cytodataframe/frame.py (3)

2527-2529: Redundant variable assignment.

display_options is already defined at line 2434 with the same expression. This inner assignment shadows the outer variable unnecessarily.

♻️ Proposed fix
                     label_grid.point_data["label_scalars"] = np.asfortranarray(
                         label_xyz
                     ).ravel(order="F")
-                    display_options = (
-                        self._custom_attrs.get("display_options", {}) or {}
-                    )
                     overlay_mode = str(
                         display_options.get("label_overlay_mode", "surface")
                     ).lower()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/frame.py` around lines 2527 - 2529, The inner redundant
assignment to display_options (using self._custom_attrs.get("display_options",
{}) or {}) should be removed to avoid shadowing the outer variable already
defined earlier; locate the duplicate assignment within the same scope (the one
that reads self._custom_attrs.get("display_options", {}) or {}) and delete that
inner line so all uses reference the original display_options defined at line
~2434, leaving any subsequent logic unchanged.

3503-3601: Significant code duplication with _build_pyvista_viewer.

The label overlay rendering logic (lines 3503-3601) is nearly identical to lines 2512-2624 in _build_pyvista_viewer. Both handle shape validation, coordinate transposition, overlay mode selection, and mesh/volume addition identically.

Consider extracting a shared helper method to reduce maintenance burden and ensure consistent behavior:

♻️ Suggested approach
def _add_label_overlay_to_plotter(
    self,
    plotter: Any,
    volume: np.ndarray,
    label_volume: np.ndarray,
    spacing: Tuple[float, float, float],
    base_sample: float,
) -> None:
    """Add label overlay to a PyVista plotter instance."""
    display_options = self._custom_attrs.get("display_options", {}) or {}
    # ... shared overlay rendering logic ...

This would allow both _build_pyvista_viewer and _pyvista_volume_snapshot_html to call this helper, reducing ~100 lines of duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/frame.py` around lines 3503 - 3601, The label overlay
rendering block duplicated between _pyvista_volume_snapshot_html and
_build_pyvista_viewer should be extracted into a single helper (suggest name
_add_label_overlay_to_plotter) that accepts the plotter, volume, label_volume,
spacing, base_sample and display_options (pulled from self._custom_attrs) and
implements the shared logic: shape validation, transposition, creation of
pv.ImageData, setting point_data, handling overlay_mode "surface" vs volume,
color/opacity normalization, adding meshes/volume, and logging; replace the
duplicated blocks in both _pyvista_volume_snapshot_html and
_build_pyvista_viewer with calls to this helper to centralize behavior and
reduce maintenance burden.

2264-2268: Consider using logger.debug instead of logger.info for routine operations.

The logger.info calls at lines 2264 and 2296 will produce output for every successful overlay lookup. In production with many cells, this could be verbose. Reserve logger.info for significant events and use logger.debug for tracing.

♻️ Proposed change
-        logger.info(
+        logger.debug(
             "Found 3D mask/outline for image %s at %s",
             data_value,
             segmentation_path,
         )
-            logger.info(
+            logger.debug(
                 "Prepared 3D mask/outline overlay for image %s with shape %s",
                 data_value,
                 overlay.shape,
             )

Also applies to: 2296-2300

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/frame.py` around lines 2264 - 2268, Replace the routine
informational logs that announce successful overlay/segmentation lookups from
logger.info to logger.debug to avoid noisy production logs: specifically change
the logger.info call that logs ("Found 3D mask/outline for image %s at %s",
data_value, segmentation_path) and the similar logger.info call around
overlay_path (the second occurrence referenced in the comment) to logger.debug
so these routine per-cell successes are only emitted at debug level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/examples/cytodataframe_at_a_glance.py`:
- Around line 357-364: The inline comment above the CytoDataFrame instantiation
is too long (triggers E501); split or shorten the comment text so no line
exceeds the linter limit—e.g., break "# read 3d images with segmentation masks
and show how the segmentation masks are also 3D." into two or more shorter
comment lines or reword it to be concise; ensure the surrounding code that
constructs CytoDataFrame (data=..., data_context_dir, data_mask_context_dir) and
the use of cp_3d_path remain unchanged.

---

Outside diff comments:
In `@src/cytodataframe/volume.py`:
- Around line 114-157: The inline-size check only accounts for
volume_uint8.nbytes and ignores label_volume, so when a label is present the
combined inline payload can exceed max_inline_volume_bytes; update the logic
around the size guard to also compute the label byte size (e.g., convert
label_volume -> np.asarray -> binary mask np.where(label_arr > 0, 255,
0).astype(np.uint8, copy=False) and get .nbytes) and compare volume_uint8.nbytes
+ label_nbytes against max_inline_volume_bytes, returning
build_3d_image_html_stub with an updated message if the combined size is too
large; keep the existing conversion/code paths (label_arr, label_binary,
label_b64, label_color_attr, label_opacity_attr) but move or duplicate the
minimal label-size computation before the size check to enforce the budget.

---

Nitpick comments:
In `@src/cytodataframe/frame.py`:
- Around line 2527-2529: The inner redundant assignment to display_options
(using self._custom_attrs.get("display_options", {}) or {}) should be removed to
avoid shadowing the outer variable already defined earlier; locate the duplicate
assignment within the same scope (the one that reads
self._custom_attrs.get("display_options", {}) or {}) and delete that inner line
so all uses reference the original display_options defined at line ~2434,
leaving any subsequent logic unchanged.
- Around line 3503-3601: The label overlay rendering block duplicated between
_pyvista_volume_snapshot_html and _build_pyvista_viewer should be extracted into
a single helper (suggest name _add_label_overlay_to_plotter) that accepts the
plotter, volume, label_volume, spacing, base_sample and display_options (pulled
from self._custom_attrs) and implements the shared logic: shape validation,
transposition, creation of pv.ImageData, setting point_data, handling
overlay_mode "surface" vs volume, color/opacity normalization, adding
meshes/volume, and logging; replace the duplicated blocks in both
_pyvista_volume_snapshot_html and _build_pyvista_viewer with calls to this
helper to centralize behavior and reduce maintenance burden.
- Around line 2264-2268: Replace the routine informational logs that announce
successful overlay/segmentation lookups from logger.info to logger.debug to
avoid noisy production logs: specifically change the logger.info call that logs
("Found 3D mask/outline for image %s at %s", data_value, segmentation_path) and
the similar logger.info call around overlay_path (the second occurrence
referenced in the comment) to logger.debug so these routine per-cell successes
are only emitted at debug level.

In `@tests/test_volume.py`:
- Around line 71-90: Add a second test variant in tests/test_volume.py that
builds both a volume and a label_volume sized near the max_inline_volume_bytes
threshold to exercise the inline-vs-fallback path; call build_3d_image_html_view
with both volume and label_volume (same pattern as
test_build_3d_image_html_view_includes_label_overlay_data) but set the arrays'
sizes so their combined inline payload is just below/at/just above
max_inline_volume_bytes, then assert the function correctly handles the
stub-fallback case by checking for the expected inline label overlay attributes
(data-label-volume, data-label-color, data-label-opacity) when inlined and/or
the fallback indicator (absence of data-label-volume or a reference URL) when it
falls back. Ensure the new test references build_3d_image_html_view and the
max_inline_volume_bytes limit so the regression in max_inline_volume_bytes
handling is covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9fddf98-8fe2-4790-b067-f52b1626c4e5

📥 Commits

Reviewing files that changed from the base of the PR and between c364801 and fdec51a.

⛔ Files ignored due to path filters (5)
  • media/coverage-badge.svg is excluded by !**/*.svg
  • tests/data/CP_tutorial_3D_noise_nuclei_segmentation/output/masks/nuclei1_out_c00_dr90_imageSegmentationMask.tiff is excluded by !**/*.tiff
  • tests/data/CP_tutorial_3D_noise_nuclei_segmentation/output/masks/nuclei2_out_c90_dr90_imageSegmentationMask.tiff is excluded by !**/*.tiff
  • tests/data/CP_tutorial_3D_noise_nuclei_segmentation/output/masks/nuclei3_out_c00_dr10_imageSegmentationMask.tiff is excluded by !**/*.tiff
  • tests/data/CP_tutorial_3D_noise_nuclei_segmentation/output/masks/nuclei4_out_c90_dr10_imageSegmentationMask.tiff is excluded by !**/*.tiff
📒 Files selected for processing (10)
  • .pre-commit-config.yaml
  • docs/src/examples/cytodataframe_at_a_glance.ipynb
  • docs/src/examples/cytodataframe_at_a_glance.py
  • src/cytodataframe/frame.py
  • src/cytodataframe/volume.py
  • tests/data/CP_tutorial_3D_noise_nuclei_segmentation/3d-nuclei-profiling.cppipe
  • tests/data/CP_tutorial_3D_noise_nuclei_segmentation/run.sh
  • tests/data/cytotable/NF1_cellpainting_data_shrunken/create_image_data.py
  • tests/test_frame.py
  • tests/test_volume.py

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds the ability for CytoDataFrame to visualize 3D segmentation masks/outlines as overlays on top of 3D volume images. It supports both the VTK.js inline HTML path and the PyVista interactive/snapshot paths, and it includes a refactoring of bounding-box crop logic into a reusable helper method.

Changes:

  • Adds _find_matching_segmentation_path, _prepare_3d_label_overlay, _get_3d_label_overlay_from_cell, and _get_3d_bbox_crop_bounds methods to CytoDataFrame for 3D mask lookup and rendering.
  • Extends build_3d_image_html_view and _build_vtk_js_renderer_core in volume.py to embed label overlay data and render it via VTK.js.
  • Updates the CellProfiler pipeline config and test data to produce dedicated mask output files in a masks/ subdirectory.

Reviewed changes

Copilot reviewed 9 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/cytodataframe/frame.py Core changes: new 3D label overlay methods, refactored bbox crop helper, and label overlay wired into all 3D rendering paths (VTK HTML, PyVista viewer, PyVista snapshot)
src/cytodataframe/volume.py Adds label overlay data embedding and VTK.js rendering script for the inline HTML 3D view
tests/test_frame.py New tests for label overlay lookup, bbox crop alignment, and label overlay passing to PyVista viewers/snapshots
tests/test_volume.py New test verifying label overlay data attributes in the generated HTML
tests/data/CP_tutorial_3D_noise_nuclei_segmentation/3d-nuclei-profiling.cppipe Updated CellProfiler pipeline to save masks to a dedicated masks/ folder with consistent naming
tests/data/CP_tutorial_3D_noise_nuclei_segmentation/run.sh Creates the output/masks/ folder before running the pipeline
tests/data/CP_tutorial_3D_noise_nuclei_segmentation/output/masks/nuclei2_out_c90_dr90_imageSegmentationMask.tiff New binary test mask file produced by the updated pipeline
tests/data/cytotable/NF1_cellpainting_data_shrunken/create_image_data.py Fixed typo: "puproses" → "purposes"
docs/src/examples/cytodataframe_at_a_glance.py Adds a documentation example using data_mask_context_dir with 3D data
.pre-commit-config.yaml Updates codespell and ruff versions; marks coverage badge as a manual stage
media/coverage-badge.svg Updated coverage percentage from 89.97% to 90.10%

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cytodataframe/frame.py`:
- Around line 1334-1339: The current loop returns the first file_pattern match
from root.rglob("*") as soon as original_pattern matches data_value, which can
return overlays from unrelated images; instead, constrain matching to files that
are tied to the current image identity. Change the search so that after
original_pattern matches data_value you derive an image-specific identifier (for
example via groups from original_pattern on data_value or the source image
Path/stem) and then search only within the same directory/parent tree (do not
use root.rglob("*") globally) or filter candidate files by that same identifier
before returning; update the loop using pattern_map, original_pattern,
data_value, file_pattern and the root.rglob call to enforce the image-level
correspondence rather than returning the first global regex hit.
- Around line 2240-2256: The path-resolution logic that normalizes a "file:"
URI, strips directories, and searches data_context_dir for matches (currently
inline around variables data_value, normalized, candidate_path, context_dir)
should be extracted into a shared helper (e.g., _resolve_volume_candidate or
similar) and used both here and from _get_3d_volume_from_cell(); implement the
helper to accept the raw value and self._custom_attrs (or context_dir) and
return a pathlib.Path candidate (handling "file:" prefix, extracting basename,
and rglob lookup in data_context_dir), then replace the inline block in this
method with a call to that helper and update _get_3d_volume_from_cell() to call
the same helper so overlays for 3D images resolve consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d71c9f38-7bbc-45bb-bfec-47e0e12cd50f

📥 Commits

Reviewing files that changed from the base of the PR and between fdec51a and 461eb7b.

📒 Files selected for processing (6)
  • docs/src/examples/cytodataframe_at_a_glance.ipynb
  • docs/src/examples/cytodataframe_at_a_glance.py
  • src/cytodataframe/frame.py
  • src/cytodataframe/volume.py
  • tests/test_frame.py
  • tests/test_volume.py

Co-Authored-By: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/cytodataframe/frame.py (1)

2270-2286: ⚠️ Potential issue | 🟠 Major

Reuse the same candidate-path resolution as _get_3d_volume_from_cell().

This lookup still only handles the raw value plus data_context_dir, so 3D cells resolved via file: URIs, Image_PathName_*, or data_image_paths can load the volume but fail to find the matching label overlay. Please route both code paths through the same path-resolution helper before calling _find_matching_segmentation_path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/frame.py` around lines 2270 - 2286, The current
candidate_path resolution duplicates logic and misses the same URI/name lookups
used by _get_3d_volume_from_cell(), causing label overlays to be missed; replace
the manual normalization block that builds candidate_path (the code using
data_value, data_context_dir and rglob) with a call to the same path-resolution
helper used by _get_3d_volume_from_cell() so that file:, Image_PathName_* and
data_image_paths are all resolved the same way, then pass the resulting
candidate_path into _find_matching_segmentation_path (and any subsequent logic)
instead of the locally computed path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cytodataframe/frame.py`:
- Around line 1361-1367: The current matching_files comprehension uses substring
checks (any(idf in file.stem ...)) causing false positives; change it to
token/exact matching by normalizing or tokenizing the file stem and matching
whole tokens or using a word-boundary regex: e.g., split file.stem on
non-alphanumeric delimiters (or normalize both idf and stem) and test idf ∈
tokens, or use re.search with word boundaries around re.escape(idf) against
file.stem; update the condition in matching_files (where identifiers, file.stem,
file_pattern, search_root.rglob are used) to perform tokenized or whole-word
regex matching instead of substring membership.

---

Duplicate comments:
In `@src/cytodataframe/frame.py`:
- Around line 2270-2286: The current candidate_path resolution duplicates logic
and misses the same URI/name lookups used by _get_3d_volume_from_cell(), causing
label overlays to be missed; replace the manual normalization block that builds
candidate_path (the code using data_value, data_context_dir and rglob) with a
call to the same path-resolution helper used by _get_3d_volume_from_cell() so
that file:, Image_PathName_* and data_image_paths are all resolved the same way,
then pass the resulting candidate_path into _find_matching_segmentation_path
(and any subsequent logic) instead of the locally computed path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94686538-ddc3-4abd-a4a9-d7c13cec753d

📥 Commits

Reviewing files that changed from the base of the PR and between 461eb7b and ea8586c.

📒 Files selected for processing (2)
  • src/cytodataframe/frame.py
  • tests/test_frame.py

Copy link
Member

@jenna-tomkinson jenna-tomkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I saw some HTML in here that I am not as familiar with, so you might consider having someone else give those small sections a review. Either way, cool work and I look forward to seeing your responses to some of my questions!

data_mask_context_dir=str(pathlib.Path(cp_3d_path) / "output/masks"),
)

cdf[["ImageNumber", "ObjectNumber", "FileName_Nuclei"]][:3]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this dataset not have the Image_ prefix for the FileName and PathName columns? This would be interesting as I am assuming this dataset would come from CellProfiler. This question relates to the meeting discussion and handling columns.

Comment on lines +1320 to +1323
pattern_map: Optional regex mapping from segmentation filename
patterns to source image patterns.
file_dir: Root directory containing mask/outline files.
candidate_path: Resolved image candidate path used for stem matching.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the candidate path in this case? I tend to always us the regex method to resolve matching the cell to the mask, so I am curious how people would use it. Might consider also including examples in the docstring.

Comment on lines +1673 to +1685
segmentation_path = self._find_matching_segmentation_path(
data_value=data_value,
pattern_map=pattern_map,
file_dir=self._custom_attrs.get("data_mask_context_dir"),
candidate_path=candidate_path,
)
if segmentation_path is None:
segmentation_path = self._find_matching_segmentation_path(
data_value=data_value,
pattern_map=pattern_map,
file_dir=self._custom_attrs.get("data_outline_context_dir"),
candidate_path=candidate_path,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section was a bit confusing at first to me, they almost looked the same lol Is there a way to enforce DNR here?

Comment on lines +2393 to +2398
x_min_col = _find_col("Minimum_X")
x_max_col = _find_col("Maximum_X")
y_min_col = _find_col("Minimum_Y")
y_max_col = _find_col("Maximum_Y")
z_min_col = _find_col("Minimum_Z")
z_max_col = _find_col("Maximum_Z")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this following CellProfiler convention? Do we expect any need for flexibility here?

Comment on lines +3185 to 3190
table_height = _css_size(
kwargs.pop("table_height", kwargs.pop("table_max_height", "700px")),
"700px",
)

grid = widgets.GridspecLayout(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was the max height determined here?

Comment on lines -2554 to +3194
width="auto",
width="100%",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the default change? Was it better for visualization?

Comment on lines +137 to +139
label_b64 = ""
label_color_attr = "0,1,0"
label_opacity_attr = "0.95"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend adding code comments here to make it clear what these are. (like is the color set for green by default?)

Comment on lines +294 to +300
"if(labelRawB64){"
"const labelRaw=atob(labelRawB64);"
"if(labelRaw.length===bytes.length){"
"const labelBytes=new Uint8Array(labelRaw.length);"
"for(let i=0;i<labelRaw.length;i+=1){labelBytes[i]=labelRaw.charCodeAt(i);}"
"const labelData=vtk.Common.DataModel.vtkImageData.newInstance();"
"labelData.setDimensions(dims);"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit lost here. Is this not Python? Might recommend having someone else look at this if it is HTML or something I am less familar with reviewing.

Calculate the advanced features?:Yes

RescaleIntensity:[module_num:12|svn_version:'Unknown'|variable_revision_number:3|show_window:True|notes:[]|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
RescaleIntensity:[module_num:12|svn_version:'Unknown'|variable_revision_number:3|show_window:True|notes:[]|batch_state:array([], dtype=uint8)|enabled:False|wants_pause:False]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some modules have been turned off. Could you share the reasoning for turning them off when producing the features and masks?

Comment on lines +8 to +9
# Ensure dedicated output folder for segmentation masks exists.
mkdir -p "$CPDOCKER_RUNDIR/output/masks"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great call!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable 3d mask / outline visibility

3 participants