fix: cross-platform setup and missing dependencies by Sdamirsa · Pull Request #53 · BIT-DataLab/Edit-Banana

Sdamirsa · 2026-04-14T09:26:53Z

Issues found during setup on Windows (RTX 3080 Ti, CUDA 12.8, Python 3.12):

- setup_sam3.sh fails on Windows/WSL/macOS: hardcodes pip, no path conversion, no cross-platform support
- einops imported by SAM3 at runtime but missing from requirements.txt
- pycocotools imported unconditionally via SAM3 training code but only listed in optional extras
- python-multipart required by FastAPI /convert endpoint but missing from requirements.txt
- triton imported at module level in edt.py but is Linux-only, so it crashes on Windows/macOS
- addmm_act in perflib/fused.py casts all tensors to BFloat16 (H100 optimization), causing a dtype mismatch RuntimeError on consumer GPUs (RTX 3080, 3090, 4090, etc.)
- Shell scripts get CRLF line endings on Windows checkout, breaking bash execution
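
For anyone hitting the CRLF problem on an existing checkout, here is a rough sketch of how affected scripts could be found and normalized. The helper names (`has_crlf`, `normalize_to_lf`, `find_crlf_scripts`) are made up for illustration and are not part of this PR:

```python
from pathlib import Path
from typing import List


def has_crlf(data: bytes) -> bool:
    """True if the file content contains Windows (CRLF) line endings."""
    return b"\r\n" in data


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


def find_crlf_scripts(root: str) -> List[Path]:
    """List every *.sh file under `root` that still contains CRLF."""
    return sorted(p for p in Path(root).rglob("*.sh") if has_crlf(p.read_bytes()))


if __name__ == "__main__":
    # Report only; rewriting files is left to the caller.
    for script in find_crlf_scripts("."):
        print(f"CRLF line endings: {script}")
```

Files are read as bytes so that no implicit newline translation hides the problem.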

Changes:

- Rewrite setup_sam3.sh for cross-platform support with pip/uv auto-detection, WSL path conversion, triton patching, and a verification step
- Install SAM3 with [dev,notebooks] extras to include missing transitive deps
- Add python-multipart, modelscope, einops to requirements.txt
- Patch addmm_act BFloat16 cast to use standard float32 ops for consumer GPU compatibility
- Add .gitattributes rule *.sh text eol=lf
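
The pip/uv auto-detection lives in the bash script itself; the idea can be sketched in Python as below. This is an illustration under assumptions, not the script's actual logic: `pick_installer` and `install_command` are hypothetical names, and the preference for uv over pip is my guess at a sensible order.

```python
import shutil
import sys
from typing import Callable, List, Optional

Which = Callable[[str], Optional[str]]


def pick_installer(which: Which = shutil.which) -> List[str]:
    """Return the command prefix used to install packages.

    Prefers uv when it is on PATH; otherwise falls back to the pip module
    of the running interpreter instead of a hardcoded `pip` executable.
    """
    if which("uv"):
        return ["uv", "pip", "install"]
    return [sys.executable, "-m", "pip", "install"]


def install_command(packages: List[str], which: Which = shutil.which) -> List[str]:
    """Build the full install command for the given package names."""
    return pick_installer(which) + list(packages)


if __name__ == "__main__":
    # The three packages this PR adds to requirements.txt.
    print(" ".join(install_command(["python-multipart", "modelscope", "einops"])))
```

Passing `which` as a parameter keeps the detection testable without touching the real PATH.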

Tested: CLI and server both working end-to-end. No pipeline logic changed.

- Rewrite setup_sam3.sh for Windows (Git Bash, WSL), macOS, and Linux with automatic pip/uv detection, path conversion, and error messages
- Install SAM3 with [dev,notebooks] extras to include einops, pycocotools and other transitive deps missing from SAM3 core requirements
- Patch triton import (Linux-only) to be conditional so SAM3 loads on all platforms
- Add verification step at end of setup script (green/red status)
- Patch SAM3 fused BFloat16 MLP kernel (addmm_act) that causes dtype mismatch on consumer GPUs (RTX 3080/4090 etc.)
- Add python-multipart, modelscope, einops to requirements.txt
- Add .gitattributes rule to keep .sh files with LF line endings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
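
The conditional triton import mentioned above follows a common guard pattern. A minimal sketch, assuming the caller has some slower non-triton path to fall back to; `optional_import`, `HAS_TRITON` and `edt_backend` are hypothetical names and do not come from SAM3's edt.py:

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# The PR describes triton as Linux-only, so treat it as optional everywhere.
triton = optional_import("triton")
HAS_TRITON = triton is not None


def edt_backend() -> str:
    """Pick a code path based on whether triton could be imported."""
    return "triton" if HAS_TRITON else "fallback"


if __name__ == "__main__":
    print(f"EDT backend: {edt_backend()}")
```

The point is that the import failure is handled once at module load, and every later call checks a flag instead of crashing.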

…RI fix

Adds an editable-vector export stage to the pipeline, broadens SAM3 prompt coverage for scientific/medical figures, and fixes a classification bug that was rendering medical-image detections as blank white outlines.

## What's new

### Stage 8: Vector export (new modules)

Per-image output under `output/{image}/vectors/`:

- elements/: individual editable SVGs for every detected element
- rasters/: cropped transparent-background PNGs for image elements
- combined/: single combined.svg (layered) and combined.pdf
- manifest.json: element index with bbox, score, layer, paths

New modules:

- modules/svg_generator.py: hybrid renderer (geometric primitives for known shapes, Chaikin-smoothed polygons for complex contours, base64-embedded crops for raster elements, editable <text> for OCR)
- modules/pdf_combiner.py: svglib/cairosvg PDF backend
- modules/section_detector.py: panel detection via SAM3 backgrounds + HoughLinesP
- modules/vector_exporter.py: Stage 8 orchestrator (BaseProcessor subclass)

CLI:

- --vector-level=granular|section|component|all (default: granular)
- --no-vectors: skip Stage 8

### Prompt v2: broader coverage for scientific/medical figures

Total prompts: 19 -> 78

- prompts/image.py: 5 -> 29 (CT/MRI/ultrasound, 3D heart/anatomy, person/crowd icons, computer monitors, checkerboard/grid patterns, image stacks)
- prompts/shape.py: 7 -> 17 (trapezoid, parallelogram, 3D cube, isometric box, cylinder, color swatch, small colored square, stack of rectangles)
- prompts/arrow.py: 3 -> 17 (thick/block/curved/looping/bidirectional/dashed/dotted/L-shaped/skip variants)
- prompts/background.py: 4 -> 15 (sub-figure panel, dashed border rectangle, legend box/panel, title bar, header strip)

Config tuning to match (config/config.yaml, gitignored):

- shape.min_area: 200 -> 80 (catches 14x14 legend swatches)
- shape.score_threshold: 0.5 -> 0.45
- arrow.score_threshold: 0.45 -> 0.4
- image.score_threshold: 0.5 -> 0.45

### Bug fix: heart/MRI rendered as blank white polygon outlines

Type classification was scattered across three files using case-sensitive string comparisons. IMAGE_PROMPT contains mixed-case names like "3D heart model" and "MRI image", but every comparison did `elem.type.lower() in CasedSet`, so those specific scientific-image prompts silently fell through and got rendered as white polygon outlines.

Across 18 figures, this dropped 40 medical detections (36 MRI + 4 heart) to outline-only. After the fix all 40 are properly extracted as RGBA crops and embedded as base64 <image> in their SVGs.

Fix made the prompt files the single source of truth:

- modules/svg_generator.py: RASTER_TYPES, GEOMETRIC_SHAPES, ARROW_TYPES now derived from prompt files via `_expand_forms()` helper (covers both space-form and underscore-form normalization)
- modules/icon_picture_processor.py: lowercased IMAGE_PROMPT before comparison
- modules/data_types.py: get_layer_level() imports prompt lists; specific prompts land in correct layer (IMAGE/BASIC_SHAPE/ARROW/BACKGROUND) instead of OTHER

Adding a new prompt now auto-registers for routing, layer assignment, and raster cropping, with no parallel lists to keep in sync.

## Run results on the 18-figure test set

- 1,071 individual element SVGs
- 425 raster PNGs (was 385 before fix; +40 = the heart/MRI recoveries)
- 18 combined SVGs (one per figure)
- 18 combined PDFs (one per figure, Affinity-ready)

## Known limitations & future work

Even with broader prompts and the new hierarchical layer assignment, the pipeline still under-understands **multi-panel / schematic figures**. Detection happens per element; the global semantics (which arrow connects which box across panel boundaries, which legend swatch labels which plot) is not modeled. Two directions worth exploring:

1. Two-pass extraction with explicit panel splitting. First pass: detect sub-figure panels and split the source image into per-panel crops. Second pass: run the full pipeline on each crop independently. This should help the model focus on local structure and avoid cross-panel prompt confusion. SAM3 backgrounds + HoughLinesP already give us panel candidates (see section_detector.py); the missing piece is the recursive split-and-rerun loop.
2. Smart margin padding around cropped rasters. Tight bboxes sometimes clip strokes or leave faint background ghosts. A per-type margin heuristic (icon vs. photo vs. schematic illustration) would clean this up, but the logic is hard to pin down: loose enough to capture the full visual element, tight enough to avoid neighbor bleed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
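
For readers unfamiliar with the Chaikin smoothing mentioned for complex contours, here is a small standalone sketch of the corner-cutting scheme on a closed polygon. It is not the code in modules/svg_generator.py; `chaikin_closed` and `to_svg_path` are hypothetical names, and the iteration count is an arbitrary default.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chaikin_closed(points: List[Point], iterations: int = 2) -> List[Point]:
    """Smooth a closed polygon with Chaikin's corner-cutting scheme.

    Each edge (P, Q) is replaced by two points at 1/4 and 3/4 along it,
    so every pass doubles the vertex count and rounds off the corners.
    """
    pts = list(points)
    for _ in range(iterations):
        if len(pts) < 3:
            break  # not a polygon; nothing to smooth
        smoothed: List[Point] = []
        for i, (x0, y0) in enumerate(pts):
            x1, y1 = pts[(i + 1) % len(pts)]  # wrap around: the polygon is closed
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        pts = smoothed
    return pts


def to_svg_path(points: List[Point]) -> str:
    """Serialize a closed polygon as the `d` attribute of an SVG <path>."""
    head, *tail = points
    segments = [f"M {head[0]:.2f} {head[1]:.2f}"]
    segments += [f"L {x:.2f} {y:.2f}" for x, y in tail]
    return " ".join(segments) + " Z"


if __name__ == "__main__":
    square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    print(to_svg_path(chaikin_closed(square, iterations=1)))
```

One pass over a square yields an octagon; further passes approach a smooth closed curve while keeping the output a plain, editable polygon path.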

Adds a [!NOTE] admonition near the top of the README documenting the remaining limitation after the v2 prompt expansion, hierarchical layer assignment, and Stage 8 vector export: the pipeline still under-understands multi-panel scientific schematics because detection is per-element while global semantics (cross-panel arrows, legend-to-plot mapping) is not modeled.

Lists two roadmap directions for tackling this challenge:

1. Two-pass extraction with explicit panel splitting + recursion
2. Smart per-element-type margin padding around cropped rasters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Sdamirsa · 2026-04-15T18:26:46Z

I did extra work in my own fork Sdamirsa@d6c445a (not requested to merge):

Summary of what shipped:

- Stage 8: vector export pipeline (4 new modules: svg_generator, pdf_combiner, section_detector, vector_exporter)
- Prompt v2: 19 → 78 prompts across image/shape/arrow/background, plus config tuning for legend swatches and small/curved variants
- Heart/MRI rendering fix (as an example): three case-sensitivity bugs collapsed into a single source-of-truth design where prompt files drive type classification
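
To make the case-sensitivity bug concrete: a lowercased element type can never match a set that still holds mixed-case prompt names. The sketch below shows the failure and one way to derive the lookup set from the prompt list. It is a guess at the shape of the fix, not the fork's code: `expand_forms` is a hypothetical stand-in for the `_expand_forms()` helper, whose body is not shown here, and the prompt list contains only the two names quoted in the commit message.

```python
from typing import FrozenSet, Iterable

# Hypothetical stand-in for the prompt list in prompts/image.py; only these
# two entries are quoted in the commit message.
IMAGE_PROMPT = ["3D heart model", "MRI image"]


def expand_forms(prompts: Iterable[str]) -> FrozenSet[str]:
    """Lowercase each prompt and emit both its space and underscore form."""
    forms = set()
    for prompt in prompts:
        base = " ".join(prompt.lower().replace("_", " ").split())
        forms.add(base)
        forms.add(base.replace(" ", "_"))
    return frozenset(forms)


# Derived once from the prompt list, so new prompts register automatically.
RASTER_TYPES = expand_forms(IMAGE_PROMPT)


def is_raster(elem_type: str) -> bool:
    """Case-insensitive membership test against the derived set."""
    return elem_type.strip().lower() in RASTER_TYPES


if __name__ == "__main__":
    # The buggy pattern: lowercased input vs. a mixed-case set never matches.
    print("MRI image".lower() in set(IMAGE_PROMPT))
    # The derived set matches regardless of case or separator.
    print(is_raster("MRI image"), is_raster("3d_heart_model"))
```

Because both sides of the comparison go through the same normalization, the prompt file stays the only place a type name is spelled out.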

Known limitations & future work documented in the commit body:

1. Two-pass split-and-rerun for multi-panel figures: first detect sub-figure panels (already have candidates from section_detector.py), then recursively run the full pipeline per panel. The missing piece is the orchestration loop.
2. Smart per-type margin padding: tight bboxes clip strokes; loose ones bleed neighbors. Logic is hard to pin down and likely needs a per-element-type heuristic (icon vs. photo vs. schematic illustration).
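
A rough sketch of what the missing split-and-rerun orchestration loop might look like, to make the idea easier to discuss. Nothing here exists in the fork: `split_and_rerun`, `offset_box`, the `(x, y, width, height)` box convention, and the two callables standing in for the panel detector and the full pipeline are all assumptions.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def offset_box(box: Box, region: Box) -> Box:
    """Map a box from region-local coordinates back to the full image."""
    x, y, w, h = box
    return (x + region[0], y + region[1], w, h)


def split_and_rerun(
    region: Box,
    detect_panels: Callable[[Box], List[Box]],
    run_pipeline: Callable[[Box], List[Box]],
    max_depth: int = 2,
) -> List[Box]:
    """Recursively split `region` into panels and run the pipeline per panel.

    `detect_panels` returns panel boxes in full-image coordinates;
    `run_pipeline` returns element boxes local to the region it was given.
    Both are placeholders for the real stages.
    """
    panels: List[Box] = []
    if max_depth > 0:
        # Ignore a "panel" identical to the region, which would not split anything.
        panels = [p for p in detect_panels(region) if p != region]
    if not panels:
        # Leaf: run the pipeline on this crop and translate results back.
        return [offset_box(b, region) for b in run_pipeline(region)]
    results: List[Box] = []
    for panel in panels:
        results.extend(split_and_rerun(panel, detect_panels, run_pipeline, max_depth - 1))
    return results
```

The `max_depth` cap is there so a detector that keeps proposing sub-panels cannot recurse without bound; merging duplicates across panel borders is deliberately left out.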

Sdamirsa and others added 4 commits April 14, 2026 12:20

Merge branch 'BIT-DataLab:main' into main

85eeb94

Sdamirsa wants to merge 4 commits into BIT-DataLab:main from Sdamirsa:main
