
fix: cross-platform setup and missing dependencies#53

Open
Sdamirsa wants to merge 4 commits into BIT-DataLab:main from Sdamirsa:main
Conversation

@Sdamirsa

Issues found during setup on Windows (RTX 3080 Ti, CUDA 12.8, Python 3.12):

  • setup_sam3.sh fails on Windows/WSL/macOS — hardcodes pip, no path conversion, no cross-platform support
  • einops imported by SAM3 at runtime but missing from requirements.txt
  • pycocotools imported unconditionally via SAM3 training code but only listed in optional extras
  • python-multipart required by FastAPI /convert endpoint but missing from requirements.txt
  • triton imported at module level in edt.py but is Linux-only — crashes on Windows/macOS
  • addmm_act in perflib/fused.py casts all tensors to BFloat16 (H100 optimization) — causes dtype mismatch RuntimeError on consumer GPUs (RTX 3080, 3090, 4090, etc.)
  • Shell scripts get CRLF line endings on Windows checkout, breaking bash execution
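The triton issue above comes from a plain module-level `import triton`. A minimal sketch of the conditional-import guard (the helper name `triton_edt_available` is illustrative, not the actual `edt.py` API):

```python
import platform

# Guarded import: triton ships Linux-only wheels, so a bare module-level
# `import triton` crashes SAM3 on Windows/macOS before anything else runs.
try:
    import triton  # noqa: F401  (Linux-only wheel)
    HAS_TRITON = True
except ImportError:
    triton = None
    HAS_TRITON = False


def triton_edt_available() -> bool:
    # Only use the fused triton EDT kernel when the import actually succeeded
    # on a platform where triton is supported.
    return HAS_TRITON and platform.system() == "Linux"
```

Callers then branch on `triton_edt_available()` and fall back to a pure-PyTorch path elsewhere.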

Changes:

  • Rewrite setup_sam3.sh for cross-platform support with pip/uv auto-detection, WSL path conversion, triton patching, and a verification step
  • Install SAM3 with [dev,notebooks] extras to include missing transitive deps
  • Add python-multipart, modelscope, einops to requirements.txt
  • Patch addmm_act BFloat16 cast to use standard float32 ops for consumer GPU compatibility
  • Add .gitattributes rule *.sh text eol=lf

Tested: CLI and server both working end-to-end. No pipeline logic changed.
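The pip/uv auto-detection in the rewritten setup_sam3.sh follows the usual `command -v` pattern; a sketch of the idea (not the script's exact code):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prefer uv when available, fall back to pip, and fail loudly otherwise.
if command -v uv >/dev/null 2>&1; then
  INSTALLER="uv pip"
elif command -v pip >/dev/null 2>&1; then
  INSTALLER="pip"
else
  echo "error: neither uv nor pip found on PATH" >&2
  exit 1
fi

echo "Using installer: $INSTALLER"
```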

Sdamirsa and others added 4 commits April 14, 2026 12:20
- Rewrite setup_sam3.sh for Windows (Git Bash, WSL), macOS, and Linux
  with automatic pip/uv detection, path conversion, and error messages
- Install SAM3 with [dev,notebooks] extras to include einops, pycocotools
  and other transitive deps missing from SAM3 core requirements
- Patch triton import (Linux-only) to be conditional so SAM3 loads on
  all platforms
- Add verification step at end of setup script (green/red status)
- Patch SAM3 fused BFloat16 MLP kernel (addmm_act) that causes dtype
  mismatch on consumer GPUs (RTX 3080/4090 etc.)
- Add python-multipart, modelscope, einops to requirements.txt
- Add .gitattributes rule to keep .sh files with LF line endings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…RI fix

Adds an editable-vector export stage to the pipeline, broadens SAM3 prompt
coverage for scientific/medical figures, and fixes a classification bug
that was rendering medical-image detections as blank white outlines.

## What's new

### Stage 8 — Vector export (new modules)
Per-image output under `output/{image}/vectors/`:
  elements/      individual editable SVGs for every detected element
  rasters/       cropped transparent-background PNGs for image elements
  combined/      single combined.svg (layered) and combined.pdf
  manifest.json  element index with bbox, score, layer, paths
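An entry in manifest.json might look like this (field names are inferred from the bullet above — bbox, score, layer, paths — and are illustrative, not the exact schema):

```json
{
  "elements": [
    {
      "bbox": [120, 44, 310, 190],
      "score": 0.91,
      "layer": "IMAGE",
      "svg": "elements/elem_000.svg",
      "png": "rasters/elem_000.png"
    }
  ]
}
```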

New modules:
  modules/svg_generator.py   hybrid renderer — geometric primitives for
                             known shapes, Chaikin-smoothed polygons for
                             complex contours, base64-embedded crops for
                             raster elements, editable <text> for OCR
  modules/pdf_combiner.py    svglib/cairosvg PDF backend
  modules/section_detector.py panel detection via SAM3 backgrounds + HoughLinesP
  modules/vector_exporter.py Stage 8 orchestrator (BaseProcessor subclass)

CLI:
  --vector-level=granular|section|component|all   (default: granular)
  --no-vectors                                     skip Stage 8
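The Chaikin smoothing used for complex contours is simple corner cutting: each polygon edge (p, q) is replaced by its 1/4 and 3/4 interpolants. A minimal sketch (function name and iteration count are assumptions, not svg_generator.py's actual API):

```python
def chaikin_smooth(points, iterations=2):
    """Chaikin corner cutting on a closed contour.

    Each pass replaces every edge (p, q) with the two points
    0.75*p + 0.25*q and 0.25*p + 0.75*q, doubling the point count
    and rounding off corners.
    """
    for _ in range(iterations):
        smoothed = []
        n = len(points)
        for i in range(n):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]  # closed loop
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        points = smoothed
    return points
```

Two passes are usually enough to make a jagged mask contour read as a smooth SVG path without ballooning the point count.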

### Prompt v2 — broader coverage for scientific/medical figures
Total prompts: 19 -> 78
  prompts/image.py       5 -> 29  (CT/MRI/ultrasound, 3D heart/anatomy,
                                    person/crowd icons, computer monitors,
                                    checkerboard/grid patterns, image stacks)
  prompts/shape.py       7 -> 17  (trapezoid, parallelogram, 3D cube,
                                    isometric box, cylinder, color swatch,
                                    small colored square, stack of rectangles)
  prompts/arrow.py       3 -> 17  (thick/block/curved/looping/bidirectional/
                                    dashed/dotted/L-shaped/skip variants)
  prompts/background.py  4 -> 15  (sub-figure panel, dashed border rectangle,
                                    legend box/panel, title bar, header strip)

Config tuning to match (config/config.yaml — gitignored):
  shape.min_area:        200 -> 80    (catches 14x14 legend swatches)
  shape.score_threshold: 0.5 -> 0.45
  arrow.score_threshold: 0.45 -> 0.4
  image.score_threshold: 0.5 -> 0.45
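In YAML, the tuned section of config/config.yaml would look roughly like this (key names taken from the list above; a sketch, since the file is gitignored):

```yaml
shape:
  min_area: 80           # was 200; catches 14x14 legend swatches
  score_threshold: 0.45  # was 0.5
arrow:
  score_threshold: 0.4   # was 0.45
image:
  score_threshold: 0.45  # was 0.5
```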

### Bug fix — heart/MRI rendered as blank white polygon outlines
Type classification was scattered across three files using case-sensitive
string comparisons. IMAGE_PROMPT contains mixed-case names like
"3D heart model" and "MRI image", but every comparison did
`elem.type.lower() in CasedSet`, so those specific scientific-image
prompts silently fell through and got rendered as white polygon outlines.

Across 18 figures, this dropped 40 medical detections (36 MRI + 4 heart)
to outline-only. After the fix all 40 are properly extracted as RGBA
crops and embedded as base64 <image> in their SVGs.

Fix made the prompt files the single source of truth:
  modules/svg_generator.py        RASTER_TYPES, GEOMETRIC_SHAPES, ARROW_TYPES
                                  now derived from prompt files via
                                  `_expand_forms()` helper (covers both
                                  space-form and underscore-form normalization)
  modules/icon_picture_processor.py  lowercased IMAGE_PROMPT before comparison
  modules/data_types.py             get_layer_level() imports prompt lists;
                                    specific prompts land in correct layer
                                    (IMAGE/BASIC_SHAPE/ARROW/BACKGROUND)
                                    instead of OTHER

Adding a new prompt now auto-registers for routing, layer assignment, and
raster cropping — no parallel lists to keep in sync.
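A minimal sketch of the `_expand_forms()` idea (the real helper lives in modules/svg_generator.py; `is_raster` here is an illustrative caller, not the module's API):

```python
def _expand_forms(prompts):
    """Lowercase each prompt and register both its space-form and
    underscore-form, so comparisons match regardless of casing or
    separator convention ("MRI image" vs "mri_image")."""
    forms = set()
    for p in prompts:
        lowered = p.lower()
        forms.add(lowered.replace("_", " "))  # space form
        forms.add(lowered.replace(" ", "_"))  # underscore form
    return forms


# Derived from the prompt file instead of a hand-maintained parallel list.
RASTER_TYPES = _expand_forms(["3D heart model", "MRI image"])


def is_raster(elem_type: str) -> bool:
    # Case-insensitive membership test: the bug fixed above was comparing a
    # lowercased type against a mixed-case set, which never matched.
    return elem_type.lower() in RASTER_TYPES
```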

## Run results on the 18-figure test set
  1,071 individual element SVGs
    425 raster PNGs (was 385 before fix; +40 = the heart/MRI recoveries)
     18 combined SVGs (one per figure)
     18 combined PDFs (one per figure, Affinity-ready)

## Known limitations & future work

Even with broader prompts and the new hierarchical layer assignment, the
pipeline still struggles with **multi-panel / schematic figures**.
Detection happens per element; the global semantics (which arrow connects
which box across panel boundaries, which legend swatch labels which plot)
are not modeled.

Two directions worth exploring:

1. Two-pass extraction with explicit panel splitting. First pass: detect
   sub-figure panels and split the source image into per-panel crops.
   Second pass: run the full pipeline on each crop independently. This
   should help the model focus on local structure and avoid cross-panel
   prompt confusion. SAM3 backgrounds + HoughLinesP already give us panel
   candidates (see section_detector.py); the missing piece is the
   recursive split-and-rerun loop.

2. Smart margin padding around cropped rasters. Tight bboxes sometimes
   clip strokes or leave faint background ghosts. A per-type margin
   heuristic (icon vs. photo vs. schematic illustration) would clean this
   up, but the logic is hard to pin down — loose enough to capture the
   full visual element, tight enough to avoid neighbor bleed.
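The non-recursive core of direction 1 could be orchestrated like this (a sketch under assumed interfaces: `detect_panels`, `crop`, and `run_pipeline` are hypothetical callables, and the recursive rerun on sub-sub-panels is left out):

```python
def two_pass_extract(image, detect_panels, crop, run_pipeline):
    """First pass: find sub-figure panel bboxes (e.g. from SAM3 backgrounds
    + HoughLinesP candidates). Second pass: run the full pipeline on each
    panel crop independently so prompts only see local structure."""
    panels = detect_panels(image)
    if not panels:
        # Single-panel figure: just run the pipeline once on the whole image.
        return [run_pipeline(image)]
    return [run_pipeline(crop(image, bbox)) for bbox in panels]
```

The missing piece named above, the recursive split-and-rerun loop, would call `two_pass_extract` again on each crop until `detect_panels` finds nothing.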

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a [!NOTE] admonition near the top of the README documenting the
remaining limitation after the v2 prompt expansion, hierarchical layer
assignment, and Stage 8 vector export: the pipeline still struggles with
multi-panel scientific schematics because detection is per-element while
global semantics (cross-panel arrows, legend-to-plot mapping) are not modeled.

Lists two roadmap directions for tackling this challenge:
  1. Two-pass extraction with explicit panel splitting + recursion
  2. Smart per-element-type margin padding around cropped rasters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Sdamirsa
Author

I did extra work in my own fork Sdamirsa@d6c445a (not requested to merge):

Summary of what shipped:

  • Stage 8 — Vector export pipeline (4 new modules: svg_generator, pdf_combiner, section_detector, vector_exporter)
  • Prompt v2 — 19 → 78 prompts across image/shape/arrow/background, plus config tuning for legend swatches and small/curved variants
  • Heart/MRI rendering fix (as an example) — three case-sensitivity bugs collapsed into a single source-of-truth design where prompt files drive type classification

Known limitations & future work documented in the commit body:

  1. Two-pass split-and-rerun for multi-panel figures — first detect sub-figure panels (already have candidates from section_detector.py), then recursively run the full pipeline per panel. The missing piece is the orchestration loop.
  2. Smart per-type margin padding — tight bboxes clip strokes; loose ones bleed neighbors. Logic is hard to pin down and likely needs a per-element-type heuristic (icon vs. photo vs. schematic illustration).
    View the commit: Sdamirsa@d6c445a

