fix: cross-platform setup and missing dependencies #53
Open
Sdamirsa wants to merge 4 commits into BIT-DataLab:main from
- Rewrite `setup_sam3.sh` for Windows (Git Bash, WSL), macOS, and Linux, with automatic pip/uv detection, path conversion, and error messages
- Install SAM3 with `[dev,notebooks]` extras to include `einops`, `pycocotools`, and other transitive deps missing from SAM3's core requirements
- Patch the `triton` import (Linux-only) to be conditional so SAM3 loads on all platforms
- Add a verification step at the end of the setup script (green/red status)
- Patch SAM3's fused BFloat16 MLP kernel (`addmm_act`), which causes a dtype mismatch on consumer GPUs (RTX 3080/4090, etc.)
- Add `python-multipart`, `modelscope`, and `einops` to `requirements.txt`
- Add a `.gitattributes` rule to keep `.sh` files with LF line endings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
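A conditional `triton` import generally follows the guarded-import pattern below. This is a minimal sketch, not the actual patch applied to SAM3's `edt.py`; the `HAS_TRITON` flag and the `edt_backend()` function are hypothetical names used for illustration.

```python
# Guarded import: triton ships Linux wheels only, so a bare module-level
# import crashes on Windows/macOS. HAS_TRITON is a hypothetical flag name.
try:
    import triton  # noqa: F401
    HAS_TRITON = True
except ImportError:
    triton = None
    HAS_TRITON = False


def edt_backend() -> str:
    """Pick a (hypothetical) backend name based on triton availability."""
    # Code paths that need triton branch on the flag instead of importing it.
    return "triton" if HAS_TRITON else "fallback"


print(edt_backend())
```

The point of the pattern is that the module always imports; only the code path that actually needs `triton` is disabled on platforms without it.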
…RI fix
Adds an editable-vector export stage to the pipeline, broadens SAM3 prompt
coverage for scientific/medical figures, and fixes a classification bug
that was rendering medical-image detections as blank white outlines.
## What's new
### Stage 8 — Vector export (new modules)
Per-image output under `output/{image}/vectors/`:
- `elements/`: individual editable SVGs for every detected element
- `rasters/`: cropped transparent-background PNGs for image elements
- `combined/`: a single layered `combined.svg` and `combined.pdf`
- `manifest.json`: element index with bbox, score, layer, and paths
New modules:
- `modules/svg_generator.py`: hybrid renderer; geometric primitives for known shapes, Chaikin-smoothed polygons for complex contours, base64-embedded crops for raster elements, editable `<text>` for OCR
- `modules/pdf_combiner.py`: svglib/cairosvg PDF backend
- `modules/section_detector.py`: panel detection via SAM3 backgrounds + HoughLinesP
- `modules/vector_exporter.py`: Stage 8 orchestrator (`BaseProcessor` subclass)
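For reference, Chaikin smoothing of a closed contour works by corner cutting: each edge is replaced by points at 1/4 and 3/4 of its length, and repeating this converges toward a smooth curve. The sketch below is a standalone illustration of the technique, not the code in `modules/svg_generator.py`; the function names `chaikin` and `to_svg_path` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chaikin(points: List[Point], iterations: int = 2) -> List[Point]:
    """Chaikin corner cutting for a closed polygon (hypothetical helper)."""
    pts = list(points)
    for _ in range(iterations):
        new_pts: List[Point] = []
        n = len(pts)
        for i in range(n):
            # Wrap around so the last vertex connects back to the first.
            (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
            new_pts.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            new_pts.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        pts = new_pts
    return pts


def to_svg_path(points: List[Point]) -> str:
    """Serialize a closed polygon as an SVG path `d` attribute."""
    head, *rest = points
    cmds = [f"M {head[0]:.2f} {head[1]:.2f}"]
    cmds += [f"L {x:.2f} {y:.2f}" for x, y in rest]
    cmds.append("Z")  # close the path
    return " ".join(cmds)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
smoothed = chaikin(square, iterations=1)
print(len(smoothed))  # 8
print(to_svg_path(smoothed))
```

Each iteration doubles the vertex count, so one or two passes are usually enough before the path becomes unwieldy to edit by hand.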
CLI:
- `--vector-level=granular|section|component|all` (default: `granular`)
- `--no-vectors`: skip Stage 8
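The two flags map naturally onto `argparse`. This is a hedged sketch: only the flag names, choices, and default come from the description above; the parser itself and `build_parser()` are hypothetical and may not match the project's actual CLI wiring.

```python
import argparse

VECTOR_LEVELS = ("granular", "section", "component", "all")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser modeling only the two Stage 8 flags."""
    parser = argparse.ArgumentParser(description="Stage 8 vector export flags (sketch)")
    parser.add_argument(
        "--vector-level",
        choices=VECTOR_LEVELS,
        default="granular",
        help="granularity of exported vectors (default: granular)",
    )
    parser.add_argument(
        "--no-vectors",
        action="store_true",
        help="skip Stage 8 entirely",
    )
    return parser


args = build_parser().parse_args(["--vector-level=section"])
print(args.vector_level, args.no_vectors)  # section False
```

Using `choices` makes `argparse` reject unknown levels with a usage error instead of silently falling back to a default.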
### Prompt v2 — broader coverage for scientific/medical figures
Total prompts: 19 -> 78
- `prompts/image.py`: 5 -> 29 (CT/MRI/ultrasound, 3D heart/anatomy, person/crowd icons, computer monitors, checkerboard/grid patterns, image stacks)
- `prompts/shape.py`: 7 -> 17 (trapezoid, parallelogram, 3D cube, isometric box, cylinder, color swatch, small colored square, stack of rectangles)
- `prompts/arrow.py`: 3 -> 17 (thick/block/curved/looping/bidirectional/dashed/dotted/L-shaped/skip variants)
- `prompts/background.py`: 4 -> 15 (sub-figure panel, dashed border rectangle, legend box/panel, title bar, header strip)
Config tuning to match (`config/config.yaml`, gitignored):
- `shape.min_area`: 200 -> 80 (catches 14x14 legend swatches)
- `shape.score_threshold`: 0.5 -> 0.45
- `arrow.score_threshold`: 0.45 -> 0.4
- `image.score_threshold`: 0.5 -> 0.45
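To see why `min_area` had to drop, consider a 14x14 legend swatch: its area is 196 px², below the old 200 but above the new 80. The sketch below applies per-category thresholds to a detection. The threshold values come from the tuned config above; the dict layout, the `Detection` class, the `keep()` helper, measuring area as bbox width times height, and the 0.5 fallback for unlisted categories are all assumptions.

```python
from dataclasses import dataclass

# Values from the tuned config; this nested-dict layout is an assumption.
THRESHOLDS = {
    "shape": {"min_area": 80, "score_threshold": 0.45},
    "arrow": {"score_threshold": 0.4},
    "image": {"score_threshold": 0.45},
}


@dataclass
class Detection:
    category: str  # e.g. "shape", "arrow", "image"
    score: float
    width: int
    height: int


def keep(det: Detection) -> bool:
    """Apply per-category score and minimum-area thresholds (hypothetical helper)."""
    cfg = THRESHOLDS.get(det.category, {})
    # Assumed fallback of 0.5 for categories without an explicit threshold.
    if det.score < cfg.get("score_threshold", 0.5):
        return False
    # Area is approximated here by the bbox; the real pipeline may use mask area.
    return det.width * det.height >= cfg.get("min_area", 0)


swatch = Detection("shape", score=0.47, width=14, height=14)  # 196 px^2
print(keep(swatch))  # True under the new config
```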
### Bug fix — heart/MRI rendered as blank white polygon outlines
Type classification was scattered across three files using case-sensitive
string comparisons. IMAGE_PROMPT contains mixed-case names like
"3D heart model" and "MRI image", but every comparison did
`elem.type.lower() in CasedSet`, so those specific scientific-image
prompts silently fell through and got rendered as white polygon outlines.
Across 18 figures, this dropped 40 medical detections (36 MRI + 4 heart)
to outline-only. After the fix all 40 are properly extracted as RGBA
crops and embedded as base64 <image> in their SVGs.
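The failure mode reproduces in a few lines. The list below is an illustrative subset containing only the two prompt names quoted above, not the full contents of `prompts/image.py`.

```python
# Illustrative subset of IMAGE_PROMPT (mixed-case names from the bug report).
IMAGE_PROMPT = ["3D heart model", "MRI image"]

# Buggy pattern: the membership set keeps the prompts' original casing.
cased_set = set(IMAGE_PROMPT)
elem_type = "MRI image"  # a detection labeled with the exact prompt text
buggy_match = elem_type.lower() in cased_set  # "mri image" is not in the set
print(buggy_match)  # False

# Fixed pattern: lowercase the prompt list once, then compare lowercase to lowercase.
lowered_set = {p.lower() for p in IMAGE_PROMPT}
fixed_match = elem_type.lower() in lowered_set
print(fixed_match)  # True
```

All-lowercase prompts were unaffected, which is why the bug only surfaced for the newer scientific-image prompts.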
The fix made the prompt files the single source of truth:
- `modules/svg_generator.py`: `RASTER_TYPES`, `GEOMETRIC_SHAPES`, and `ARROW_TYPES` are now derived from the prompt files via an `_expand_forms()` helper (covers both space-form and underscore-form normalization)
- `modules/icon_picture_processor.py`: `IMAGE_PROMPT` is lowercased before comparison
- `modules/data_types.py`: `get_layer_level()` imports the prompt lists, so specific prompts land in the correct layer (IMAGE/BASIC_SHAPE/ARROW/BACKGROUND) instead of OTHER
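One way an `_expand_forms()`-style helper can cover both normalizations is to emit a lowercase space-form and underscore-form variant of every prompt. This is a hypothetical reimplementation for illustration; the real helper in `modules/svg_generator.py` may differ, and `normalize()` is an invented name.

```python
from typing import FrozenSet, Iterable


def _expand_forms(prompts: Iterable[str]) -> FrozenSet[str]:
    """Lowercase space-form and underscore-form variants of each prompt (sketch)."""
    forms = set()
    for prompt in prompts:
        base = prompt.strip().lower()
        forms.add(base.replace("_", " "))  # "3d_heart_model" -> "3d heart model"
        forms.add(base.replace(" ", "_"))  # "3d heart model" -> "3d_heart_model"
    return frozenset(forms)


def normalize(elem_type: str) -> str:
    """Hypothetical normalization applied to detected element types."""
    return elem_type.strip().lower()


IMAGE_PROMPT = ["3D heart model", "MRI image"]  # illustrative subset
RASTER_TYPES = _expand_forms(IMAGE_PROMPT)

print(sorted(RASTER_TYPES))
print(normalize("MRI_image") in RASTER_TYPES)  # True
```

Deriving the sets from the prompt lists, rather than maintaining parallel hand-written sets, is what makes a newly added prompt route correctly without further edits.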
Adding a new prompt now auto-registers for routing, layer assignment, and
raster cropping — no parallel lists to keep in sync.
## Run results on the 18-figure test set
- 1,071 individual element SVGs
- 425 raster PNGs (was 385 before the fix; the +40 are the heart/MRI recoveries)
- 18 combined SVGs (one per figure)
- 18 combined PDFs (one per figure, Affinity-ready)
## Known limitations & future work
Even with broader prompts and the new hierarchical layer assignment, the
pipeline still struggles with **multi-panel / schematic figures**.
Detection happens per element; the global semantics (which arrow connects
which box across panel boundaries, which legend swatch labels which plot)
are not modeled.
Two directions worth exploring:
1. Two-pass extraction with explicit panel splitting. First pass: detect
sub-figure panels and split the source image into per-panel crops.
Second pass: run the full pipeline on each crop independently. This
should help the model focus on local structure and avoid cross-panel
prompt confusion. SAM3 backgrounds + HoughLinesP already give us panel
candidates (see section_detector.py); the missing piece is the
recursive split-and-rerun loop.
2. Smart margin padding around cropped rasters. Tight bboxes sometimes
clip strokes or leave faint background ghosts. A per-type margin
heuristic (icon vs. photo vs. schematic illustration) would clean this
up, but the logic is hard to pin down — loose enough to capture the
full visual element, tight enough to avoid neighbor bleed.
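As a starting point for direction 2, a per-type margin can be expressed as a fraction of the box's shorter side and clamped to the image bounds. This is a hypothetical sketch: the `pad_bbox()` helper, the type names, and the margin fractions are illustrative assumptions, not tuned values from the pipeline.

```python
from typing import Dict, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1/y1 exclusive

# Hypothetical per-type margins, as a fraction of the bbox's shorter side.
MARGIN_FRACTION: Dict[str, float] = {"icon": 0.10, "photo": 0.02, "illustration": 0.06}


def pad_bbox(bbox: BBox, elem_type: str, img_w: int, img_h: int) -> BBox:
    """Grow a bbox by a type-dependent margin, clamped to the image bounds."""
    x0, y0, x1, y1 = bbox
    frac = MARGIN_FRACTION.get(elem_type, 0.05)  # assumed default for other types
    margin = int(round(frac * min(x1 - x0, y1 - y0)))
    return (
        max(0, x0 - margin),
        max(0, y0 - margin),
        min(img_w, x1 + margin),
        min(img_h, y1 + margin),
    )


print(pad_bbox((10, 10, 60, 40), "icon", img_w=64, img_h=64))  # (7, 7, 63, 43)
```

A fixed fraction does not address neighbor bleed on its own; a fuller version might cap the margin at the distance to the nearest other detection.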
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a [!NOTE] admonition near the top of the README documenting the remaining limitation after the v2 prompt expansion, hierarchical layer assignment, and Stage 8 vector export: the pipeline still struggles with multi-panel scientific schematics because detection is per-element, while global semantics (cross-panel arrows, legend-to-plot mapping) are not modeled.
Lists two roadmap directions for tackling this challenge:
1. Two-pass extraction with explicit panel splitting + recursion
2. Smart per-element-type margin padding around cropped rasters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
I did extra work in my own fork (Sdamirsa@d6c445a, not requested for merge). Summary of what shipped:
Known limitations & future work documented in the commit body:
Issues found during setup on Windows (RTX 3080 Ti, CUDA 12.8, Python 3.12):
- `setup_sam3.sh` fails on Windows/WSL/macOS: it hardcodes `pip`, does no path conversion, and has no cross-platform support
- `einops` is imported by SAM3 at runtime but missing from `requirements.txt`
- `pycocotools` is imported unconditionally via SAM3 training code but only listed in optional extras
- `python-multipart` is required by the FastAPI `/convert` endpoint but missing from `requirements.txt`
- `triton` is imported at module level in `edt.py` but is Linux-only, so it crashes on Windows/macOS
- `addmm_act` in `perflib/fused.py` casts all tensors to BFloat16 (an H100 optimization), which causes a dtype-mismatch `RuntimeError` on consumer GPUs (RTX 3080, 3090, 4090, etc.)

Changes:
- Rewrite `setup_sam3.sh` for cross-platform support with pip/uv auto-detection, WSL path conversion, triton patching, and a verification step
- Install SAM3 with `[dev,notebooks]` extras to include missing transitive deps
- Add `python-multipart`, `modelscope`, and `einops` to `requirements.txt`
- Patch the `addmm_act` BFloat16 cast to use standard float32 ops for consumer GPU compatibility
- Add the `.gitattributes` rule `*.sh text eol=lf`

Tested: CLI and server both working end-to-end. No pipeline logic changed.
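A post-install verification step of the kind the setup script runs can be sketched in Python with `importlib.util.find_spec`, which checks importability without executing the module. This is an assumption-laden sketch: `setup_sam3.sh` is a shell script and its real check may differ, the `check()`/`report()` names are hypothetical, and `multipart` is the assumed import name for `python-multipart` (newer releases also expose `python_multipart`).

```python
import importlib.util
from typing import Dict, Iterable

# Import names to verify; "multipart" is the assumed name for python-multipart.
REQUIRED = ("einops", "pycocotools", "multipart", "modelscope")
OPTIONAL = ("triton",)  # Linux-only, so its absence is expected on Windows/macOS

GREEN, RED, YELLOW, RESET = "\033[32m", "\033[31m", "\033[33m", "\033[0m"


def check(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def report() -> bool:
    """Print green/red status per module; return True if all required ones exist."""
    ok = True
    for name, found in check(REQUIRED).items():
        color, label = (GREEN, "OK") if found else (RED, "MISSING")
        print(f"{color}{label:8} {name}{RESET}")
        ok = ok and found
    for name, found in check(OPTIONAL).items():
        color, label = (GREEN, "OK") if found else (YELLOW, "SKIPPED")
        print(f"{color}{label:8} {name}{RESET}")
    return ok


all_ok = report()
```

In a shell script, the same check could be invoked as a one-off `python` call whose exit status drives the final green/red message.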