One-click conversion of static diagrams (flowcharts, architecture diagrams, technical schematics) into editable DrawIO (mxGraph) XML files.
Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.
Try Edit Banana online at `https://editbanana.anxin6.cn/`: upload an image and get editable DrawIO XML in seconds.
To demonstrate conversion fidelity, the following side-by-side comparison pairs three original static images with their editable DrawIO reconstructions. Every element in a reconstruction can be individually dragged, restyled, and modified.
✨ Conversion Highlights:
- Preserves the layout logic, color scheme, and element hierarchy of the original diagram
- 1:1 restoration of shape stroke/fill and arrow styles (dashed lines, line thickness)
- Accurate text recognition; recognized text can be edited and reformatted directly
- All elements are independently selectable, supporting native DrawIO template replacement and layout optimization
- Advanced Segmentation: Uses SAM 3 (Segment Anything Model 3) for state-of-the-art segmentation of diagram elements (shapes, arrows, icons).
- Fixed 4-Round VLM Scanning: A structured, iterative extraction process guided by multimodal LLMs (Qwen-VL/GPT-4V) to minimize missed elements:
  - Initial Generic Extraction: Captures standard shapes and icons.
  - Single Word Scan: The VLM scans blank areas for single objects.
  - Two-Word Scan: Refines extraction for specific attributes.
  - Phrase Scan: Captures complex descriptions or grouped objects.
- High-Quality OCR:
  - Azure Document Intelligence for precise text localization.
  - Fallback Mechanism: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
  - Mistral Vision/MLLM for correcting text and converting mathematical formulas to LaTeX (e.g., $\int f(x)\,dx$).
  - Crop-Guided Strategy: Extracts text/formula regions and sends high-resolution crops to LLMs for more accurate recognition.
- Smart Background Removal: Integrates the RMBG-2.0 model to automatically remove backgrounds from icons, pictures, and arrows so they layer correctly in DrawIO.
- Arrow Handling: Arrows are extracted as transparent images (rather than complex vector paths) to preserve visual fidelity, so dashed lines, curves, and complex routing are reproduced as they appear in the source image.
- Vector Shape Recovery: Standard shapes are converted to native DrawIO vector shapes with accurate fill and stroke colors.
  - Supported Shapes: Rectangle, Rounded Rectangle, Diamond (Decision), Ellipse (Start/End), Cylinder (Database), Cloud, Hexagon, Triangle, Parallelogram, Actor, Title Bar, Text Bubble, Section Panel.
- User System:
  - Registration: New users receive 30 free credits.
  - Credit System: A pay-per-use model prevents resource abuse.
- Multi-User Concurrency: Built-in support for concurrent user sessions, using a global lock for thread-safe GPU access and a Least Recently Used (LRU) cache that keeps image embeddings across requests, so repeated requests on the same image can reuse them instead of re-encoding it.
- Web Interface: A React-based frontend + FastAPI backend for easy uploading and editing.
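
The Multi-User Concurrency bullet above (a global lock around GPU work plus an LRU cache of image embeddings) can be sketched with the standard library alone. This is a minimal illustration, not the code in `server_pa.py`: the `EmbeddingService` class, the `encode_fn` callable standing in for the SAM 3 image encoder, the SHA-256 cache key, and the default capacity are all assumptions.

```python
import hashlib
import threading
from collections import OrderedDict
from typing import Any, Callable


class EmbeddingService:
    """Hypothetical wrapper: serializes GPU work and caches embeddings (LRU)."""

    def __init__(self, encode_fn: Callable[[bytes], Any], capacity: int = 8) -> None:
        self._encode_fn = encode_fn              # stand-in for the SAM 3 image encoder
        self._capacity = capacity
        self._cache: OrderedDict[str, Any] = OrderedDict()
        self._gpu_lock = threading.Lock()        # global lock: one GPU job at a time

    def get_embedding(self, image_bytes: bytes) -> Any:
        key = hashlib.sha256(image_bytes).hexdigest()  # identical uploads share a key
        with self._gpu_lock:
            if key in self._cache:
                self._cache.move_to_end(key)     # mark as most recently used
                return self._cache[key]
            embedding = self._encode_fn(image_bytes)
            self._cache[key] = embedding
            if len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # evict the least recently used entry
            return embedding
```

Holding one lock around both the cache lookup and the encoder call keeps the sketch simple and guarantees two concurrent requests never encode the same image twice; a finer-grained design could use a separate cache lock so cache hits never wait on the GPU.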
The processing pipeline works as follows:

- Input: Image (PNG/JPG).
- Segmentation (SAM3):
  - Initial pass with standard prompts (rectangle, arrow, icon).
  - Iterative loop: Analyze unrecognized regions -> Ask the MLLM for visual prompts -> Re-run the SAM3 mask decoder.
- Element Processing:
  - Vector Shapes: Color extraction (fill/stroke) + geometry mapping.
  - Image Elements (Icons/Arrows): Crop -> Padding -> Mask Filtering -> RMBG-2.0 Background Removal -> Base64 Encoding.
- Text Extraction (Parallel):
  - Azure OCR detects text bounding boxes.
  - High-resolution crops of text regions are sent to Mistral/LLM.
  - LaTeX conversion for formulas.
- XML Generation:
  - Merges spatial data from SAM3 and text OCR.
  - Applies z-index sorting (layers).
  - Generates the `.drawio.xml` file.
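
The XML Generation step can be illustrated with a small standalone sketch that writes an uncompressed draw.io file. Cells are sorted by z-index before being emitted, because draw.io draws later cells on top of earlier ones. The `Element` dataclass, the `SHAPE_STYLES` mapping (covering only a subset of the supported shapes), and `to_drawio_xml` are hypothetical; the style fragments (`rhombus`, `ellipse`, `shape=cylinder3`) are common draw.io built-in styles, but the project may map shapes differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical mapping from detected shape labels to draw.io style fragments.
SHAPE_STYLES = {
    "rectangle": "rounded=0;whiteSpace=wrap;html=1;",
    "rounded_rectangle": "rounded=1;whiteSpace=wrap;html=1;",
    "diamond": "rhombus;whiteSpace=wrap;html=1;",
    "ellipse": "ellipse;whiteSpace=wrap;html=1;",
    "cylinder": "shape=cylinder3;whiteSpace=wrap;html=1;",
}


@dataclass
class Element:
    label: str            # shape type predicted by segmentation
    text: str             # text recovered by OCR ("" if none)
    x: float
    y: float
    width: float
    height: float
    fill: str = "#ffffff"
    stroke: str = "#000000"
    z_index: int = 0      # higher values are drawn on top


def to_drawio_xml(elements: list[Element]) -> str:
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # draw.io expects two structural cells: the root (id 0) and the default layer (id 1).
    ET.SubElement(root, "mxCell", {"id": "0"})
    ET.SubElement(root, "mxCell", {"id": "1", "parent": "0"})

    # Later children render on top, so emit cells in ascending z-order
    # (sorted() is stable, so ties keep their input order).
    for i, el in enumerate(sorted(elements, key=lambda e: e.z_index), start=2):
        style = SHAPE_STYLES.get(el.label, SHAPE_STYLES["rectangle"])
        style += f"fillColor={el.fill};strokeColor={el.stroke};"
        cell = ET.SubElement(root, "mxCell", {
            "id": str(i), "value": el.text, "style": style,
            "vertex": "1", "parent": "1",
        })
        ET.SubElement(cell, "mxGeometry", {
            "x": str(el.x), "y": str(el.y),
            "width": str(el.width), "height": str(el.height), "as": "geometry",
        })
    return ET.tostring(mxfile, encoding="unicode")
```

Writing the returned string to a file with a `.drawio` extension should let draw.io open it directly; unknown labels fall back to a plain rectangle in this sketch.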
Project structure:

```text
sam3_workflow/
├── config/ # Configuration files
├── flowchart_text/ # OCR & Text Extraction Module
│ ├── src/ # OCR Source Code (Azure, Mistral, Alignment)
│ └── main.py # OCR Entry point
├── frontend/ # React Web Application
├── input/ # [Manual] Input images directory
├── models/ # [Manual] Model weights (RMBG, SAM3)
│ └── rmbg/ # [Manual] RMBG-2.0
├── output/ # [Manual] Results directory
├── sam3/ # SAM3 Model Library
├── scripts/ # Utility Scripts
│ └── merge_xml.py # XML Merging & Orchestration
├── main.py # CLI Entry point (Modular Pipeline)
├── server_pa.py # FastAPI Backend Server (Service-based)
└── requirements.txt # Python dependencies
```
Follow these steps to set up the project locally.

Prerequisites:

- Python 3.10+
- Node.js & npm (for the frontend)
- CUDA-capable GPU (highly recommended)
Clone the repository:

```bash
git clone https://github.com/XiangjianYi/Image2DrawIO.git
cd Image2DrawIO
```

After cloning, you must manually create the following resource directories (ignored by Git):

```bash
# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output
# Create model directories
mkdir -p models/rmbg
```

Download the required models and place them in the correct paths:
| Model | Download | Target Path |
|---|---|---|
| RMBG-2.0 | RMBG-2.0 | models/rmbg/model.onnx |
| SAM 3 | https://modelscope.cn/models/facebook/sam3 | models/sam3.pt (or as configured) |
Note: For SAM 3 (or whichever segmentation checkpoint you use), place the `.pt` file in `models/` and update `config/config.yaml` accordingly.
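
Before the first run, a short script can confirm that the manually created directories and downloaded weights are where the pipeline expects them. This is a hypothetical helper, not part of the repository: the path lists simply mirror the directories and target paths above, and `models/sam3.pt` should be changed if your `config/config.yaml` points elsewhere.

```python
from pathlib import Path

# Paths taken from the setup steps above; adjust to match config/config.yaml.
REQUIRED_DIRS = ["input", "output", "sam3_output", "models/rmbg"]
REQUIRED_FILES = ["models/rmbg/model.onnx", "models/sam3.pt"]


def missing_resources(project_root: str = ".") -> list[str]:
    """Return the required directories and files that do not exist under project_root."""
    root = Path(project_root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_resources()` from the repository root returns an empty list when everything is in place; otherwise it lists exactly what still needs to be created or downloaded.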
Backend:

```bash
pip install -r requirements.txt
```

Frontend:

```bash
cd frontend
npm install
cd ..
```

Configuration:

- Config File: Copy the example config.

  ```bash
  cp config/config.yaml.example config/config.yaml
  ```

- Environment Variables: Create a `.env` file in the root directory.

  ```bash
  AZURE_ENDPOINT=your_azure_endpoint
  AZURE_API_KEY=your_azure_key
  # Add other keys as needed
  ```
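
The `.env` file is a list of `KEY=value` lines. Projects usually load it with a library such as `python-dotenv`; this README does not say how the backend reads it, so the stdlib-only parser below is only an illustration of the format's usual rules (blank lines and `#` comments skipped, matching quotes stripped, trailing ` #` comments dropped from unquoted values). The `load_env` name and its `override` parameter are hypothetical.

```python
import os
from pathlib import Path


def load_env(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Parse KEY=value lines from a .env file and copy them into os.environ."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip matching surrounding quotes
        elif " #" in value:
            value = value.split(" #", 1)[0].rstrip()  # drop an inline comment
        values[key] = value
        if override or key not in os.environ:
            os.environ[key] = value  # existing variables win unless override=True
    return values
```

Letting already-set environment variables take precedence by default means a key exported in the shell is not silently replaced by the file.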
Start the Backend:

```bash
python server_pa.py
# Server runs at http://localhost:8000
```

Start the Frontend:

```bash
cd frontend
npm install
npm run dev
# Frontend runs at http://localhost:5173
```

Open your browser, upload an image, and view the result in the embedded DrawIO editor.
To process a single image:

```bash
python main.py -i input/test_diagram.png
```

The output XML will be saved in the `output/` directory.
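
Batch export is still listed as planned in the roadmap below; until then, a thin wrapper can call the CLI once per image. This sketch relies only on the `-i` flag shown above; the `build_commands` and `run_batch` names, the extension filter, and running from the repository root are assumptions.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(input_dir: str = "input") -> list[list[str]]:
    """Build one `python main.py -i <image>` command per PNG/JPG file in input_dir."""
    images = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # sys.executable reuses the current interpreter (e.g. the active virtualenv).
    return [[sys.executable, "main.py", "-i", str(p)] for p in images]


def run_batch(input_dir: str = "input") -> list[str]:
    """Run the CLI for each image sequentially; return the images whose run failed."""
    failed = []
    for cmd in build_commands(input_dir):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[-1])
    return failed
```

Running images one after another keeps GPU memory use the same as a single-image run; each result still lands in `output/`.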
Customize the pipeline behavior in `config/config.yaml`:

- `sam3`: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, and the maximum number of iteration loops.
- `paths`: Set input/output directories.
- `dominant_color`: Fine-tune color extraction sensitivity.
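
To make the `sam3` thresholds concrete: the score threshold discards low-confidence detections, and the NMS threshold is the IoU above which an overlapping, lower-scoring detection is treated as a duplicate and suppressed. The greedy box-based NMS below is a standard textbook version written for illustration; the `Detection` type and `filter_detections` function are hypothetical, and the project may apply NMS to masks rather than boxes (mask IoU uses pixel counts but follows the same logic).

```python
from typing import NamedTuple

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


class Detection(NamedTuple):
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: list[Detection], score_threshold: float = 0.5,
                      nms_threshold: float = 0.5) -> list[Detection]:
    """Drop low-score detections, then apply greedy non-maximum suppression."""
    candidates = sorted((d for d in dets if d.score >= score_threshold),
                        key=lambda d: d.score, reverse=True)
    kept: list[Detection] = []
    for det in candidates:
        # Keep a detection only if it does not overlap any higher-scoring kept one too much.
        if all(iou(det.box, k.box) <= nms_threshold for k in kept):
            kept.append(det)
    return kept
```

Raising the NMS threshold keeps more overlapping elements (useful for nested shapes), while raising the score threshold trades recall for fewer spurious detections.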
| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📍 Planned | Support custom template import |
| Batch Export Optimization | 📍 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📍 Planned | Support local VLM deployment, independent of APIs |
Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):
- Fork this repository
- Create a feature branch (`git checkout -b feature/xxx`)
- Commit your changes (`git commit -m 'feat: add xxx'`)
- Push to the branch (`git push origin feature/xxx`)
- Open a Pull Request

Please report bugs via GitHub Issues and share feature suggestions via GitHub Discussions.
You are welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:
Scan to join the Edit Banana community
💡 If the QR code has expired, please submit an Issue to request an updated one.
Thanks to all the developers who have contributed to the project and helped it evolve!
| Name/ID | Email |
|---|---|
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| Yang Haotian | |
| An Minghao | |
| Yu Mingjie | |
This project is open source under the Apache License 2.0, which permits commercial use and derivative works, provided the copyright and license notices are retained.
🌟 If this project helps you, please star it to show your support!
[Star History Chart](https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left)