
[AI] GPU ORT install: manifest-driven UI button, scripts, and EP support #20676

Draft

andriiryzhkov wants to merge 13 commits into darktable-org:master from andriiryzhkov:ort_scripts

Conversation

@andriiryzhkov
Contributor

@andriiryzhkov andriiryzhkov commented Mar 26, 2026

Follow-up to #20647 (ORT library path preference and custom loading). Addresses #20532.

Summary

Manifest-driven system for installing GPU-accelerated ONNX Runtime directly from darktable or via scripts.

Package manifest (data/ort_gpu.json)

All download URLs, SHA-256 checksums, archive formats, and ROCm version mappings live in a single JSON file; update it to change URLs or versions without rebuilding darktable. Supports NVIDIA (CUDA), AMD (MIGraphX), and Intel (OpenVINO) on Linux and Windows.
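As an illustration only (the field names below are hypothetical, not the PR's actual schema), a manifest entry could look roughly like this:

```json
{
  "packages": [
    {
      "vendor": "nvidia",
      "platform": "linux",
      "ort_version": "1.23.2",
      "url": "https://example.com/onnxruntime-gpu-linux.tgz",
      "sha256": "<expected hex digest>",
      "format": "tgz"
    }
  ]
}
```

The point of the design is that both the UI installer and the scripts consume the same file, so a URL or version bump is a data change, not a code change.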

Install button in AI preferences (preferences_ai.c)

  • Detects GPUs (NVIDIA via nvidia-smi, AMD via rocminfo, Intel via lspci)
  • Selection dialog if multiple GPU vendors found
  • Shows requirements and missing dependencies with distro-specific install hints
  • Downloads with progress bar, verifies SHA-256 checksum
  • Extracts and validates the library, auto-fills the ORT path preference
  • Guarded by HAVE_AI_DOWNLOAD (requires curl + libarchive)
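The detection order described above can be sketched in shell; this is a minimal illustration of the probe sequence (function name and exact matching logic are mine, not the PR's code):

```shell
# Probe for a GPU vendor in the order the installer uses:
# NVIDIA via nvidia-smi, AMD via rocminfo, Intel via lspci.
detect_gpu_vendor() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo nvidia
  elif command -v rocminfo >/dev/null 2>&1 && rocminfo 2>/dev/null | grep -qi 'gfx'; then
    echo amd
  elif command -v lspci >/dev/null 2>&1 && lspci 2>/dev/null | grep -qiE '(vga|display).*intel'; then
    echo intel
  else
    echo none
  fi
}
```

In the actual UI code this lives in C, and a dialog is shown when more than one vendor is detected.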

Backend (ort_install.c/h)

  • GPU detection for NVIDIA, AMD, Intel with dependency checks
  • Manifest loading via json-glib, package matching by vendor + platform + ROCm version
  • Download via curl, SHA-256 verification, extraction via libarchive (tgz/zip/whl)
  • Library validation via dt_ai_ort_probe_library
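The verify step in the pipeline above amounts to comparing a computed digest against the manifest value. A minimal sketch (helper name is illustrative; the backend does this in C, not shell):

```shell
# Return success only if the file's SHA-256 digest matches the
# expected value from the manifest.
verify_sha256() {
  file=$1 expected=$2
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}
```

A failed check should abort the install before extraction, since a mismatched archive is either corrupt or not the file the manifest promised.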

Unified install scripts

Script                    Platform  Requires
install-ort-gpu.sh        Linux     jq
install-ort-gpu.ps1       Windows   PowerShell
install-ort-amd-build.sh  Linux     cmake, gcc, python3

The scripts read the same ort_gpu.json manifest, detect the GPU, and download and verify the matching package. They support the --vendor, --force, and --manifest flags.
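The flag handling can be sketched as a plain POSIX argument loop; this is an assumption about the shape of the parsing, with illustrative variable names, not an excerpt from the scripts:

```shell
# Defaults; overridden by command-line flags.
VENDOR=""
FORCE=0
MANIFEST="data/ort_gpu.json"

parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --vendor)   VENDOR=$2; shift 2 ;;   # skip detection, use this vendor
      --force)    FORCE=1; shift ;;       # reinstall even if present
      --manifest) MANIFEST=$2; shift 2 ;; # alternate manifest path
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
}
```

`--vendor` matches the "Vendor override: amd (skipping GPU detection)" behavior visible in the test output later in this thread.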

EP availability expanded (backend_common.c)

CUDA and OpenVINO are now selectable on Windows (previously Linux-only).

EP                Linux  Windows        macOS
CPU               yes    yes            yes
CoreML            -      -              yes (bundled)
CUDA (NVIDIA)     yes    yes            -
MIGraphX (AMD)    yes    -              -
OpenVINO (Intel)  yes    yes            -
DirectML          -      yes (bundled)  -

Documentation

tools/ai/README.md – install methods, EP table, manifest format, manual install, verification.

Test plan

  • Linux + NVIDIA CUDA
  • Linux + AMD MIGraphX
  • Linux + Intel OpenVINO
  • Windows + NVIDIA CUDA
  • Windows + Intel OpenVINO

@andriiryzhkov force-pushed the ort_scripts branch 2 times, most recently from 2fc7d22 to bf749a7, April 2, 2026 08:26
@andriiryzhkov andriiryzhkov changed the title [AI] GPU-accelerated ORT install scripts and documentation [AI] GPU ORT install: manifest-driven UI button, scripts, and EP support Apr 2, 2026
@andriiryzhkov
Contributor Author

@TurboGit : I need help testing these scripts and the UI installer. Can you test with an NVIDIA card on Linux?

And maybe you can advise who could help with the other tests?

@TurboGit
Member

TurboGit commented Apr 2, 2026

@andriiryzhkov : Just to be sure, you mean install-ort-gpu.sh, right?

I'll try to do that tomorrow.

@andriiryzhkov
Contributor Author

@TurboGit :

install-ort-gpu.sh is the first part.
The second part (with basically the same result) is the "install" button added to the AI preferences tab.

@TurboGit
Member

TurboGit commented Apr 3, 2026

@andriiryzhkov : Using the script was OK. I was also able to install with the in-UI [install] button. I saw one difference: when using the script no symlinks were created, whereas with the in-UI installation I got:

libonnxruntime.so -> libonnxruntime.so.1
libonnxruntime.so.1 -> libonnxruntime.so.1.24.4

@andriiryzhkov
Contributor Author

@TurboGit : Thank you for testing.

The symlinks are created by the UI install for extra safety, but not strictly required — darktable loads the library by full path from the preferences config, not via LD_LIBRARY_PATH or soname resolution. The script works fine without them since it provides the full path directly. I can add symlink creation to the script too for consistency if you prefer.
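For reference, the soname chain the UI installer sets up can be sketched like this (helper name and paths are illustrative; since darktable loads by full path, the links are a convenience, not a requirement):

```shell
# Given a versioned library file, create the conventional soname
# symlink chain: .so -> .so.1 -> .so.1.24.4
make_ort_symlinks() {
  dir=$1 full=$2              # e.g. libonnxruntime.so.1.24.4
  major=${full%.*.*}          # strip minor.patch -> libonnxruntime.so.1
  base=${major%.*}            # strip major      -> libonnxruntime.so
  ln -sf "$full" "$dir/$major"
  ln -sf "$major" "$dir/$base"
}
```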

Looks like NVIDIA and AMD are tested.

@TurboGit
Member

TurboGit commented Apr 3, 2026

I can add symlink creation to the script too for consistency if you prefer.

No, not needed; I just wanted to be sure this was expected and not the cause of another issue.

@andriiryzhkov force-pushed the ort_scripts branch 2 times, most recently from 3d4266e to 4544aee, April 8, 2026 11:44
@andriiryzhkov
Contributor Author

Fixed cuDNN detection in the UI installation on Ubuntu and Debian.

@da-phil
Contributor

da-phil commented Apr 14, 2026

Some feedback after I tried to install the ONNX Runtime for ROCm 7.2 / MIGraphX:

❯ ./install-ort-gpu.sh --vendor amd --manifest ort_gpu.json
Vendor override: amd (skipping GPU detection)

ONNX Runtime 1.23.2 - GPU acceleration installer
============================================================

GPU: AMD GPU
ORT version: 1.23.2
Download size: ~300 MB
Install to: /home/phil/.local/lib/onnxruntime-migraphx
Requirements: ROCm 7.2, MIGraphX

Continue? [y/N] y

Downloading...
/tmp/tmp.V9P1P4uMDj/ort-package              100%[===========================================================================================>]  19,55M  10,7MB/s    in 1,8s    
Verifying checksum...
Checksum OK.
Extracting...

Done. Installed to: /home/phil/.local/lib/onnxruntime-migraphx
-rwxr-xr-x 1 phil phil 560K Apr 14 12:35 /home/phil/.local/lib/onnxruntime-migraphx/libonnxruntime_providers_migraphx.so
-rwxr-xr-x 1 phil phil  16K Apr 14 12:35 /home/phil/.local/lib/onnxruntime-migraphx/libonnxruntime_providers_shared.so
-rwxr-xr-x 1 phil phil  27M Apr 14 12:35 /home/phil/.local/lib/onnxruntime-migraphx/libonnxruntime.so.1.23.2

To enable in darktable:

  1. Open darktable preferences (Ctrl+,)
  2. Go to the AI tab
  3. Click 'detect' to find the installed library automatically,
     or set 'ONNX Runtime library' to:
     /home/phil/.local/lib/onnxruntime-migraphx/libonnxruntime.so.1.23.2
  4. Restart darktable

Or via command line:

  DT_ORT_LIBRARY=/home/phil/.local/lib/onnxruntime-migraphx/libonnxruntime.so.1.23.2 darktable

Now this is what I get when I start darktable with AI debug messages enabled and click on the restore module:

❯ DT_ORT_LIBRARY=/home/phil/.local/lib/onnxruntime-migraphx/libonnxruntime.so.1.23.2 darktable -d ai
darktable 5.5.0+965~g44a4e93598-dirty
Copyright (C) 2012-2026 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Exiv2                  -> 0.27.6
  Lensfun                -> 0.3.4
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.6.0
  Colord                 -> DISABLED
  gPhoto2                -> ENABLED  - Camera tethering is available
  OSMGpsMap              -> DISABLED - Map view is NOT available
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> ENABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  LibRaw                 -> ENABLED  - Version 0.22.0-Release
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED
  AI                     -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

[dt starting] as : darktable -d ai
     0,3188 [ai_models] initialized: models_dir=/home/phil/.local/share/darktable/models, cache_dir=/home/phil/.cache/darktable/ai_downloads
     0,3191 [ai_models] using repository: darktable-org/darktable-ai
     0,3191 [ai_models] registered model: mask sam2.1 hiera small (mask-object-sam21-small)
     0,3191 [ai_models] registered model: mask segnext vitb-sax2 hq (mask-object-segnext-b2hq)
     0,3191 [ai_models] registered model: denoise nind (denoise-nind)
     0,3191 [ai_models] registered model: upscale bsrgan (upscale-bsrgan)
     0,3191 [ai_models] registry loaded: 4 models from /opt/darktable/share/darktable/ai_models.json
     2.8004 [darktable_ai] dt_ai_env_init start.
     2.8005 [darktable_ai] discovered: upscale bsrgan (upscale-bsrgan, backend=onnx)
     2.8005 [darktable_ai] discovered: mask sam2.1 hiera small (mask-object-sam21-small, backend=onnx)
     2.8005 [darktable_ai] discovered: denoise nind (denoise-nind, backend=onnx)
     2.8006 [darktable_ai] discovered: mask segnext vitb-sax2 hq (mask-object-segnext-b2hq, backend=onnx)
    20.7755 [neural_restore] preview: exported 9568x6376, scale=1, export_size=0
    20.7986 [darktable_ai] loaded ORT 1.23.2 from '/home/phil/.local/lib/onnxruntime-migraphx/libonnxruntime.so.1.23.2'
The requested API version [24] is not available, only API versions [1, 23] are supported in this build. Current ORT Version is: 1.23.2
    20.7987 [darktable_ai] ORT 1.23.2: using API version 23 (compiled for 24)
    20.7987 [darktable_ai] execution provider: MIGraphX
    20.7993 [darktable_ai] MIOpen cache: /home/phil/.cache/darktable/ai/amd/miopen
    20.7993 [darktable_ai] MIGraphX cache: /home/phil/.cache/darktable/ai/amd/migraphx
    20.8158 [darktable_ai] loading: /home/phil/.local/share/darktable/models/denoise-nind/model.onnx
    20.8160 [darktable_ai] attempting to enable AMD MIGraphX...
    20.9144 [darktable_ai] AMD MIGraphX enabled successfully.
2026-04-14 12:50:36.136523057 [W:onnxruntime:DarktableAI, migraphx_execution_provider.cc:167 MIGraphXExecutionProvider] [MIGraphX EP] MIGraphX ENV Override Variables Set:
2026-04-14 12:50:36.431876117 [W:onnxruntime:DarktableAI, migraphx_execution_provider.cc:1309 compile_program] Model Compile: Begin

It just gets stuck with Model Compile: Begin, while my GPU gets very busy and hot.

I also attached the output of rocminfo: rocminfo.txt.
And here are all installed rocm/migraph pkgs: rocm_migraph_pkgs.txt

I assigned 10 GB of pageable RAM to the iGPU, which should be sufficient, shouldn't it?

❯ amd-ttm
💻 Current TTM pages limit: 2621440 pages (10.00 GB)
💻 Total system memory: 30.64 GB

It is certainly not hitting any memory bottleneck as per the amd-smi overview:

❯ amd-smi
+------------------------------------------------------------------------------+
| AMD-SMI          26.4.0+478a7a43c6                                           |
| OS kernel Version:  6.18.20-061820-generic                                      |
| ROCm Version:    7.13.0                                                      |
| VBIOS Version:   00077464                                                    |
| Platform:        Linux Baremetal                                             |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:65:00.0 ...adeon 780M Graphics | N/A        N/A   0                 N/A |
|   0       0     N/A             N/A | N/A        N/A           3128/10240 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU      PID  Process Name       GTT_MEM  VRAM_MEM  MEM_USAGE  CU %  SDMA   |
|==============================================================================|
|    0   646843  darktable           1.3 GB    5.5 MB     1.3 GB    N/A   0 us |
+------------------------------------------------------------------------------+

Maybe this is just a very specific AMD / iGPU issue, as my laptop runs an AMD Ryzen 7 8845HS with a Radeon 780M iGPU.

@andriiryzhkov
Contributor Author

@da-phil :

It just gets stuck with Model Compile: Begin, while my GPU gets very busy and hot.

You are not stuck, let it finish compiling. It may take long, veeerrrryyyyy long. But it happens only once per model and per tile size. I tested on my Radeon 780M iGPU and it took 30 minutes. The compilation is cached on disk, so subsequent runs with the same tile size were very fast, around 10 seconds or so. Much faster than the CPU.

iGPUs generally are not supported by ROCm and don't have pre-built kernels. That's why I think it takes so long. But compilation happens even on properly supported GPUs.

@da-phil
Contributor

da-phil commented Apr 14, 2026


Okay, this time I was patient enough and voilà, it worked!
Processing a 9568x6376 image on the iGPU was significantly faster than on the CPU, by a factor of 3.2x (55.29 s vs 178.33 s)!

On CPU:

   224.1143 [neural_restore] job started: task=denoise, scale=1, images=1
   224.1163 [neural_restore] processing imgid 63525 -> /home/phil/Pictures/some_image.tif
   229.3276 [neural_restore] processing 9568x6376 -> 9568x6376 (scale=1)
   229.3280 [restore] tiling 9568x6376 (scale=1) -> 9568x6376, 7x5 grid (35 tiles, T=1536)
   402.4445 [neural_restore] imported imgid=63529: /home/phil/Pictures/some_image.tif

On GPU:

  1980.9891 [neural_restore] job started: task=denoise, scale=1, images=1
  1980.9913 [neural_restore] processing imgid 63525 -> /home/phil/Pictures/some_image.tif
  1984.5201 [neural_restore] processing 9568x6376 -> 9568x6376 (scale=1)
  1984.5211 [restore] tiling 9568x6376 (scale=1) -> 9568x6376, 7x5 grid (35 tiles, T=1536)
  2036.2812 [neural_restore] imported imgid=63530: /home/phil/Pictures/some_image.tif

But now the problem: the GPU path leads to significant vertical stripe artifacts:

[image: denoised output showing vertical stripe artifacts]

Is this a known issue or is there anything I can do to debug it? The CPU path leads to artifact-free denoised images.

@andriiryzhkov
Contributor Author

But now the problem: the GPU path leads to significant vertical stripe artifacts

That is interesting. I will check, but I assume it will be hard to trace.
