
[DRAFT] AMD/ROCM Support #1984

Open
Fmstrat wants to merge 1 commit into Acly:main from Fmstrat:rocm

Conversation

@Fmstrat commented Aug 29, 2025

This DRAFT PR is under development to facilitate conversation about a formalized ROCm implementation in Krita AI Diffusion.

Outside of KAD, I have a functioning Dockerfile that supports KAD with all features of SD1.5, SDXL, and FLUX (including Kontext, but not SVDQuant, since Nunchaku does not support ROCm). This PR will port my work directly into KAD so that it can formally support ROCm going forward, but it will need some guidance to complete as I'm not familiar with the code base.

Working:

  • Full ROCm implementation with external ComfyUI (tested on MI300X)

In progress:

  • Dockerfile based on successful external install
  • Checks on supported nodes for models, to throw errors on lack of SVDQ (if needed, not yet in PR)

To do:

  • UI updates
  • Model Database recommendations for non-SVDQ FLUX

I'm going to assume there are a number of things in the to-do list that I'm missing. My biggest question at the moment is that there is no documentation on how to use the docker.py script in a development environment to build the container, since the ComfyUI setup is done outside of the Dockerfile (my custom setup does the full build in the Dockerfile).

@Acly (Owner) commented Aug 29, 2025

Having built-in ROCm support would be great!

> My biggest question atm is there is no documentation on how to use the docker.py script in a development environment to build the container

What is unclear exactly? You run `python scripts/docker.py` and it will download and set everything up for the docker build.

Regarding Nunchaku/SVDQ:

  • docker.py / CUDA Dockerfile is still missing support for Nunchaku anyway, just something to keep in mind
  • you can pass --no-cuda to start.sh (which passes it on to download_models.py), this will skip SVDQ models
  • might still need non-SVDQ alternatives in model database

For local testing of both the docker image and the UI installer it can be helpful to run the local file server instead of downloading from huggingface. From repo root do

```
python scripts/download_models.py scripts/downloads --minimal
python scripts/file_server.py
```

To use the local file server with docker, pass `-e AI_DIFFUSION_DOWNLOAD_URL=http://host.docker.internal:51222` to `docker run`.
To use the local file server in Krita, set the `HOSTMAP=1` environment variable.
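Put together, the local-testing loop described above might look like the sketch below. The image name is a placeholder (this thread doesn't name the locally built tag), and the port comes from the `AI_DIFFUSION_DOWNLOAD_URL` mentioned above:

```shell
# Serve models locally instead of downloading from huggingface
# (run from the repo root, as described above).
python scripts/download_models.py scripts/downloads --minimal
python scripts/file_server.py &

# Point the docker container at the local file server.
# "my-krita-comfyui" is a placeholder for whatever tag your build produced.
docker run --gpus all \
  -e AI_DIFFUSION_DOWNLOAD_URL=http://host.docker.internal:51222 \
  my-krita-comfyui

# For testing the Krita UI installer, set the host-mapping variable instead:
HOSTMAP=1 krita
```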

@Fmstrat (Author) commented Aug 29, 2025

That's helpful, thank you. For the docker.py script, it throws an error when importing ai_diffusion, stating that the release ZIP wasn't downloaded. I've installed the requirements in the root of the repo, so there is probably some simple step I'm missing to get it to recognize the folder import (away from my PC now).

For SVDQ, I have Nunchaku working in my custom nvidia Dockerfile. I could look to do a PR for that as well, if you would like.

@Acly (Owner) commented Aug 29, 2025

> For the docker.py script, it throws an error when importing ai_diffusion, stating that the release ZIP wasn't downloaded.

Likely you're missing a git submodule update
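For anyone hitting the same error, the usual fix is the standard submodule checkout (a sketch; assumes you're in the repo root of a git clone):

```shell
# docker.py imports ai_diffusion, which lives in a git submodule; fetch it first.
git submodule update --init --recursive
```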

@Fmstrat (Author) commented Aug 30, 2025

@Acly Yup, that was it. I've got everything building now, but I do have an infrastructure question. It seems the Dockerfile has a lot of additional tooling not required to run Krita AI (rclone/Jupyter/etc.). I'd like to split this up into the following images, and have them auto-build in a GitHub Workflow whenever a formal Release is made so they are available to pull:

  • ghcr.io/Acly/krita-ai-diffusion:comfyui - Just the base required with nvidia support. This would also be tagged latest
  • ghcr.io/Acly/krita-ai-diffusion:comfyui-rocm - Just the base required with rocm support
  • ghcr.io/Acly/krita-ai-diffusion:comfyui-extended - Additional commonly used tooling + nginx
  • ghcr.io/Acly/krita-ai-diffusion:comfyui-rocm-extended - Additional commonly used tooling + nginx

Each of the above would also get a version tag as they were updated on releases. Any objections to this?

@Acly (Owner) commented Aug 30, 2025

You probably saw there's a base version already; I use it e.g. to build the cloud images. But what's the use case for making them public? Just customization? I think it's more likely people will just find one of the many ComfyUI images and add the few things Krita needs. The existing image is mainly used by people who don't know much about docker and need something that just works.

I believe nginx is required to expose ComfyUI on typical hosters, at least for the websocket-over-HTTP part. The other stuff also doesn't hurt much, all things considered.

Building on GH Actions would be nice; so far the images are pushed to https://hub.docker.com/r/aclysia/sd-comfyui-krita/tags though.

@Acly (Owner) commented Sep 2, 2025

I updated models.json to include a non-Nunchaku version of Kontext (and fp4 versions for NVIDIA Blackwell).

Also, download_models.py should now automatically fetch the matching set of models for the detected hardware, see 21e7183

Haven't tested this with the docker images yet though.

@bghira commented Dec 19, 2025

In #2228 I have it working / installing ROCm dependencies for the diffusers pipeline.

@AMDphreak

Is this PR still relevant and applicable in March 2026? I thought ROCm support was added already.

@Acly (Owner) commented Mar 21, 2026

There is no ROCm install option. Problem is not the code, it's finding someone to test/maintain it. See #2315

@Reaper176

> There is no ROCm install option. Problem is not the code, it's finding someone to test/maintain it. See #2315

I would just like to point out that the only thing you need to alter (at least for Linux) is the PyTorch URL that you are pulling from.

The only thing required to set this up manually is to run

```
pip install --index-url https://repo.amd.com/rocm/whl/gfx110X-dgpu/ torch torchvision torchaudio
```

if you want to pull from AMD's site, or

```
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.2
```

if you want to pull from PyTorch's website. Nothing else needs to change.
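Not from this thread, but as a quick sanity check after either install: a ROCm build of PyTorch exposes a HIP version, so you can verify which backend you actually got:

```shell
# torch.version.hip is None on CUDA builds and a version string on ROCm builds;
# torch.cuda.is_available() also returns True on ROCm (via HIP).
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"
```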

@Acly (Owner) commented Apr 9, 2026

> I would just like to point out that the only thing you need to alter "at least for linux" is the pytorch url that you are pulling from.

Yeah, this is kinda what I meant with "problem is not the code"; it's a simple change. But I dislike publishing things I can't test, and I have seen various comments about pulling from one ROCm repo for certain GPUs, from another repo for other GPUs, sometimes needing a nightly build, etc.

So I can put it in, no problem, but then people open issues that it's not working for them, and there's not much I can do. I was hoping someone would at least show some interest in testing and commenting on such issues.

@Flaconia

> So I can put it in no problem, but then people open issues that it's not working for them, and there's not much I can do. Was hoping someone would at least show some interest into testing and commenting on such issues.

I'd gladly try it on my machine (Ryzen 7700 with iGPU + Radeon RX 7900 GRE) in return for having it available, but I'd probably need some guidance if things get too technical, if you're willing to provide it.

I'm not a coder, but I managed to make a couple of local models work (ROCm + LM Studio for local LLM models, PyTorch + Whisper for local speech recognition). I'm using Manjaro (Arch, btw).

Let me know if I can help and how.
