Hi, I am fine-tuning QwenImageLayer with my own scripts built on diffusers and accelerate. Convergence is slow, and in some runs training does not converge at all, with the loss staying high at around 0.84.
My modifications relative to QwenImage:
- RGBA-VAE encoding
- 3D RoPE is constructed in the order [[reconstructed_image, layer00, layer01, ..., cond_image]]
- resolution 640
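
To make the RoPE ordering above concrete, here is a minimal sketch of how I index the positions. It assumes each image takes one frame index and each latent is a height x width grid of tokens. The function names and the tiny grid size are hypothetical and only for illustration; this is not the diffusers or DiffSynth-Studio API, and it may not match the official implementation.

```python
from typing import List, Tuple


def build_frame_order(num_layers: int) -> List[str]:
    """Return frame names in the order used for the 3D RoPE (hypothetical helper)."""
    layers = [f"layer{i:02d}" for i in range(num_layers)]
    return ["reconstructed_image"] + layers + ["cond_image"]


def build_3d_positions(num_layers: int, height: int, width: int) -> List[Tuple[int, int, int]]:
    """Build one (frame, row, col) index per token, frame by frame.

    Assumption: every image shares the same latent grid size and gets
    its own frame index, following the order from build_frame_order.
    """
    positions = []
    for frame_idx, _name in enumerate(build_frame_order(num_layers)):
        for row in range(height):
            for col in range(width):
                positions.append((frame_idx, row, col))
    return positions


# Tiny example: 2 layers, latent grid of 2 x 3 tokens per image.
order = build_frame_order(2)
positions = build_3d_positions(num_layers=2, height=2, width=3)
print(order)
print(len(positions), positions[0], positions[-1])
```

If the official implementation assigns frame indices differently (for example, placing the condition image first or giving it a special index), that mismatch could be one reason for the slow convergence.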
I found a released training script here: https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/qwen_image/model_training/full/Qwen-Image-Layered.sh. However, I am new to DiffSynth-Studio.
Could someone share the major differences between fine-tuning QwenImage and QwenImageLayer? Thanks!