What are the major differences between finetuning QwenImage and QwenImageLayer? #1274

Hi, I am finetuning QwenImageLayer with my own scripts, implemented with diffusers and accelerate. Convergence is slow, and in some runs training does not converge at all: the loss stays high, around 0.84.

My modifications relative to the QwenImage setup:
1. RGBA-VAE encoding
2. 3D RoPE is constructed in the order [[reconstructed_image, layer00, layer01, ..., cond_image]]
3. Resolution 640
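
To make item 2 concrete, here is a minimal sketch of the ordering I mean. It is not the actual model code: `build_3d_rope_ids` is a hypothetical helper, and the 40x40 token grid assumes 8x VAE downsampling followed by 2x2 patching at resolution 640, which may not match the real pipeline. Each token gets a (frame, row, col) index, with the frame axis following the image order listed above.

```python
from itertools import product


def build_3d_rope_ids(num_layers, grid_h, grid_w):
    """Hypothetical helper: assign a (frame, row, col) index to every token.

    Frame order follows the list above:
    [reconstructed_image, layer00, layer01, ..., cond_image].
    """
    names = (
        ["reconstructed_image"]
        + [f"layer{i:02d}" for i in range(num_layers)]
        + ["cond_image"]
    )
    ids = []
    for frame, _name in enumerate(names):
        # Every image in the sequence shares the same spatial grid;
        # only the frame index distinguishes one image from the next.
        for row, col in product(range(grid_h), range(grid_w)):
            ids.append((frame, row, col))
    return names, ids


# Assumed grid: 640 px / 8 (VAE) / 2 (patch) = 40 tokens per side.
names, ids = build_3d_rope_ids(num_layers=2, grid_h=40, grid_w=40)
print(names)
print(len(ids), ids[0], ids[-1])
```

If the reference implementation orders the frames differently (for example, placing cond_image first) or offsets the spatial indices, that mismatch alone could plausibly explain poor convergence, so this is one of the things I would like to confirm.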

I found the released training script at https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/qwen_image/model_training/full/Qwen-Image-Layered.sh, but I am new to DiffSynth-Studio.

Could someone share the major differences between finetuning QwenImage and QwenImageLayer? Thanks!
