Discrepancies in training speed ratios

Hi,
In Fig. 4 of the paper (256x256 diffusion models), the ratio of MaskDiT's training speed (steps/sec) to DiT's differs between batch sizes: MaskDiT is ~60% faster at bs=256 and ~73% faster at bs=1024. I also ran the code at bs=16 (256x256, single GPU) and got 1.24 vs 1.19 (about 4% higher) for MaskDiT and DiT (no decoder, mask_ratio=0%), respectively. What is the reason behind these discrepancies? Thanks.
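
For reference, the percentages above are relative speedups: the ratio of the two throughputs minus one. Below is a minimal sketch of that arithmetic using the bs=16 numbers; the function name `relative_speedup` is only illustrative and is not part of the MaskDiT code.

```python
def relative_speedup(fast, base):
    """Percent by which throughput `fast` exceeds throughput `base`."""
    return (fast / base - 1.0) * 100.0


# bs=16, 256x256, single GPU: MaskDiT vs DiT (no decoder, mask_ratio=0%)
print(f"{relative_speedup(1.24, 1.19):.1f}%")  # prints "4.2%"
```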

Discrepancies in training speed ratios #18

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Discrepancies in training speed ratios #18

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions