Hi,
In Fig. 4 of the paper (256x256 diffusion models), the ratio of MaskDiT's training speed (steps/sec) to DiT's differs between bs=256 and bs=1024: MaskDiT is ~60% and ~73% faster, respectively. I also ran the code with bs=16 (256x256, single GPU) and got 1.24 vs. 1.19 (only ~4% higher) for MaskDiT and DiT (no decoder, mask_ratio=0%), respectively. What is the reason for these discrepancies? Thanks.
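
For reference, here is a minimal sketch of how the ~4% figure for bs=16 is computed. It assumes both measurements are throughputs in the same unit (higher is faster); `relative_speedup_pct` is a hypothetical helper name, not something from the MaskDiT repo.

```python
def relative_speedup_pct(fast: float, slow: float) -> float:
    """Return how much higher `fast` is than `slow`, in percent.

    Assumes both values are throughputs in the same unit (e.g. steps/sec).
    """
    if slow <= 0:
        raise ValueError("baseline throughput must be positive")
    return (fast / slow - 1.0) * 100.0


# My bs=16 measurements on a single GPU (MaskDiT vs. DiT with no decoder, mask_ratio=0%)
maskdit_speed = 1.24
dit_speed = 1.19

print(f"MaskDiT is {relative_speedup_pct(maskdit_speed, dit_speed):.1f}% faster than DiT")
```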