Hi team! Thanks for the awesome paper and work. I was able to install and run EditScore to test with a couple of my images.
For Qwen2.5-VL-7B, the average inference time is about 12 seconds per image. But for Qwen3-VL-4B, which is the smaller model, it is about 6 minutes per image. Is this the expected inference time, or am I doing something wrong here?
I followed all the installation instructions and am testing only on my own images. The average image resolution is between 1k and 2k pixels.
I'm running the models on a single A100 80GB.
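In case it helps to compare numbers, here is a minimal sketch of how per-image wall-clock time could be measured. Note that `fake_score` and the file names are placeholders for illustration only, not the EditScore API; the real scoring call would be passed in its place. It also assumes the scoring call blocks until the result is ready (if GPU work were still pending on return, a `torch.cuda.synchronize()` before reading the clock would be needed).

```python
import time
from statistics import mean


def time_per_item(fn, items, warmup=1):
    """Return wall-clock seconds for each fn(item) call.

    The first `warmup` items are run once untimed so that one-off costs
    (model loading, kernel compilation, cache fills) do not skew the average.
    """
    for item in items[:warmup]:
        fn(item)

    timings = []
    for item in items:
        start = time.perf_counter()
        fn(item)
        timings.append(time.perf_counter() - start)
    return timings


def fake_score(path):
    # Placeholder standing in for the real scoring call (hypothetical).
    time.sleep(0.01)
    return 0.0


timings = time_per_item(fake_score, ["a.png", "b.png", "c.png"])
print(f"mean {mean(timings):.3f}s, max {max(timings):.3f}s over {len(timings)} images")
```

Reporting the maximum alongside the mean would show whether the 6 minutes comes from every image or from a few slow outliers.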
Thanks in advance for the help!!