Hi, thanks a lot for your code.
But when I apply it to my end-to-end implementation of FPN, some weird things happen.
With 8 cards, GPU memory keeps increasing until one card's memory is completely full; then FPN gets stuck and the GPU utilization of the other, not-full cards stays at 100%.
With 4 cards, memory likewise keeps increasing, and then FPN may get stuck with the GPU utilization of all cards at 0%.
I am using PyTorch 0.4.0 with your DataParallelWithCallback, and the input image size differs across cards. If I use the official PyTorch BN instead, my code works fine.
Could you please give me any hints to help me find the cause?