Thank you for your contribution!
I'm trying to reproduce the training pipeline. The SFT training went well, but the RL training ran out of memory (OOM).
I'm training on 8 × L40S GPUs. Is 8 × A100 required for RL training? And is there an alternative way to run RL training with a smaller GPU memory footprint?