Thank you for your contribution!
I'm trying to reproduce the training pipeline. The SFT training went well, but the RL training ran out of memory (OOM).
I'm training on 8 × L40S GPUs. Is 8 × A100 required for RL training? And is there an alternative way to run RL training with a smaller GPU memory footprint?