And ensure `DeviceSpecific` per-device op states get deallocated at the end of training