Nice repo!!!
It seems that the default parameters for the policy freeze all layers of the language model being used and update only the lm_head.
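For context, here is a minimal sketch of that freezing behavior as I understand it (this is my own hypothetical illustration in plain PyTorch, not the repo's actual code; TinyLM and its layer names are made up stand-ins):

```python
import torch.nn as nn

# Hypothetical stand-in for a language model: an "encoder" body
# plus an lm_head output projection.
class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)   # stand-in for the transformer layers
        self.lm_head = nn.Linear(8, 4)   # output head

model = TinyLM()

# Freeze every parameter of the model...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the lm_head, so the optimizer updates just the head.
for p in model.lm_head.parameters():
    p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With this setup only the head's weight and bias remain trainable, which matches the default behavior described above.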
I tried the provided example of flan-T5 here: https://colab.research.google.com/drive/1DYHt0mi6cyl8ZTMJEkMNpsSZCCvR4jM1?usp=sharing
When I changed the value of unfreeze_layer_from_past to 1, to update the weights of the final layer of flan-t5, like this:

the behavior changed and the actor started generating empty text:

It also gave me empty text after training:

What is the reason for this behavior?
NOTE: I did not change anything else in the flan-t5 code example.