A horrible mistake!

I noticed that you are resetting the env at the beginning of each episode!

By doing this, every episode starts again from the beginning, so you never move forward in your dataset!

Check the train_agent function.

That's why you can train your model for 50000 episodes and nothing goes wrong!! (Your whole dataset is smaller than 50000; it is 23450.)
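To show what I mean, here is a minimal sketch with a toy env, not your actual code. ToyDatasetEnv, run, the start argument of reset, and the 100 steps per episode are all made up for illustration; I am only assuming that your reset() puts the position back at the first row, which is what it looks like from train_agent.

```python
class ToyDatasetEnv:
    """Hypothetical stand-in for the real env: walks a dataset row by row."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.pos = 0
        self.visited = set()  # rows the agent has actually seen

    def reset(self, start=0):
        # Assumption: the real reset() behaves like start=0 (back to the first row).
        self.pos = start
        return self.pos

    def step(self):
        self.visited.add(self.pos)
        self.pos += 1
        done = self.pos >= self.n_rows  # ran off the end of the dataset
        return self.pos, done


def run(env, episodes, steps_per_episode, resume):
    """Return how many distinct rows were visited over all episodes."""
    next_start = 0
    for _ in range(episodes):
        env.reset(start=next_start if resume else 0)
        for _ in range(steps_per_episode):
            _, done = env.step()
            if done:
                break
        # When resuming, continue where the last episode stopped (wrap at the end).
        next_start = env.pos % env.n_rows if resume else 0
    return len(env.visited)


# Resetting to row 0 every episode: only the first 100 rows are ever seen.
print(run(ToyDatasetEnv(23450), episodes=50, steps_per_episode=100, resume=False))
# Resuming from the last position: 50 * 100 = 5000 distinct rows are seen.
print(run(ToyDatasetEnv(23450), episodes=50, steps_per_episode=100, resume=True))
```

Resuming from the last position is only one possible fix. Starting each episode at a random offset would also cover the dataset, if that suits your setup better.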