I noticed that you are resetting the env at the beginning of each episode!
By doing this you never move forward in your dataset: every episode starts again from the same first rows.
Check the train_agent function.
That's why you can train your model for 50000 episodes and nothing goes wrong!! (Your whole dataset is smaller than 50000; it is 23450.)
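
To illustrate what I mean, here is a minimal sketch. It is not your code: DatasetEnv, rewind_on_reset and rows_visited are hypothetical names I made up, and I am assuming that reset() puts the data pointer back to 0 and that each episode only runs for a few steps. It compares a reset that rewinds with one that keeps the position and only wraps around at the end of the data.

```python
class DatasetEnv:
    """Toy environment that walks through a dataset one row at a time (hypothetical)."""

    def __init__(self, data, rewind_on_reset=True):
        self.data = data
        self.rewind_on_reset = rewind_on_reset
        self.pos = 0

    def reset(self):
        # Rewinding here keeps every episode on the same first rows.
        # Without it, only wrap around once the data is exhausted.
        if self.rewind_on_reset or self.pos >= len(self.data):
            self.pos = 0
        return self.data[self.pos]

    def step(self):
        self.pos += 1
        done = self.pos >= len(self.data)
        obs = None if done else self.data[self.pos]
        return obs, done


def rows_visited(env, episodes, steps_per_episode):
    """Run short episodes and return the set of dataset indices that were seen."""
    seen = set()
    for _ in range(episodes):
        env.reset()
        seen.add(env.pos)
        for _ in range(steps_per_episode):
            _, done = env.step()
            if done:
                break
            seen.add(env.pos)
    return seen


data = list(range(23450))  # same size as the dataset mentioned above
stuck = rows_visited(DatasetEnv(data, rewind_on_reset=True), episodes=50, steps_per_episode=10)
moving = rows_visited(DatasetEnv(data, rewind_on_reset=False), episodes=50, steps_per_episode=10)
print(len(stuck), len(moving))
```

With the rewinding reset the agent only ever sees the first handful of rows no matter how many episodes you run, while the second variant keeps advancing through the data. Whether you want to continue from the last position or start each episode at a random offset is a design choice; either way the episode count alone should not hide the fact that most of the data is never used.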