Hi, thank you for sharing your code. I have read through it and learned a lot.
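I have a small question about calcPerp on the test data:
it seems you calculate the perplexity on the test data while batch normalization is still in training mode.
I think this might be a problem, because in training mode each test batch is normalized with its own mean and variance rather than the running statistics collected during training, so the reported perplexity can depend on how the test data is batched.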
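
To illustrate what I mean, here is a minimal, framework-agnostic sketch in plain Python. It is not taken from your code: the `batch_norm` helper, the running statistics, and the test batch are all made up for illustration, and the learnable scale and shift are left out. It only shows that the same test batch is normalized differently depending on the mode.

```python
import math

def batch_norm(batch, running_mean, running_var, training, eps=1e-5):
    """Normalize a list of scalars (hypothetical helper, no scale/shift).

    training=True  -> use the mean and variance of `batch` itself.
    training=False -> use the running statistics gathered during training.
    """
    if training:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        mean, var = running_mean, running_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

# Assumed running statistics from training (made-up numbers).
running_mean, running_var = 0.0, 1.0

# A test batch whose statistics differ from the running ones.
test_batch = [4.0, 5.0, 6.0]

# Training mode: the batch is re-centered around its own mean.
train_mode_out = batch_norm(test_batch, running_mean, running_var, training=True)

# Evaluation mode: the batch is normalized with the running statistics.
eval_mode_out = batch_norm(test_batch, running_mean, running_var, training=False)

print(train_mode_out)
print(eval_mode_out)
```

In training mode the middle element is mapped to roughly 0, while in evaluation mode it stays close to 5, so the layers after batch normalization see quite different inputs. If the framework provides a switch for evaluation mode, calling it before calcPerp on the test data would presumably avoid this.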