
Quantize pretrained models #8

@arasharchor

Hi,

The model is too large for GPUs with limited memory. Is it possible to compress it with a binarization or quantization method, not only to reduce model size but also to speed up inference?
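For reference, a minimal sketch of what post-training quantization could look like with PyTorch's built-in dynamic quantization API (the toy model below is hypothetical, not this repo's model; assuming the real model is a standard `torch.nn.Module`, and noting that dynamic quantization targets CPU inference):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained model; substitute the real one.
model = nn.Sequential(
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
model.eval()

# Post-training dynamic quantization: weights of the listed module types
# are stored as int8 and dequantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
with torch.no_grad():
    print(quantized(x).shape)  # torch.Size([1, 10])
```

Since int8 weights take a quarter of the space of float32, this roughly quarters the size of the quantized layers; whether it also speeds things up depends on the hardware and on which layers dominate the runtime.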
