Have you considered incorporating this work into an open-source inference framework, such as vLLM?