To get started, download the model from [HuggingFace](https://huggingface.co/moonshotai/Kimi-K2-Instruct). Kimi K2 is a trillion-parameter model, so its checkpoint must be sharded efficiently.
* Run [convert_deepseek_family_ckpt.py](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/checkpoint_conversion/standalone_scripts/convert_deepseek_family_ckpt.py) to convert the checkpoint into the MaxText-compatible [Orbax](https://orbax.readthedocs.io/en/latest/guides/checkpoint/orbax_checkpoint_101.html) format for training and fine-tuning.
* Note that Kimi K2 uses **YaRN** to extend its context window to 128K tokens; make sure your configuration carries these positional-embedding settings through conversion so that decoding behaves correctly.
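To see why those settings matter, the YaRN adjustment can be sketched as a per-dimension blend of the RoPE inverse frequencies: high-frequency dimensions keep their original values (extrapolation) while low-frequency dimensions are divided by the context-scaling factor (interpolation), with a linear ramp in between. This is a minimal illustrative sketch; the hyperparameter values below (`orig_ctx`, `scale`, `beta_fast`, `beta_slow`) are placeholder assumptions, not Kimi K2's actual configuration — take the real values from the model's `config.json`.

```python
import numpy as np

def yarn_inv_freq(dim, base=10000.0, orig_ctx=4096, scale=32.0,
                  beta_fast=32.0, beta_slow=1.0):
    """Sketch of YaRN-adjusted RoPE inverse frequencies (values are illustrative)."""
    # Base RoPE inverse frequencies for the even dimensions.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)

    # Dimension index at which `num_rot` full rotations fit inside orig_ctx.
    def dim_for_rotations(num_rot):
        return (dim * np.log(orig_ctx / (num_rot * 2 * np.pi))) / (2 * np.log(base))

    low = max(np.floor(dim_for_rotations(beta_fast)), 0)
    high = min(np.ceil(dim_for_rotations(beta_slow)), dim // 2 - 1)

    # Ramp is 0 for high-frequency dims (keep original freq, i.e. extrapolate)
    # and 1 for low-frequency dims (divide by scale, i.e. interpolate).
    ramp = np.clip((np.arange(dim // 2) - low) / max(high - low, 1e-3), 0.0, 1.0)
    return inv_freq * (1 - ramp) + (inv_freq / scale) * ramp
```

If the conversion drops these settings, decoding falls back to plain RoPE frequencies and long-context quality degrades well before 128K tokens.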