Description
OS
Linux
GPU Library
CUDA 12.x
Python version
3.12
Describe the bug
I set inline_model_loading: true so I could load models dynamically from the models directory by name.
I downloaded a model from huggingface.co (ArtusDev_Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3) into the models directory, then created a symlink named coder pointing to that directory.
The goal was to be able to use "coder" as my model name for the API call while keeping a descriptive directory name on disk. This also lets me re-symlink to a different model without changing the clients making the calls.
However, I think the loader resolves the symlink and loads the model correctly, but records the model's name as ArtusDev_Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3 instead of coder. When the next API call comes in for model coder, the names don't match, so it unloads the model and then immediately reloads it, even though it's the same model.
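To illustrate the mismatch I suspect, here is a minimal sketch that only uses the filesystem. It is not taken from the project's code: the helper name names_for and the temporary models directory are assumptions made for the example. It shows that the name in the request and the directory name after symlink resolution differ.

```python
import tempfile
from pathlib import Path


def names_for(models_dir: Path, requested: str) -> tuple[str, str]:
    """Return (requested name, directory name after symlink resolution).

    Hypothetical helper for illustration only; the real loader may differ.
    """
    resolved = (models_dir / requested).resolve()
    return requested, resolved.name


with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp)

    # Stand-in for the downloaded model directory.
    real = models_dir / "ArtusDev_Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3"
    real.mkdir()

    # Equivalent of: ln -s ./ArtusDev_Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3 coder
    (models_dir / "coder").symlink_to(real.name, target_is_directory=True)

    requested, resolved_name = names_for(models_dir, "coder")

    # If the loaded model is tracked by the resolved name, a later request
    # for "coder" will not match it by plain string comparison.
    mismatch = requested != resolved_name
    print(requested, resolved_name, mismatch)
```

If the loaded model is tracked by the second name, every request for coder would look like a request for a different model.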
Reproduction steps
- Set inline_model_loading: true
- Save a model to the models directory. I used ArtusDev_Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3
- In the models directory, run ln -s ./ArtusDev_Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3 coder
- Make an API call with the model name "coder"
- The model will unload, then reload
Expected behavior
The model would stay loaded and be used.
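One way the expected behavior could be checked is to compare resolved paths instead of names. This is only a sketch under my own assumptions: is_same_model is a hypothetical function, not something that exists in the project, and I have not looked at how the loader actually tracks the current model.

```python
import tempfile
from pathlib import Path


def is_same_model(models_dir: Path, loaded_name: str, requested_name: str) -> bool:
    """Hypothetical check: do both names point at the same directory on disk?

    Both names are resolved relative to models_dir, so a symlink and its
    target compare equal even though the strings differ.
    """
    loaded = (models_dir / loaded_name).resolve()
    requested = (models_dir / requested_name).resolve()
    return loaded == requested


with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp)
    real_name = "ArtusDev_Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3"
    (models_dir / real_name).mkdir()
    (models_dir / "other-model").mkdir()
    (models_dir / "coder").symlink_to(real_name, target_is_directory=True)

    # Symlink and its target: same model, no reload should be needed.
    same_via_symlink = is_same_model(models_dir, real_name, "coder")

    # A genuinely different directory: a reload is expected.
    same_as_other = is_same_model(models_dir, real_name, "other-model")

    print(same_via_symlink, same_as_other)
```

With a comparison like this, a request for coder would be recognized as the already-loaded model, while re-pointing the symlink at another directory would still trigger a reload.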
Logs
No response
Additional context
This is super low priority
Acknowledgements
- I have looked for similar issues before submitting this one.
- I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.