auto-detect GPU arch and adapt XSched preemption level, remove hardco…#1

Open
kylin1019 wants to merge 1 commit into XpuOS:xsched from kylin1019:xsched

@kylin1019

Enables automatic GPU architecture detection and dynamic XSched preemption level adaptation for llama.cpp. Removes the hardcoded preemption levels so that the integration works across NVIDIA GPU platforms and no longer crashes at runtime.
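
As a rough illustration of the idea (not the code in this commit), the selection step could look like the sketch below. It assumes the compute capability has already been read, for example from `cudaDeviceProp::major` and `cudaDeviceProp::minor` after calling `cudaGetDeviceProperties`. The `PreemptLevel` enum, the `kArchTable` entries, and `select_preempt_level` are hypothetical names and placeholder values made up for this example; they are not XSched's API or its real support matrix.

```cpp
#include <cassert>

// Hypothetical preemption levels, for illustration only. These are NOT
// XSched's actual enum names or values.
enum class PreemptLevel { Block = 1, Deactivate = 2, Interrupt = 3 };

// One row of a hypothetical support table keyed by CUDA compute capability.
struct ArchEntry {
    int major;
    int minor;
    PreemptLevel max_level;
};

// Placeholder entries. A real table must be taken from XSched's own
// documentation of which architectures support which level.
constexpr ArchEntry kArchTable[] = {
    {7, 0, PreemptLevel::Interrupt},
    {8, 6, PreemptLevel::Deactivate},
};

// Returns the level listed for the given compute capability, or the most
// conservative level (Block) when the architecture is not in the table.
// In a CUDA build, major/minor would come from cudaGetDeviceProperties.
PreemptLevel select_preempt_level(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    for (const ArchEntry &e : kArchTable) {
        if (e.major == major && e.minor == minor) {
            return e.max_level;
        }
    }
    return PreemptLevel::Block;
}
```

Falling back to the lowest level for unknown architectures is the design choice that avoids requesting a preemption level the device cannot honor.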

…eleration, auto-detects GPU architecture, adapts preemption level, and runs stably without CUDA errors
@wuwen03
Collaborator

wuwen03 commented Apr 4, 2026

Hi @kylin1019,

Thank you for your contribution — your manual fix is indeed correct.

As you also mentioned in XpuOS/xsched#23, automatic XQueue creation may be an even cleaner and more efficient way to integrate XSched into llama.cpp. We are therefore considering adding more APIs to xsched so that an XQueueHandle can be obtained directly from a cudaStream. This would allow us to remove the stream construction logic and the mapping between cudaStream and XQueueHandle.
Would this approach make sense to you?
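
To make the discussion concrete, the bookkeeping that such an API would make unnecessary is roughly of the following shape. This is only a sketch under assumptions: `StreamKey` and `QueueHandle` are stand-in types so the example compiles without the CUDA or XSched headers (in a real build they would be `cudaStream_t` and `XQueueHandle`), and `StreamQueueRegistry` is a hypothetical name, not code from this PR or from xsched.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Stand-in types so the sketch builds without CUDA or XSched headers.
using StreamKey = const void *;      // would be cudaStream_t in a CUDA build
using QueueHandle = std::uint64_t;   // stand-in for XQueueHandle

// Hypothetical registry that remembers which queue was created for which
// stream. A direct "stream to queue" lookup API would replace this class.
class StreamQueueRegistry {
public:
    // Records (or overwrites) the queue associated with a stream.
    void bind(StreamKey stream, QueueHandle queue) {
        std::lock_guard<std::mutex> lock(mu_);
        map_[stream] = queue;
    }

    // Writes the queue for `stream` to *out and returns true if one is bound.
    bool lookup(StreamKey stream, QueueHandle *out) const {
        assert(out != nullptr);
        std::lock_guard<std::mutex> lock(mu_);
        auto it = map_.find(stream);
        if (it == map_.end()) {
            return false;
        }
        *out = it->second;
        return true;
    }

    // Removes the binding; returns true if a binding existed.
    bool unbind(StreamKey stream) {
        std::lock_guard<std::mutex> lock(mu_);
        return map_.erase(stream) > 0;
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<StreamKey, QueueHandle> map_;
};
```

Every call site that launches work on a stream has to go through `lookup` first, which is the extra indirection the proposed API is meant to remove.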

Also, for consistency and easier collaboration, please use English for code comments in future updates.

related to XpuOS/xsched#23
