fix: disable USM compression for any multi-device context#930
Open
F3zz1k wants to merge 1 commit into
Open
Conversation
The existing peer-USM compression guard in createUnifiedMemoryAllocation is gated on device->hasAnyPeerAccess().value_or(false). That std::optional<bool> is populated lazily by initializePeerAccessForDevices at driver init and can be observed as Some(false) for the first USM device allocation in a multi-device L0 context. The disable arm is then skipped, the BO is allocated with the compression flag enabled, and any later peer-import returns the compressed bytes through dma-buf without the matching surface metadata. Use rootDeviceIndices.size() > 1 as an additional gate. That property is known synchronously at allocation time and conservatively disables compression for any BO in a multi-device context, where peer-import can happen at any subsequent point regardless of the current peer-access cache state. Verified on 2x Intel Arc Pro B70 (BMG-G31), Linux 7.1-rc4 + xe (with Leon Romanovsky's "dma-buf: Always build with DMABUF_MOVE_NOTIFY"): multi-device sycl::queue::memcpy(host, dev0_usm_ptr) executed on dev1 previously returned garbage; now returns the source data correctly. The OpenCL backend symptom (UNRECOVERABLE_IF(!allocation) at enqueue_svm.h:308) is resolved by the same fix. Related: vllm-project/vllm#41663, intel#916, intel#921, intel#922. Signed-off-by: Gabriel Bouffard <gabriel.a.bouffard@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #921
The peer-USM compression guard in
SVMAllocsManager::createUnifiedMemoryAllocationis gated ondevice->hasAnyPeerAccess().value_or(false). For the first USM device allocation in a multi-device L0 context thatstd::optionalcan be observed asSome(false)(peer-access cache not yet populated byinitializePeerAccessForDevices). The disable arm is skipped, the BO is allocated compressed, and any later peer-import returns garbage through dma-buf because the importing device has no matching surface metadata.This patch adds
memoryProperties.rootDeviceIndices.size() > 1as an additional, synchronously-known gate. Multi-device contexts conservatively disable compression, correct because a BO in a multi-device context can become peer-imported at any subsequent point in the application's lifetime.Verification
LD_LIBRARY_PATHagainst compute-runtime 26.05/26.18, cross-devicesycl::queue::memcpy(host, peer_usm_ptr)returned garbage before; returns source data correctly afterUNRECOVERABLE_IF(!allocation)atenqueue_svm.h:308symptom on the same workload is resolved by this fix--device SYCL0,SYCL1 --split-mode layer) regression-tested on both patched and stock, no behavioural change there (that workload's allocation timing happens to dodge the race)Unit tests
Not added in this commit. The targeted code path requires a platform where
releaseHelper->isUsmCompressionSupportedOnPeerAccess()returns false (release2001 / release2002 = BMG / LNL) ANDusmCompressionSupported(hwInfo)returns true. The defaultMockReleaseHelperreturns true for the former, so a test undershared/test/unit_test/memory_manager/would skip on most configurations.A natural home would be
shared/test/unit_test/xe2_hpg_core/which already uses the BMG release helper, or a new test usingMockReleaseHelperoverrides. If really needed, I could add it, would appreciate guidance on the preferred location and pattern (the existinggivenLocalMemoryEnabledAndCompressionEnabledThenDeviceSideSharedUsmIsCompressedtest inunified_memory_manager_tests.cpp:721is the natural sibling but uses single-root setup).Related
xe_dma_buf_pin -EINVALon peer-imported BO) fixed upstream by Leon Romanovsky's "dma-buf: Always build with DMABUF_MOVE_NOTIFY", in mainline Linux 7.1.cc @ivysochyn — this directly extends the guard you added in dc85f33 and refactored in a832f01. Race observed where hasAnyPeerAccess() is Some(false) for the very first multi-device alloc.