rwgk commented Feb 4, 2026

This Cursor-generated change is expected to resolve failures in QA environments (nvbug 5821337).


Problem

The test TestBufferPeerAccessAfterImport::test_main was failing with assertion errors from memory comparisons. The most likely root cause is missing synchronization when accessing peer memory.

When dev0 accesses peer memory from dev1, PatternGen.verify_buffer() synchronizes only dev0 (the accessing device), not dev1 (the resident device). As a result, dev0 can read the peer memory before dev1 has completed all of its operations, and so read incorrect data.

Changes

  • Added a dev1.sync() call after IPC import (Test 1)
  • Added a dev1.sync() call after granting peer access (Test 3), before dev0 accesses peer memory

This follows CUDA best practices: when accessing peer memory, sync the resident device to ensure its operations are complete before the peer device reads the memory.
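To make the ordering argument concrete, here is a toy, pure-Python model. It is hypothetical: ToyDevice and fill_and_read are illustrative names that do not use cuda.core and do not reproduce real GPU scheduling. Each ToyDevice queues its work and only guarantees completion when sync() is called on that same device, which is the worst case the fix guards against. Syncing only the accessing device (dev0) leaves dev1's write pending, so the read sees stale data. Syncing the resident device (dev1) first makes the write visible.

```python
from collections import deque


class ToyDevice:
    """Hypothetical stand-in for a GPU; NOT the cuda.core Device API.

    Work submitted to the device is queued and, in this worst-case model,
    is only guaranteed to have executed once sync() is called on *this*
    device. Syncing a different device does not drain this queue.
    """

    def __init__(self, name):
        self.name = name
        self._pending = deque()  # enqueued callables not yet executed

    def enqueue(self, op):
        self._pending.append(op)

    def sync(self):
        # Drain only this device's own queue; other devices are unaffected.
        while self._pending:
            self._pending.popleft()()


def fill_and_read(sync_resident):
    dev0 = ToyDevice("dev0")  # accessing (peer) device
    dev1 = ToyDevice("dev1")  # resident device that owns the buffer
    buf = bytearray(4)        # memory resident on dev1, initially zeros

    def fill():
        buf[:] = b"\xab" * len(buf)

    dev1.enqueue(fill)        # asynchronous write on the resident device

    if sync_resident:
        dev1.sync()           # the fix: wait for the resident device first
    dev0.sync()               # what verify_buffer() already did
    return bytes(buf)         # dev0 reads the peer memory


print(fill_and_read(sync_resident=False))  # b'\x00\x00\x00\x00' (stale)
print(fill_and_read(sync_resident=True))   # b'\xab\xab\xab\xab'
```

The model is deliberately deterministic: real hardware may or may not have finished the write by the time dev0 reads, which is why a missing sync shows up as an intermittent failure rather than a consistent one.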

Add missing device synchronization calls to ensure resident device
operations are complete before peer device accesses memory.

The test was failing because when dev0 accesses peer memory from dev1,
PatternGen only syncs dev0 (the accessing device) but not dev1 (the
resident device). This can cause a race in which dev0 reads peer
memory before dev1 has completed all operations.

Changes:
- Sync dev1 after IPC import (Test 1) to ensure import operations complete
- Sync dev1 after granting peer access (Test 3) before dev0 accesses
  peer memory

This follows CUDA best practices: when accessing peer memory, sync the
resident device to ensure its operations are complete before the peer
device reads the memory.

Fixes test failures on ARM64 with CUDA 13.2 RC025.

Co-authored-by: Cursor <cursoragent@cursor.com>
copy-pr-bot bot commented Feb 4, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk commented Feb 4, 2026

/ok to test

github-actions bot commented Feb 4, 2026

rwgk self-assigned this Feb 4, 2026
rwgk added the test (Improvements or additions to tests) and cuda.core (Everything related to the cuda.core module) labels Feb 4, 2026
rwgk requested a review from Andy-Jost February 4, 2026 01:29