[ENH] Add safeguards for unsynchronized stream destruction #1539

@Andy-Jost

Summary

On platforms where Unified Memory has limited concurrency support (e.g., certain Windows platforms), destroying a stream without synchronizing can leave the system in a state where any subsequent host access to any Unified Memory causes a crash. This issue proposes adding configurable safeguards to stream destruction.

Problem Description

On affected platforms, the following sequence causes a crash:

  1. Allocate Unified Memory buffer B
  2. Launch any kernel on any stream without synchronizing afterward (the kernel need not touch B)
  3. Access B from the host
  4. Crash (segfault / access violation)

Root cause: On platforms where the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS is zero, the GPU holds exclusive access to all managed memory while any kernel is in flight. Host access to managed memory, including allocations unrelated to the running kernel, is forbidden until the stream is synchronized.

Key insight: Destroying the stream does not restore safe host access. Only synchronizing the stream before destruction does.

Impact on testing: Tests that launch kernels without synchronizing effectively "arm" a crash. Subsequent tests that access Unified Memory on the host will crash, even though they did nothing wrong. This makes failures difficult to diagnose.

Proposed Solution

Add logic to stream destruction that detects unsynchronized streams and optionally warns or synchronizes before the stream is destroyed.

Detection mechanisms:

  • cuStreamQuery(): check whether the stream has in-flight work (it returns CUDA_ERROR_NOT_READY while work is pending)
  • CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS: check whether the platform is affected (the attribute is zero)
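
The two checks above could be combined roughly as follows. This is a minimal sketch, not the proposed implementation: the function names are hypothetical, and `query_stream` is assumed to be a callable that wraps cuStreamQuery() and returns the raw CUresult as an integer. The numeric result codes are the values defined in cuda.h.

```python
from typing import Callable

# Driver API result codes (values from cuda.h).
CUDA_SUCCESS = 0
CUDA_ERROR_NOT_READY = 600


def stream_has_pending_work(query_stream: Callable[[], int]) -> bool:
    """Return True if the stream still has in-flight work.

    `query_stream` is a hypothetical wrapper around cuStreamQuery()
    that returns the CUresult as an int.
    """
    result = query_stream()
    if result == CUDA_SUCCESS:
        return False  # all work on the stream has completed
    if result == CUDA_ERROR_NOT_READY:
        return True   # work is still pending
    raise RuntimeError(f"cuStreamQuery failed with CUresult {result}")


def platform_is_affected(concurrent_managed_access: int) -> bool:
    """Return True when CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS is zero."""
    return concurrent_managed_access == 0
```

Passing the query in as a callable keeps the logic testable on machines without a GPU; real code would supply a wrapper around the driver call.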

Configuration: CUDA_PYTHON_STREAM_DESTROY_SYNC_MODE

Value   Behavior
0       Do nothing (current behavior, default)
1       Warn unconditionally when destroying an unsynchronized stream
2       Warn only on affected platforms (concurrentManagedAccess == 0)
3       Implicitly synchronize on affected platforms
4       Warn + synchronize on affected platforms

This gives users a spectrum from "purely diagnostic" to "safety-first" while preserving backward compatibility by default.
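
To make the mode table concrete, here is one possible mapping from the environment variable to an action at destruction time. It is a sketch under stated assumptions: `SyncMode`, `Action`, `read_mode`, and `decide` are hypothetical names, and falling back to mode 0 for unrecognized values is an assumption that the proposal does not specify.

```python
import os
from enum import IntEnum
from typing import Mapping, NamedTuple


class SyncMode(IntEnum):
    NONE = 0                    # do nothing (default)
    WARN_ALWAYS = 1             # warn on any unsynchronized stream
    WARN_AFFECTED = 2           # warn only on affected platforms
    SYNC_AFFECTED = 3           # synchronize on affected platforms
    WARN_AND_SYNC_AFFECTED = 4  # warn and synchronize on affected platforms


class Action(NamedTuple):
    warn: bool
    synchronize: bool


def read_mode(environ: Mapping[str, str] = os.environ) -> SyncMode:
    """Parse CUDA_PYTHON_STREAM_DESTROY_SYNC_MODE from the environment."""
    raw = environ.get("CUDA_PYTHON_STREAM_DESTROY_SYNC_MODE", "0")
    try:
        return SyncMode(int(raw))
    except ValueError:
        # Assumption: unrecognized values fall back to the default mode.
        return SyncMode.NONE


def decide(mode: SyncMode, has_pending_work: bool, platform_affected: bool) -> Action:
    """Return what to do just before a stream is destroyed."""
    if not has_pending_work:
        return Action(warn=False, synchronize=False)
    if mode == SyncMode.WARN_ALWAYS:
        return Action(warn=True, synchronize=False)
    if not platform_affected:
        return Action(warn=False, synchronize=False)
    return Action(
        warn=mode in (SyncMode.WARN_AFFECTED, SyncMode.WARN_AND_SYNC_AFFECTED),
        synchronize=mode in (SyncMode.SYNC_AFFECTED, SyncMode.WARN_AND_SYNC_AFFECTED),
    )
```

For example, `decide(read_mode({"CUDA_PYTHON_STREAM_DESTROY_SYNC_MODE": "4"}), True, True)` asks for both a warning and a synchronization, while the same call on an unaffected platform asks for neither.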

Labels: cuda.core, enhancement