-
Notifications
You must be signed in to change notification settings - Fork 165
Tobin synch model cleanup and 2nd cbs #2049
Conversation
6dcc659 to
b0b10bd
Compare
This is the initial version of a single-case update to handle synchronization tracking for the cmd sequence from GH892. As the code stands it's only primarily useful for this single cmd sequence and isn't catching any variety of memory conflicts. The intention is that if the general idea is sound then I will build on this initial code to expand/generalize the algorithms used here for all memory conflict cases.
This is a simple test positive case based on GH892. For this initial test the pipeline barrier is correctly in place so no errors should be signalled. This is the basic case that will be used to generate some more test cases, both positive and negative.
This is a negative test featuring a RaW dependency with no synch object in-between to regulate the read.
Added a look-up-table for fixed pipe stage and access masks per cmd. Added some functions to generalize adding/validating memory accesses per command and migrated the checks into buffer_validation files. Added CmdCopyImage tracking. Initially treating all image accesses conservatively and assuming they touch full size of image memory. To deal with this and further-such issues that will arise I updated MemoryAccess to have a "precise" bool that indicates when Access details are exact. Only when both Accesses are precise can we issue an error, otherwise just issue a warning.
Integrate CmdCopyBuffer into MemoryAccess recording and validation framework. Refactored recording of read/write memory accesses to use the binding which saves some repeated lines of code.
Update synchronization modeling to handle barrier dependency chains. Initially only handling CmdPipelineBarriers. Create separate synch_commands vector in cmd buffer for quick parsing of just synch commands. For any previous synch commands whose destination synch scope overlaps the current synch command's source synch scope, merge the 1st cmds dest mask into current commands src mask and pull previous barriers into the current synch command. Updated existing PipelineBarrier algorithm for global and buffer barriers to make sure that pipeline masks overlap and only check access masks in that case
Adding positive test UpdateBufferWaRDependencyWithTwoBarrierChain. This features a WaR dependency that's cleared by a chain of two pipeline barriers. This confirms that validation correctly supports the merging of overlapping barriers to correctly account for synchronization chains.
Add tracking for the memory accesses in CmdBlitImage. Initially just adding a read & write access for the src & dest image memory that conservatively references whole binding and is recorded as imprecise so that it only will cause warnings.
Add tracking for the memory access in CmdFillBuffer. This is a single write to a buffer for which we have precise info.
Add tracking for the memory accesses in CmdClear[DS|Color]Image. Initially just adding imprecise write access covering the whole image binding area.
Update CmdClearAttachments to record a write memory access per image attachment that's cleared. This is an imprecise access across the entire image binding for now.
Update global mem access state when CmdClearAttachments is called.
Add tracking for the memory accesses in CmdResolveImage. Initially just adding a read & write access for the src & dest image memory that conservatively references whole binding and is recorded as imprecise so that it only will cause warnings.
Just moving code to prep for adding mem access support to CmdCopyQueryPoolResults. Refactor it to use the common PreValidate* & PostRecord* pattern.
Add the buffer write access for CmdCopyQueryPoolResults. Updated query pool data struct to store number of elements in the query based on query type and then use that number when calculating total data size.
Update the Draw/Dispatch validate/update framework to account for all memory accesses through descriptors. At validate time grab the complete set of active read and write memory bindings from descriptors. Then verify the memory accesses to flag any conflicts. During state update, record all of the accesses into the cmd buffer state. All of these accesses are imprecise and will only result in warnings if there is a potential synch issue.
88ba074 to
ea09fe0
Compare
|
I pushed a follow-on CL that just shuts off the callback and tests for now, along with disabling Draw-time checks that have ~15% perf impact on smoketest. This is intended to make it safer to land the initial version so that further testing/development can be done before turning on the callback by default. |
layers/core_validation.cpp
Outdated
| } | ||
| const auto &subpass_desc = render_pass_state->createInfo.pSubpasses[0]; | ||
| for (uint32_t i = 0; i < subpass_desc.inputAttachmentCount; ++i) { | ||
| // Mark input attachments as read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started something that I then delayed for later and didn't kill this loop. Just pushed update to kill it. Will add MemAccess tracking for renderpass attachments in the future.
A PipelineBarrier command that occurs within a renderPass is only for handling a subpass self-dependency. That case is not currently included in the synch model so updating code to only record barriers that are outside of a renderPass for now.
Moved existing code into RecordBarrierMemoryAccess() function. This generalizes to code so it can more easily be multi-purposed in order to record barriers for CmdWaitEvents().
This is a race condition test to check both a RaW and WaR conflict with buffers used in a Draw. A src buffer is copied to a dst buffer, then, without a barrier, the src is used as a storage buffer in a draw and the dst is used as a uniform buffer in the same draw. This results in a RaW conflict for the Dst/Uniform buffer and WaR conflict for the Src/Storage buffer.
Make any memory access race condition conflict messages warning-level for initial release. There are still some holes in modeling the synch primitives in validation so setting callback to warning allows for a soft release where people can ignore or post bugs on incorrect warnings while the synch model is updated in validation. The current warning message is meant to deter developers from sinking too much time into debugging these warnings, and promotes feedback for known-bad warning cases.
Record barriers in CmdWaitEvents using same method used for barriers from CmdPipelineBarriers. This involves merging any previous barriers to handle dependency chains and then recording the barriers into any previous memory accesses that fall into the source synch scope.
Modify ValidateMemoryAccesses() so that it takes map of prev accesses directly. This is preparation to share the function with command buffer submission validation where we'll deal with sets of initial accesses that may cross many command buffers.
At QueueSubmit time, check for any memory conflicts between command buffers in a single submit. Here's the high-level algorithm: 1. While submitting CBs, store vector of CB state for each CB 2. For each CB, create vector of mem accesses that are "live" meaning that no barrier within the CB made them visible 3. Check live vector against all other live mem accesses up to this point in submit. 3a. If there are no potential conflicts, save live access vector into submit access map and continue with next submit 3b. If there are potential conflicts, we need to replay commands up to this point so continue to step 4. 4. Gather all cmds submitted up to this point in a single vector 5. Note any synch commands that occur between the early and late mem access conflict commands 6. Replay mem access commands, including synch commands between the points of interest 7. Following replay, re-check for conflicts and if they still occur, flag an error Also update the race condition warning message between two separate command buffers with additional information about second cmd buffer.
Added TwoCommandBufferUpdateBufferRaWDependencyMissingBarrier test that is an alternate version of UpdateBufferRaWDependencyMissingBarrier test that separates the CmdUpdateBuffer and CmdDrawIndirect commands across two separate command buffers and verifies that the RaW race condition is flagged.
Merging synch barriers may be a bit naive. Note potential bug for future fix.
Add MultiCBDrawWithBufferWaRandRaWSynchChain test. This features three command buffers with mem access dependencies and synch commands that cross CB boundaries. This is a positive test to verify that no errors occur.
Mis-match between size_t and uint32_t was causing some Windows compiler warnings. Use size_t uniformly where vector size() function involved.
Move the code validate inter-cmd-buffer synch conflicts at QueueSubmit time into its own function. This also fixes a bug where the vector of cmds to replay was not being reset when more than one replay session was required for a single commit. It also streamlines the code a bit by making the update of mem access map with latest live mem accesses common code between fast and slow path.
This commit stores memory accesses into separate read and write maps. This provides a perf boost for checking potential memory access faults by eliminating uneccessary checks when previous access maps that would be checked are empty so we don't iterate over maps that have no potential for any conflicts. A couple of other changes to cleanup memory access checks: 1. Merge "Add*MemoryAccess()" and "ValidateMemoryAccesses()" functions into a single "CreateAndValidate*MemoryAccess()" function. This creates the local access struct and validates it, but it won't be recorded into the CB until/unless validation passes. 2. Split Mem access violation reporting into a separate function that can be shared by a couple of different cases that flag errors.
ea09fe0 to
316aec8
Compare
This is a conservative measure to simplify landing and integration of the memory access checks. For now turning off the callback and also disabling some draw-time checks that seem to have ~10-20% perf hit in limited testing. Turning off callback also means turning off tests that rely on the callback. Will continue advancing this code and testing with checks enabled to reach desired functionality before enabling code on master.
316aec8 to
3cf92a5
Compare
|
Replaced by #2066 |
Replaces #2026
This includes all of the previous memory race condition checking but greatly improves performance by splitting per-cmd-buffer memory access map into read and write maps. This allows skipping large groups of checks when there's no possibility of conflicts.
I'm still seeing about 15% perf hit on smoketest, which is much better than ~2.5x previously. Nearly all of the additional hit comes from grabbing the memory accesses from the descriptors at draw time. I'll look at speeding that up as well.
If the 15% perf hit and the experimental warning conditions are deemed acceptable, I could land this near it's present form. Main goal at this point is to land some initial code, though, so that these core changes are in place moving forward. To get this in with minimal impact, though, for now I can turn off the warning and disable the call mentioned above that's eating a bit of perf. I think I favor this approach in the short term so let me know if there are other opinions.