feat(linux/xdgportal): implement reactive capture with duplicate detection#4740
feat(linux/xdgportal): implement reactive capture with duplicate detection#4740psyke83 wants to merge 3 commits intoLizardByte:masterfrom
Conversation
|
TLDR; test with It's vital to ~disable (set to 1 second timeout) the "Minimum FPS target" setting in order for XDG Portal capture pacing to work as expected. With the default setting of 0 (half framerate), this setting can randomly cause 50% or more extra frames (60fps -> 90fps) to be inserted, which causes network jitter that completely breaks pacing. I believe this may be happening because the arrival of pipewire buffers is not perfectly aligned with the presentation on screen, which confuses the encoder thread more than kmsgrab, which likely grabs frames at virtually identical intervals. Results with RX 6600 on Fedora 43: with the above config set appropriately, pacing is perfect on KDE and GNOME. With vsync disabled and the game capped to 60fps, sometimes there may be some slight jitter that clears up after a few seconds, but I notice the same issue with kmsgrab. Enabling vsync seems to prevent the random jitter, likely due to the compositor pacing the frames closer to the vblank interval. I need to test more to be absolutely sure it's bulletproof, but my current impression is that portal grab pacing is on par with or possibly better than kmsgrab. |
|
@psyke83 This feature also doesn't work quite right on Windows, and is the source of a lot of confusion over the years. @andygrundman tagging you as FYI, because you've expressed interest in improving this feature in the past. |
|
It would be fantastic if Sunshine behaved the same as it does on Windows when it comes to capturing frames. See When I say "like Windows", what I mean specifically is:
I have tried to argue for the min_fps_target default to be a value like 20, which would allow streaming of 24fps, 30fps, content without duplicate frames. The 0 value is very unintuitive and wasteful. The old value was more like 10, and there will be a tradeoff here with bursty data when the stream suddenly ramps back up to max, moving the mouse is where some people complain about this burst. |
I won't be dying on this hill, but please don't do this. Capture should provide the lowest possible latency within given constraints. Yes, this requires frame pacing on host, but not extremely tight frame pacing. Client still has to implement its own and proper frame pacing anyway to deal with frame transfer time jitter (not ping jitter, caused by variation in encoded frame sizes), and encoding time jitter (caused by gpu scheduling outliers). And demanding tight host-side frame pacing will only produce worse latency in the outliers on the client. A generic frame pacer looks like this. For each frame, there's three timepoints:
Then these points are dynamically adjusted depending on your particular requirements and recent frame history. Take it as you will. |
|
The host frame pacer is only there to ensure it doesn't capture too many frames, since sending a 60fps stream to a 59.94fps client means the client has to drop a frame a few times a minute. That results in stutter that no amount of client frame pacing can fix. That's the only hill I'm dying on. :) Hey, didn't you write that frame pacing code? I think it's great, it just needs to work on all platforms, that's the main issue right? |
|
As an aside, I'd also recommend testing the master branch with I'm not fully convinced that my retry logic is fully correct; it currently rejects frames that have identical seq and pts, but there are also cases where the pts is identical but seq differs (on kwin). There's still some infrequent jitter appearing on my hardware when testing this branch and also master too. |
|
I built your patch on my Cachy system with a GTX 1070 in it. It appears to work very well, and I think the min_fps_target works as expected. I noticed your timer uses a timeout of 1000ms whereas the Windows code uses 200ms. I guess this is why the minimum is more fuzzy in Windows. I added a few lines for Xbox support and I'm able to play Hades 2 at 1440p120 smooth as butter. If I had a faster GPU in this box you could fool me that it's Windows. Here are my changes, they are very simple. Obviously every other backend will need to adopt our changes for variable framerate, I'm interested in what everyone thinks about that. What's the tier list for the 5 capture methods, are some likely to be retired? I'm surprised to see NvFBC alive and kicking. cc @ns6089 @ReenigneArcher @cgutman (note the nice flat host frametime graph, and 119.88 on the nose. That's what I like to see.) |
I'm okay with changing the default. The old default was 1 which is equivalent to 10, but there was a bug in the code where it would never use the client requested framerate (config.framerate). old code: // set minimum frame time, avoiding violation of client-requested target framerate
auto minimum_frame_time = std::chrono::milliseconds(1000 / std::min(config.framerate, (config::video.min_fps_factor * 10)));
BOOST_LOG(debug) << "Minimum frame time set to "sv << minimum_frame_time.count() << "ms, based on min fps factor of "sv << config::video.min_fps_factor << "."sv;The client could be set to 30, 60, 90, 120., While the config::video.min_fps_factor could have been 1, 2, or 3... resulting in 10, 20, or 30. So all of the client side options were greater or equal to the setting in Sunshine. |
|
The Min FPS threshold is definitely a problem for me, but it's a pain in the arse to reproduce easily. The most reliable way I've found is this config:
The behaviour with master (in summary):
The behaviour is as follows with this PR:
Given the above behaviour, I believe that the min fps target is not working robustly when the capture method pushes frames at variable rates (which is what happens when snapshot starts blocking to filter duplicates). "Disabling" the min fps target completely alleviates both types of bursting with this PR, but it has the expected downside of making 30fps content on a 60fps stream less smooth, as the stream will push 30fps when in-game at 30fps, but mouse movement will trigger new frames past that rate, which causes the stream rate to bounce between 30-60. |
I tested this; I'm certainly not against it, but I don't think it will impact the specific issue I'm having on my configuration. When connecting to the stream it will use the updated calculation but then falls back to the original: |
I think when you see this in the log, it's from the "// Testing for available encoders" part of the log, which might be setting the X100 field in the config. (I kinda hate the testing section, it makes skimming the log a lot harder than it should be.) |
7d00aae to
65e7794
Compare
|
I added a commit to resolve performance when using the software encoder, as it's somewhat related to this change. The stream reports 60fps for software encoding on master branch, but it's actually closer to 45fps when accounting for duplicated buffers. With reactive capture you can now see the true capture rate, so I've added the change that fixes performance, and also included some fixes for crashing on mode change that only happen on the MemPtr path. Slightly OT: I tested the changes on Kwin and GNOME to confirm they work as expected, but it also came to my attention that GNOME freezes the stream when entering a fullscreen game with software encoding (with this PR and master branch). It's caused by a direct scanout conflict that can be worked around by installing the "Disable Unredirect" extension. When I have time I'll see if this can be worked around in Sunshine itself. |
I added your change and assigned you as the author - hopefully you don't mind. We can back these changes out and split into another PR if necessary. |
…ction Cherry-picked from LizardByte/Sunshine PR LizardByte#4740. - Replace polling-based frame capture with event-driven approach using condition variables - Optimize software encoding path with private local buffer to prevent buffer starvation - Add frame_mutex protection to fix modesetting crashes - Support x-nv-video[0].clientRefreshRateX100 for fractional NTSC framerates
Bundle ReportBundle size has no change ✅ |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4740 +/- ##
==========================================
- Coverage 15.45% 15.36% -0.09%
==========================================
Files 97 97
Lines 20582 20653 +71
Branches 9446 9470 +24
==========================================
- Hits 3181 3174 -7
- Misses 12854 15329 +2475
+ Partials 4547 2150 -2397
Flags with carried forward coverage won't be shown. Click here to find out more.
|
42ce0f5 to
8e04f8d
Compare
…ction Replaces polling-based frame capture with event-driven approach using condition variables for immediate notification when new frames arrive. Changes: - on_process: Notify snapshot() via condition variable on new buffer - fill_img: Extract and pass through PipeWire seq/pts metadata - snapshot: Block on condition variable with timeout; detect and skip duplicate frames using seq/pts metadata to avoid redundant processing - capture: Simplify catch-up logic to maintain timeline consistency by advancing in delay intervals without resetting origin This eliminates tight polling loops and reduces unnecessary frame processing when compositor reuses buffers.
8e04f8d to
5924995
Compare
|
I refreshed the original patch to clear the SonarQube warnings. I'm fairly happy with the state of the PR right now. I do have an additional change that leverages the pts metadata to let us calculate more accurate frame_timestamps, but I'd prefer to change that in another PR, as the pts data from mutter vs kwin is different to each other and thus needs to be handled carefully. |
…shes
This patch addresses a performance bottleneck in the software encoding
path and a race condition during PipeWire stream re-initialization.
1. Performance Optimization (Double-Buffering):
- Implemented a front/back buffer swapping strategy for the software
(MemPtr) path.
- Frame data is copied to a `back_buffer` without holding the
`frame_mutex`, then swapped with the `front_buffer` under a
short-lived lock.
- This prevents memory bandwidth contention and allows the encoder
and PipeWire threads to work in parallel without blocking.
2. Stability Fixes:
- Implemented `frame_mutex` protection across `on_process`, `fill_img`,
and `cleanup_stream` to prevent use-after-free and dangling pointers
during resolution changes.
- Enhanced `fill_img` to safely bail out if the stream is dead or
no new frame is ready.
- Added explicit state resets (`frame_ready = false`, `current_buffer = nullptr`)
to ensure the encoder thread ignores stale data during re-init.
Fixes intermittent segfaults during resolution changes and restores
performance on systems using software encoding.
…r requesting fractional NTSC framerates
5924995 to
c074fcf
Compare
|





Description
Replaces polling-based frame capture with event-driven approach using condition variables for immediate notification when new frames arrive.
Changes:
This eliminates tight polling loops and reduces unnecessary frame processing when compositor reuses buffers.
Screenshot
Issues Fixed or Closed
Roadmap Issues
Type of Change
Checklist
AI Usage