Skip to content

Comments

feat(linux/xdgportal): implement reactive capture with duplicate detection#4740

Open
psyke83 wants to merge 3 commits intoLizardByte:masterfrom
psyke83:xdg_snapshottimeout
Open

feat(linux/xdgportal): implement reactive capture with duplicate detection#4740
psyke83 wants to merge 3 commits intoLizardByte:masterfrom
psyke83:xdg_snapshottimeout

Conversation

@psyke83
Copy link
Contributor

@psyke83 psyke83 commented Feb 18, 2026

Description

Replaces polling-based frame capture with event-driven approach using condition variables for immediate notification when new frames arrive.

Changes:

  • on_process: Notify snapshot() via condition variable on new buffer
  • fill_img: Extract and pass through PipeWire seq/pts metadata
  • snapshot: Block on condition variable with timeout; detect and skip duplicate frames using seq/pts metadata to avoid redundant processing
  • capture: Simplify catch-up logic to maintain timeline consistency by advancing in delay intervals without resetting origin

This eliminates tight polling loops and reduces unnecessary frame processing when compositor reuses buffers.

Screenshot

Issues Fixed or Closed

Roadmap Issues

Type of Change

  • feat: New feature (non-breaking change which adds functionality)
  • fix: Bug fix (non-breaking change which fixes an issue)
  • docs: Documentation only changes
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semicolons, etc.)
  • refactor: Code change that neither fixes a bug nor adds a feature
  • perf: Code change that improves performance
  • test: Adding missing tests or correcting existing tests
  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Other changes that don't modify src or test files
  • revert: Reverts a previous commit
  • BREAKING CHANGE: Introduces a breaking change (can be combined with any type above)

Checklist

  • Code follows the style guidelines of this project
  • Code has been self-reviewed
  • Code has been commented, particularly in hard-to-understand areas
  • Code docstring/documentation-blocks for new or existing methods/components have been added or updated
  • Unit tests have been added or updated for any new or modified functionality

AI Usage

  • None: No AI tools were used in creating this PR
  • Light: AI provided minor assistance (formatting, simple suggestions)
  • Moderate: AI helped with code generation or debugging specific parts
  • Heavy: AI generated most or all of the code changes

@psyke83 psyke83 marked this pull request as ready for review February 18, 2026 08:11
@psyke83
Copy link
Contributor Author

psyke83 commented Feb 18, 2026

TLDR; test with minimum_fps_target = 1. I'd appreciate some testing feedback before merging - thanks.

It's vital to ~disable (set to 1 second timeout) the "Minimum FPS target" setting in order for XDG Portal capture pacing to work as expected. With the default setting of 0 (half framerate), this setting can randomly cause 50% or more extra frames (60fps -> 90fps) to be inserted, which causes network jitter that completely breaks pacing. I believe this may be happening because the arrival of pipewire buffers is not perfectly aligned with the presentation on screen, which confuses the encoder thread more than kmsgrab, which likely grabs frames at virtually identical intervals.

Results with RX 6600 on Fedora 43: with the above config set appropriately, pacing is perfect on KDE and GNOME. With vsync disabled and the game capped to 60fps, sometimes there may be some slight jitter that clears up after a few seconds, but I notice the same issue with kmsgrab. Enabling vsync seems to prevent the random jitter, likely due to the compositor pacing the frames closer to the vblank interval. I need to test more to be absolutely sure it's bulletproof, but my current impression is that portal grab pacing is on par with or possibly better than kmsgrab.

@ReenigneArcher
Copy link
Member

@psyke83 This feature also doesn't work quite right on Windows, and is the source of a lot of confusion over the years.

@andygrundman tagging you as FYI, because you've expressed interest in improving this feature in the past.

@ReenigneArcher ReenigneArcher added this to the xdg portal grab milestone Feb 18, 2026
@andygrundman
Copy link
Contributor

andygrundman commented Feb 18, 2026

It would be fantastic if Sunshine behaved the same as it does on Windows when it comes to capturing frames. See src/platform/windows/display_base.cpp (search for the word pacing). The Sunshine architecture is not very good about defining how capture works, because I think the Windows logic should probably be refactored into a higher base class that can be used by all platforms.

When I say "like Windows", what I mean specifically is:

  • Captures frames using a frame pacing timer based on the stream framerate and optionally a client-requested fractional framerate for fine-tuning. This is used on Xbox which runs at NTSC refresh rates. Frames captured must have an accurate frame_timestamp indicating when the frame was captured.
  • The capture process will naturally capture duplicate frames, and these should not be streamed, until the framerate drops below min_fps_target. To keep the framerate above the minimum, duplicate frames are sent as needed. Duplicate frames should have their frame_timestamp removed.
  • This timer needs to support the framerate requested in x-nv-video[0].clientRefreshRateX100, you can use the Xbox client for testing this. There are 2 special case NTSC framerates that use defined intervals. Multiples of 29.97, and 23.976, and for every other value it's a simple 1/fps.
  • Capture should have no relation to the host's refresh rate or vsync timing.

I have tried to argue for the min_fps_target default to be a value like 20, which would allow streaming of 24fps, 30fps, content without duplicate frames. The 0 value is very unintuitive and wasteful. The old value was more like 10, and there will be a tradeoff here with bursty data when the stream suddenly ramps back up to max, moving the mouse is where some people complain about this burst.

@ns6089
Copy link
Contributor

ns6089 commented Feb 18, 2026

Capture should have no relation to the host's refresh rate or vsync timing

I won't be dying on this hill, but please don't do this. Capture should provide the lowest possible latency within given constraints. Yes, this requires frame pacing on host, but not extremely tight frame pacing. Client still has to implement its own and proper frame pacing anyway to deal with frame transfer time jitter (not ping jitter, caused by variation in encoded frame sizes), and encoding time jitter (caused by gpu scheduling outliers). And demanding tight host-side frame pacing will only produce worse latency in the outliers on the client.

A generic frame pacer looks like this. For each frame, there's three timepoints:

  1. Instant after — frame arriving/timestamped after this point are instantly processed, frames arriving/timestamped before are subjected to (2)
  2. Delay until — frame arriving/timestamped before (1) are delayed until this point. Delayed frames can get preempted by instant frames and be subsequently dropped.
  3. Repeat at — if there's no new frames at this point, the old frame is repeated

Then these points are dynamically adjusted depending on your particular requirements and recent frame history. Take it as you will.

@andygrundman
Copy link
Contributor

andygrundman commented Feb 19, 2026

The host frame pacer is only there to ensure it doesn't capture too many frames, since sending a 60fps stream to a 59.94fps client means the client has to drop a frame a few times a minute. That results in stutter that no amount of client frame pacing can fix. That's the only hill I'm dying on. :)

Hey, didn't you write that frame pacing code? I think it's great, it just needs to work on all platforms, that's the main issue right?

@psyke83
Copy link
Contributor Author

psyke83 commented Feb 19, 2026

As an aside, I'd also recommend testing the master branch with minimum_fps_target = 1.

I'm not fully convinced that my retry logic is fully correct; it currently rejects frames that have identical seq and pts, but there are also cases where the pts is identical but seq differs (on kwin). There's still some infrequent jitter appearing on my hardware when testing this branch and also master too.

@andygrundman
Copy link
Contributor

I built your patch on my Cachy system with a GTX 1070 in it. It appears to work very well, and I think the min_fps_target works as expected. I noticed your timer uses a timeout of 1000ms whereas the Windows code uses 200ms. I guess this is why the minimum is more fuzzy in Windows.

I added a few lines for Xbox support and I'm able to play Hades 2 at 1440p120 smooth as butter. If I had a faster GPU in this box you could fool me that it's Windows. Here are my changes, they are very simple. Obviously every other backend will need to adopt our changes for variable framerate, I'm interested in what everyone thinks about that. What's the tier list for the 5 capture methods, are some likely to be retired? I'm surprised to see NvFBC alive and kicking.

cc @ns6089 @ReenigneArcher @cgutman

--- a/src/platform/linux/portalgrab.cpp
+++ b/src/platform/linux/portalgrab.cpp
@@ -1019,8 +1019,18 @@ namespace portal {
   class portal_t: public platf::display_t {
   public:
     int init(platf::mem_type_e hwdevice_type, const std::string &display_name, const ::video::config_t &config) {
+      // calculate frame interval we should capture at
       framerate = config.framerate;
-      delay = std::chrono::nanoseconds {1s} / framerate;
+      if (config.framerateX100 > 0) {
+        AVRational fps_strict = ::video::framerateX100_to_rational(config.framerateX100);
+        delay = std::chrono::nanoseconds(
+          (static_cast<int64_t>(fps_strict.den) * 1'000'000'000LL) / fps_strict.num
+        );
+        BOOST_LOG(info) << "Requested frame rate [" << fps_strict.num << "/" << fps_strict.den << ", approx. " << av_q2d(fps_strict) << " fps]";
+      } else {
+        delay = std::chrono::nanoseconds {1s} / framerate;
+        BOOST_LOG(info) << "Requested frame rate [" << framerate << "fps]";
+      }
       mem_type = hwdevice_type;

       if (get_dmabuf_modifiers() < 0) {

(note the nice flat host frametime graph, and 119.88 on the nose. That's what I like to see.)
hades2_linunx_pipewire_xbox

@ReenigneArcher
Copy link
Member

I have tried to argue for the min_fps_target default to be a value like 20, which would allow streaming of 24fps, 30fps, content without duplicate frames. The 0 value is very unintuitive and wasteful. The old value was more like 10, and there will be a tradeoff here with bursty data when the stream suddenly ramps back up to max, moving the mouse is where some people complain about this burst.

I'm okay with changing the default. The old default was 1 which is equivalent to 10, but there was a bug in the code where it would never use the client requested framerate (config.framerate).

old code:

    // set minimum frame time, avoiding violation of client-requested target framerate
    auto minimum_frame_time = std::chrono::milliseconds(1000 / std::min(config.framerate, (config::video.min_fps_factor * 10)));
    BOOST_LOG(debug) << "Minimum frame time set to "sv << minimum_frame_time.count() << "ms, based on min fps factor of "sv << config::video.min_fps_factor << "."sv;

The client could be set to 30, 60, 90, 120., While the config::video.min_fps_factor could have been 1, 2, or 3... resulting in 10, 20, or 30.

So all of the client side options were greater or equal to the setting in Sunshine.

@psyke83
Copy link
Contributor Author

psyke83 commented Feb 19, 2026

The Min FPS threshold is definitely a problem for me, but it's a pain in the arse to reproduce easily.

The most reliable way I've found is this config:

  • Fedora 43 KDE, Plasma 6.6 (copr beta repo), host GPU is RX 6600. I've tested a little bit on GNOME, but it seems that mutter is less susceptible to this issue.
  • Stream is set to 60fps, 1080p.
  • Host has a fullscreen game (e.g. Resident Evil 3) with vsync off but framerate capped in-game to 60fps. The game is not fully loading the GPU (average usage is ~75%).
  • Min FPS target is left to the default.

The behaviour with master (in summary):

  • Stream is often smooth, but at random intervals, the stream itself will slow down to 30fps, even though Moonlight's statistics don't reflect a deviation from 60fps. For brevity let's just call this the "30fps effect", which is simply the snapshot function grabbing the same buffer and therefore pushing duplicates to the encoder.
  • Changing the FPS limit, toggling vsync or swapping windowed -> fullscreen mode can temporarily alleviate the slowdown, but it always comes back.
  • Bursting / runaway bursting (where the stream violates the framerate cap) doesn't seem to occur.

The behaviour is as follows with this PR:

  • Stream is usually smooth at the beginning, but after a few minutes, you will notice some jitter appear. Moonlight's Incoming and Decoding rate will briefly violate the limit (~61-63) every few seconds which is accompanied by network jitter increasing to 3-5%. These small bursts alone impacts overall smoothness quite negatively.
  • After a (seemingly random) period of time, the jitter will worsen, which culminates in runaway bursting where the stream remains locked at 90fps (incoming and decoding) with constant network jitter of 50% or so.
  • The runaway bursting can be temporarily alleviated by changing the in-game FPS target to 30, 120, enabling vsync, or triggering a fullscreen -> windows -> fullscreen transition via alt+enter, but the pattern will always return.
  • With extra debugging I saw an erratic time variance between push_captured_image_cb calls; I believe when the runaway bursting occurred, it would proceed at an odd 8 -> 22 -> 8ms pattern instead of the steady ~16.6ms cadence that would be expected for a 60fps stream.
  • To calm the runaway bursting with this PR, reducing the snapshot timeout from 1-2ms helps, but it reintroduces the 30fps effect. Increasing even to 6-8ms alleviates the 30fps effect, but doesn't prevent eventual runaway bursting.

Given the above behaviour, I believe that the min fps target is not working robustly when the capture method pushes frames at variable rates (which is what happens when snapshot starts blocking to filter duplicates).

"Disabling" the min fps target completely alleviates both types of bursting with this PR, but it has the expected downside of making 30fps content on a 60fps stream less smooth, as the stream will push 30fps when in-game at 30fps, but mouse movement will trigger new frames past that rate, which causes the stream rate to bounce between 30-60.

@psyke83
Copy link
Contributor Author

psyke83 commented Feb 19, 2026

I built your patch on my Cachy system with a GTX 1070 in it. It appears to work very well, and I think the min_fps_target works as expected. I noticed your timer uses a timeout of 1000ms whereas the Windows code uses 200ms. I guess this is why the minimum is more fuzzy in Windows.

I added a few lines for Xbox support and I'm able to play Hades 2 at 1440p120 smooth as butter. If I had a faster GPU in this box you could fool me that it's Windows. Here are my changes, they are very simple. Obviously every other backend will need to adopt our changes for variable framerate, I'm interested in what everyone thinks about that. What's the tier list for the 5 capture methods, are some likely to be retired? I'm surprised to see NvFBC alive and kicking.

cc @ns6089 @ReenigneArcher @cgutman

--- a/src/platform/linux/portalgrab.cpp
+++ b/src/platform/linux/portalgrab.cpp
@@ -1019,8 +1019,18 @@ namespace portal {
   class portal_t: public platf::display_t {
   public:
     int init(platf::mem_type_e hwdevice_type, const std::string &display_name, const ::video::config_t &config) {
+      // calculate frame interval we should capture at
       framerate = config.framerate;
-      delay = std::chrono::nanoseconds {1s} / framerate;
+      if (config.framerateX100 > 0) {
+        AVRational fps_strict = ::video::framerateX100_to_rational(config.framerateX100);
+        delay = std::chrono::nanoseconds(
+          (static_cast<int64_t>(fps_strict.den) * 1'000'000'000LL) / fps_strict.num
+        );
+        BOOST_LOG(info) << "Requested frame rate [" << fps_strict.num << "/" << fps_strict.den << ", approx. " << av_q2d(fps_strict) << " fps]";
+      } else {
+        delay = std::chrono::nanoseconds {1s} / framerate;
+        BOOST_LOG(info) << "Requested frame rate [" << framerate << "fps]";
+      }
       mem_type = hwdevice_type;

       if (get_dmabuf_modifiers() < 0) {

(note the nice flat host frametime graph, and 119.88 on the nose. That's what I like to see.) hades2_linunx_pipewire_xbox

I tested this; I'm certainly not against it, but I don't think it will impact the specific issue I'm having on my configuration. When connecting to the stream it will use the updated calculation but then falls back to the original:

Feb 19 16:37:22 ryzenl sunshine[11629]: [2026-02-19 16:37:22.624]: Info: Requested frame rate [60/1, approx. 60 fps]
Feb 19 16:37:22 ryzenl sunshine[11629]: [2026-02-19 16:37:22.745]: Info: Requested frame rate [60fps]

@andygrundman
Copy link
Contributor

I tested this; I'm certainly not against it, but I don't think it will impact the specific issue I'm having on my configuration. When connecting to the stream it will use the updated calculation but then falls back to the original:

Feb 19 16:37:22 ryzenl sunshine[11629]: [2026-02-19 16:37:22.624]: Info: Requested frame rate [60/1, approx. 60 fps]
Feb 19 16:37:22 ryzenl sunshine[11629]: [2026-02-19 16:37:22.745]: Info: Requested frame rate [60fps]

I think when you see this in the log, it's from the "// Testing for available encoders" part of the log, which might be setting the X100 field in the config. (I kinda hate the testing section, it makes skimming the log a lot harder than it should be.)

@psyke83 psyke83 force-pushed the xdg_snapshottimeout branch from 7d00aae to 65e7794 Compare February 20, 2026 08:38
@psyke83
Copy link
Contributor Author

psyke83 commented Feb 20, 2026

I added a commit to resolve performance when using the software encoder, as it's somewhat related to this change. The stream reports 60fps for software encoding on master branch, but it's actually closer to 45fps when accounting for duplicated buffers. With reactive capture you can now see the true capture rate, so I've added the change that fixes performance, and also included some fixes for crashing on mode change that only happen on the MemPtr path.

Slightly OT: I tested the changes on Kwin and GNOME to confirm they work as expected, but it also came to my attention that GNOME freezes the stream when entering a fullscreen game with software encoding (with this PR and master branch). It's caused by a direct scanout conflict that can be worked around by installing the "Disable Unredirect" extension. When I have time I'll see if this can be worked around in Sunshine itself.

@psyke83
Copy link
Contributor Author

psyke83 commented Feb 20, 2026

I tested this; I'm certainly not against it, but I don't think it will impact the specific issue I'm having on my configuration. When connecting to the stream it will use the updated calculation but then falls back to the original:

Feb 19 16:37:22 ryzenl sunshine[11629]: [2026-02-19 16:37:22.624]: Info: Requested frame rate [60/1, approx. 60 fps]
Feb 19 16:37:22 ryzenl sunshine[11629]: [2026-02-19 16:37:22.745]: Info: Requested frame rate [60fps]

I think when you see this in the log, it's from the "// Testing for available encoders" part of the log, which might be setting the X100 field in the config. (I kinda hate the testing section, it makes skimming the log a lot harder than it should be.)

I added your change and assigned you as the author - hopefully you don't mind. We can back these changes out and split into another PR if necessary.

neatnoise added a commit to neatnoise/Sunshine that referenced this pull request Feb 20, 2026
…ction

Cherry-picked from LizardByte/Sunshine PR LizardByte#4740.

- Replace polling-based frame capture with event-driven approach using condition variables
- Optimize software encoding path with private local buffer to prevent buffer starvation
- Add frame_mutex protection to fix modesetting crashes
- Support x-nv-video[0].clientRefreshRateX100 for fractional NTSC framerates
@codecov
Copy link

codecov bot commented Feb 20, 2026

Bundle Report

Bundle size has no change ✅

@codecov
Copy link

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 0% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 15.36%. Comparing base (4913b67) to head (42ce0f5).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/platform/linux/portalgrab.cpp 0.00% 88 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4740      +/-   ##
==========================================
- Coverage   15.45%   15.36%   -0.09%     
==========================================
  Files          97       97              
  Lines       20582    20653      +71     
  Branches     9446     9470      +24     
==========================================
- Hits         3181     3174       -7     
- Misses      12854    15329    +2475     
+ Partials     4547     2150    -2397     
Flag Coverage Δ
Archlinux 11.05% <0.00%> (-0.06%) ⬇️
FreeBSD-14.3-aarch64 ?
FreeBSD-14.3-amd64 13.05% <0.00%> (-0.09%) ⬇️
Homebrew-ubuntu-22.04 13.26% <0.00%> (-0.07%) ⬇️
Linux-AppImage 11.47% <0.00%> (-0.06%) ⬇️
Windows-AMD64 13.49% <ø> (ø)
Windows-ARM64 12.34% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/platform/linux/graphics.h 0.00% <ø> (ø)
src/platform/linux/portalgrab.cpp 3.54% <0.00%> (-0.36%) ⬇️

... and 55 files with indirect coverage changes

@psyke83 psyke83 force-pushed the xdg_snapshottimeout branch from 42ce0f5 to 8e04f8d Compare February 20, 2026 16:07
…ction

Replaces polling-based frame capture with event-driven approach using
condition variables for immediate notification when new frames arrive.

Changes:
- on_process: Notify snapshot() via condition variable on new buffer
- fill_img: Extract and pass through PipeWire seq/pts metadata
- snapshot: Block on condition variable with timeout; detect and skip
  duplicate frames using seq/pts metadata to avoid redundant processing
- capture: Simplify catch-up logic to maintain timeline consistency
  by advancing in delay intervals without resetting origin

This eliminates tight polling loops and reduces unnecessary frame
processing when compositor reuses buffers.
@psyke83 psyke83 force-pushed the xdg_snapshottimeout branch from 8e04f8d to 5924995 Compare February 20, 2026 16:39
@psyke83
Copy link
Contributor Author

psyke83 commented Feb 20, 2026

I refreshed the original patch to clear the SonarQube warnings. I'm fairly happy with the state of the PR right now.

I do have an additional change that leverages the pts metadata to let us calculate more accurate frame_timestamps, but I'd prefer to change that in another PR, as the pts data from mutter vs kwin is different to each other and thus needs to be handled carefully.

psyke83 and others added 2 commits February 20, 2026 17:59
…shes

This patch addresses a performance bottleneck in the software encoding
path and a race condition during PipeWire stream re-initialization.

1. Performance Optimization (Double-Buffering):
   - Implemented a front/back buffer swapping strategy for the software
     (MemPtr) path.
   - Frame data is copied to a `back_buffer` without holding the
     `frame_mutex`, then swapped with the `front_buffer` under a
     short-lived lock.
   - This prevents memory bandwidth contention and allows the encoder
     and PipeWire threads to work in parallel without blocking.

2. Stability Fixes:
   - Implemented `frame_mutex` protection across `on_process`, `fill_img`,
     and `cleanup_stream` to prevent use-after-free and dangling pointers
     during resolution changes.
   - Enhanced `fill_img` to safely bail out if the stream is dead or
     no new frame is ready.
   - Added explicit state resets (`frame_ready = false`, `current_buffer = nullptr`)
     to ensure the encoder thread ignores stale data during re-init.

Fixes intermittent segfaults during resolution changes and restores
performance on systems using software encoding.
@psyke83 psyke83 force-pushed the xdg_snapshottimeout branch from 5924995 to c074fcf Compare February 20, 2026 17:59
@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants