Skip to content

fix(dgw): fix adaptive frame skipping during EOF waits in session shadowing#1678

Open
irvingoujAtDevolution wants to merge 1 commit intomasterfrom
fix-adaptive-frame-skip-eof-ratio
Open

fix(dgw): fix adaptive frame skipping during EOF waits in session shadowing#1678
irvingoujAtDevolution wants to merge 1 commit intomasterfrom
fix-adaptive-frame-skip-eof-ratio

Conversation

@irvingoujAtDevolution
Copy link
Contributor

@irvingoujAtDevolution irvingoujAtDevolution commented Feb 13, 2026

Fixes playback failures in session shadowing where adaptive frame skipping permanently stalls video streaming after EOF retry waits.

The ratio used for adaptive frame skip decisions (media_time / wall_time) was corrupted by time spent waiting for the recording file to grow during EOF retries. A 20-second EOF wait would result in a ratio of ~0.006, far below the 1.0 threshold, causing all subsequent frames to be decode-only with no encoded output reaching the player.

The fix changes the ratio to only measure active processing time (media_time / accumulated_processing_time) instead of total wall time, excluding idle waits. Additionally, EOF waits now time out after 3 seconds to limit the maximum stall duration.

Issue: DGW-341

@github-actions
Copy link

Let maintainers know that an action is required on their side

  • Add the label release-required Please cut a new release (Devolutions Gateway, Devolutions Agent, Jetsocat, PowerShell module) when you request a maintainer to cut a new release (Devolutions Gateway, Devolutions Agent, Jetsocat, PowerShell module)

  • Add the label release-blocker Follow-up is required before cutting a new release if a follow-up is required before cutting a new release

  • Add the label publish-required Please publish libraries (`Devolutions.Gateway.Utils`, OpenAPI clients, etc) when you request a maintainer to publish libraries (Devolutions.Gateway.Utils, OpenAPI clients, etc.)

  • Add the label publish-blocker Follow-up is required before publishing libraries if a follow-up is required before publishing libraries

…dowing

Fixes playback failures in session shadowing where adaptive frame skipping
permanently stalls video streaming after EOF retry waits.

The ratio used for adaptive frame skip decisions (media_time / wall_time)
was corrupted by time spent waiting for the recording file to grow during
EOF retries. A 20-second EOF wait would result in a ratio of ~0.006, far
below the 1.0 threshold, causing all subsequent frames to be decode-only
with no encoded output reaching the player.

The fix changes the ratio to only measure active processing time
(media_time / accumulated_processing_time) instead of total wall time,
excluding idle waits. Additionally, EOF waits now time out after 3 seconds
to limit the maximum stall duration.

Issue: DGW-341
@irvingoujAtDevolution irvingoujAtDevolution force-pushed the fix-adaptive-frame-skip-eof-ratio branch from a51ceca to 16733fc Compare February 13, 2026 20:36
let _ = tx.send(WhenEofControlFlow::Break);
},
_ = tokio::time::sleep(std::time::Duration::from_secs(3)) => {
info!("EOF wait timed out, retrying");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Probably not a good idea to use an info! log at this place. I would suggest trace! unless you think debug! is better

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in session shadowing where adaptive frame skipping would permanently stall video streaming after EOF retry waits. The issue occurred because the realtime ratio calculation (used to decide whether to skip frames) included idle time spent waiting for file growth, causing the ratio to drop far below the 1.0 threshold and triggering perpetual decode-only mode.

Changes:

  • Modified the realtime ratio calculation to track only active processing time, excluding EOF wait periods
  • Added a 3-second timeout to EOF waits to prevent indefinite stalls
  • Increased MAX_RETRY_COUNT from 3 to 25 to accommodate the new timeout-based retry strategy

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
crates/video-streamer/src/streamer/tag_writers.rs Added processing_time tracking to measure only active processing (excluding idle waits), updated current_realtime_ratio to use processing_time instead of wall time, added Duration import
crates/video-streamer/src/streamer/mod.rs Added 3-second timeout to EOF waits, increased MAX_RETRY_COUNT from 3 to 25 to accommodate timeout-based retries

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -99,7 +99,7 @@ pub fn webm_stream(
}
}

Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MAX_RETRY_COUNT increase from 3 to 25 combined with the 3-second timeout per retry creates a maximum potential wait time of 75 seconds (25 × 3s). This seems excessive for a live streaming scenario. Consider either:

  1. Reducing MAX_RETRY_COUNT to a lower value (e.g., 10 for 30s max wait)
  2. Adding a comment explaining why 75 seconds is an acceptable maximum wait time
  3. Implementing a total elapsed time check instead of just a retry count

The original issue describes a 20-second EOF wait corrupting the ratio, but the fix allows up to 75 seconds of total wait time, which could still cause significant delays in live streaming scenarios.

Suggested change
// NOTE: MAX_RETRY_COUNT is intentionally set to 25. With the 3-second delay
// between retries, this yields a worst-case wait of 75 seconds before we
// give up on the current stream. This is acceptable in our live-streaming
// context because downstream components already enforce a stricter overall
// timeout, and tolerating longer temporary input stalls here reduces
// unnecessary reconnect churn on brief network or encoder hiccups.

Copilot uses AI. Check for mistakes.
@@ -371,11 +377,11 @@ where

Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment to the current_realtime_ratio method to clarify that it measures the ratio of media time to processing time (excluding idle waits), not media time to wall time. This would help future maintainers understand the distinction, especially given that the fix specifically addresses the issue of idle time corrupting this ratio.

Example:

// Calculates the ratio of media time advanced to active processing time.
// This excludes idle time spent waiting for file growth during EOF retries,
// preventing temporary stalls from permanently corrupting the frame skip decision.
Suggested change
// Calculates the ratio of media time advanced to active processing time.
// This uses `processing_time`, which is intended to exclude idle waits (for example,
// time spent waiting for file growth during EOF retries), so that temporary stalls
// do not permanently corrupt the frame skip decision.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants