NETCONF-RPC timeout during notification-replay burst in netopeer2-2.2.35 + sysrepo-3.3.10 #1776

@rydy

Hi,
I am running a NETCONF server built with netopeer2-v2.2.35 and sysrepo-v3.3.10.
When a NETCONF client subscribes to a notification stream with a replay-start-time, the server correctly replays several minutes of stored notifications.
During this replay burst, other RPC operations from the same client occasionally time out (our timeout is 30 s).

...............................................
............Notification replay output (omitted)............
...............................................
[2025-12-26 12:50:06.178] [DBG]: LN: Session 1: Received message:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="13"><establish-subscription xmlns="urn:ietf:params:xml:ns:yang:ietf-subscribed-notifications"><stream-xpath-filter xmlns:ns="urn:asmote:gNB-topology:1.0">/ns:gNB-topology-change</stream-xpath-filter><stream>NETCONF</stream></establish-subscription></rpc>
...............................................
............Notification replay output (omitted)............
...............................................
[2025-12-26 12:50:38.179] [ERR]: SR: EV ORIGIN: SHM event "rpc" ID 3 processing timed out.
[2025-12-26 12:50:38.179] [WRN]: SR: EV ORIGIN: "/ietf-subscribed-notifications:establish-subscription" "rpc" ID 3 priority 0 failed (Timeout expired).
...............................................
............Notification replay output (omitted)............
...............................................
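For reference, the gap between the two timestamps in the excerpt above can be checked with a few lines of Python. This is only a convenience sketch: the helper name `elapsed_seconds` is made up for illustration, and the two timestamps are copied from the log lines above.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2025-12-26 12:50:06.178"
FMT = "%Y-%m-%d %H:%M:%S.%f"

def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()

received = "2025-12-26 12:50:06.178"   # establish-subscription RPC received
timed_out = "2025-12-26 12:50:38.179"  # SHM event "rpc" ID 3 timed out

print(elapsed_seconds(received, timed_out))
```

The two log lines are about 32 s apart, slightly more than the configured 30 s timeout.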

Two typical scenarios are shown below.

  1. RPC eventually succeeds but takes ~19 s.log
  2. RPC times out after 30 s.log

Analysis and question

After reading the netopeer2-server source code, I can see that RPC, data and notification subscriptions are created through independent sysrepo API calls, and that each subscription owns a dedicated sr_shmsub_listen_thread.
Therefore the RPC, data and notification callbacks should run concurrently.
Nevertheless, the logs above show that while the long-running notification replay is active, subsequent RPC operations are either delayed by ~19 s or time out after 30 s.
Could anyone explain:

  1. why this blocking happens despite the separate threads, and
  2. what the recommended way is to eliminate or at least mitigate the problem (other than simply increasing the RPC timeout)?

Any hints or pointers to the relevant code paths would be highly appreciated.
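
To make the question concrete, here is a generic sketch in plain Python threads. It is not sysrepo or netopeer2 code, and every name in it is hypothetical. It only shows the kind of behaviour I would expect if both callbacks contended for one shared lock: the "RPC" runs on its own thread but still cannot finish until the whole "replay" is done. Whether anything like this actually happens inside the server is exactly what I am asking.

```python
import threading

lock = threading.Lock()             # hypothetical shared resource
replay_started = threading.Event()  # signals that the replay holds the lock
order = []                          # records the order in which work completes

def replay_callback(n_items: int) -> None:
    """Stand-in for a long notification replay that holds the shared lock."""
    with lock:
        replay_started.set()
        for i in range(n_items):
            order.append(f"notif-{i}")

def rpc_callback() -> None:
    """Stand-in for an RPC handler that needs the same lock."""
    replay_started.wait()           # ensure the replay is already in progress
    with lock:                      # blocks until the whole replay has finished
        order.append("rpc")

def run_demo(n_items: int = 5) -> list:
    """Run both callbacks on separate threads and return the completion order."""
    order.clear()
    replay_started.clear()
    t_rpc = threading.Thread(target=rpc_callback)
    t_replay = threading.Thread(target=replay_callback, args=(n_items,))
    t_rpc.start()
    t_replay.start()
    t_rpc.join()
    t_replay.join()
    return list(order)

print(run_demo())
```

In this sketch the "rpc" entry always ends up last in the list, even though its thread was started first.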
Thanks in advance!
