Skip to content

Fix millis() wraparound in PacketQueue time comparisons#1795

Merged
ripplebiz merged 1 commit intomeshcore-dev:devfrom
DanielNovak:fix-packetqueue-millis-wraparound
Feb 23, 2026
Merged

Fix millis() wraparound in PacketQueue time comparisons#1795
ripplebiz merged 1 commit intomeshcore-dev:devfrom
DanielNovak:fix-packetqueue-millis-wraparound

Conversation

@DanielNovak
Copy link

Summary

PacketQueue::countBefore() and PacketQueue::get() use unsigned comparison to check if a scheduled packet is ready:

if (_schedule_table[j] > now) continue;   // scheduled for future

This breaks when millis() wraps around after ~49.7 days of continuous uptime. The same file's sibling class Dispatcher already handles this correctly via signed subtraction in millisHasNowPassed(), but PacketQueue was not using that pattern.

The bug

millis() is a 32-bit unsigned counter that wraps from 0xFFFFFFFF to 0 after ~49.7 days. When a packet is scheduled near the wrap point, futureMillis() computes scheduled_for = millis() + delay, which can produce a value on either side of the boundary.

Packet gets stuck — when scheduled_for doesn't wrap but millis() does:

millis() = 0xFFFF_F000, delay = 1000ms
scheduled_for = 0xFFFF_F3E8  (high value, no overflow)

millis() wraps → now = 0x0000_0100
Check: 0xFFFF_F3E8 > 0x0000_0100 → TRUE → packet stuck "in the future"

The packet remains in the queue until millis() reaches 0xFFFF_F3E8 again — another ~49.7 days.

Packet sent early — when scheduled_for wraps but millis() hasn't yet:

millis() = 0xFFFF_FFF0, delay = 500ms
scheduled_for = 0x0000_01E4  (wrapped to small value)

Next loop: now = 0xFFFF_FFF1
Check: 0x0000_01E4 > 0xFFFF_FFF1 → FALSE → packet sent immediately

The packet is sent ~500ms early, reducing the effectiveness of collision-avoidance and score-based RX delays.

Impact

Both the outbound send queue (retransmit delays, ACK delays) and inbound RX queue (score-based delays up to 32 seconds) are affected.

For a typical repeater, the probability of a packet being in the queue at the exact wrap moment is low on any single occurrence. But the RX delay queue has a wider window (up to 32 seconds for weak flood packets), and nodes like repeaters and room servers are expected to run for months continuously.

A stuck packet also permanently occupies a pool slot, causing a slow leak of the packet pool over successive wraps.

The fix

Replace the unsigned comparison with signed subtraction, matching the pattern already used by Dispatcher::millisHasNowPassed():

// before
if (_schedule_table[j] > now) continue;

// after
if ((int32_t)(_schedule_table[j] - now) > 0) continue;

This handles the wraparound correctly via 2's complement arithmetic, for time differences up to ~24.8 days in either direction — well beyond the maximum queue delay of 32 seconds.

Risk: None. The two approaches produce identical results for all time differences under 2^31 ms (~24.8 days). Same number of CPU instructions, same cycle count on all target architectures (ARM Cortex-M).

PacketQueue::countBefore() and PacketQueue::get() use unsigned
comparison (_schedule_table[j] > now) to check if a packet is
scheduled for the future. This breaks when millis() wraps around
after ~49.7 days: packets scheduled just before the wrap appear
to be in the far future and get stuck in the queue.

Use signed subtraction instead, matching the approach already used
by Dispatcher::millisHasNowPassed(). This correctly handles the
wraparound for time differences up to ~24.8 days in either
direction, well beyond the maximum queue delay of 32 seconds.
@ripplebiz ripplebiz merged commit 8ee4867 into meshcore-dev:dev Feb 23, 2026
@IoTThinks
Copy link
Contributor

IoTThinks commented Feb 28, 2026

@DanielNovak What if now=0xFFFFFFFF?
It should return _num?
#1687 (comment)

This is the calling method _mgr->getOutboundCount(0xFFFFFFFF)
image

DanielNovak added a commit to DanielNovak/MeshCore that referenced this pull request Feb 28, 2026
PR meshcore-dev#1795 changed PacketQueue::countBefore() to use signed 2's complement
arithmetic for millis wraparound safety. However, this broke the
0xFFFFFFFF sentinel pattern used by callers to mean "count all packets
regardless of schedule".

With the signed comparison, countBefore(0xFFFFFFFF) always returns 0,
causing hasPendingWork() to report false and repeaters to sleep with
packets still queued. Stats reporting also shows queue_len as 0.

Add an early-return for the sentinel value before the loop.
DanielNovak added a commit to DanielNovak/MeshCore that referenced this pull request Feb 28, 2026
PR meshcore-dev#1795 changed PacketQueue::countBefore() to use signed 2's complement
arithmetic for millis wraparound safety. However, this broke the
0xFFFFFFFF sentinel pattern used by callers to mean "count all packets
regardless of schedule".

With the signed comparison, countBefore(0xFFFFFFFF) always returns 0,
causing hasPendingWork() to report false and repeaters to sleep with
packets still queued. Stats reporting also shows queue_len as 0.

Add an early-return for the sentinel value before the loop, and document
the sentinel convention on the virtual interface and implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants