Fix double handoff for unloaded small messages#84

Open
breakertt wants to merge 1 commit into PlatformLab:main from breakertt:fix-handoff-twice

Conversation

@breakertt
Contributor

Fix double handoff for unloaded small messages

Resolves issue #77: move the ACK block to after homa_add_packet + homa_rpc_handoff. The unlock is still there (homa_rpc_acked's lock ordering still requires it), but now the skb is on the queue before the unlock window opens: anyone who grabs the lock during the window finds data, so no clearing happens. The unit test is modified correspondingly.

Impact on a CloudLab xl170 pair (25 GbE, Linux 6.17.8): at unloaded 64 B, server-side handoff_count / requests_received drops from 1.148 to 1.000 (5/5 trials; race closed). Loaded throughput across w1..w5 is unchanged within noise (Δ kops swings from -4.4% to +3.5% with no consistent sign across workloads). More details below.

Root cause

homa_rpc_alloc_server sets RPC_PKTS_READY and fires homa_rpc_handoff for the first packet of a new server RPC, before anyone has actually put the skb on msgin.packets.

homa_data_pkt drops the bucket lock to call homa_rpc_acked() for any piggy-backed ACK. That happens before homa_add_packet. The unlock is mandatory: homa_rpc_acked needs to grab other RPCs' locks, so we can't hold this one.

```
softirq (holds bucket lock)        recvmsg side (woken up, blocked on lock)
-------------------------          ----------------------------------------
alloc_server: set RPC_PKTS_READY
              homa_rpc_handoff --wake-->
                                   wait_shared -> pulls rpc -> tries lock
homa_data_pkt:
   ack: rpc_unlock           --------->  gets the lock
        homa_rpc_acked              homa_copy_to_user
                                      skb_peek == NULL  <-- empty queue
                                      clear_bit(RPC_PKTS_READY)
                                      break
                                    rpc_unlock -> goes back to sleep
        rpc_lock              <----
   homa_add_packet (skb finally on the queue)
   set RPC_PKTS_READY (was just cleared)
   homa_rpc_handoff       --wake--> (second wake; this one delivers)
```

More details on fix measurement

Two CloudLab xl170 nodes (E5-2640 v4 @ 2.40 GHz, 20 logical cores, 25 GbE Mellanox), small-lan profile, both on Linux 6.17.8 mainline (the version the upstream README says works).

For each branch (fix-handoff-twice-reproduce for baseline metrics; fix-handoff-twice with the metric overlay for the fix):

  1. Patch cloudlab/bin/config's VLAN regex inet 10\.0\.1\. -> inet 10\.10\.1\. (current small-lan uses the latter).
  2. make all && cd util && make cp_node.
  3. For each cell:
    • Unloaded 64 B: client --ports 1 --port-receivers 0 --client-max 1 --workload 64; server --ports 1 --port-threads 1. 5 x ~10 s.
    • Loaded w1..w5: cperf 25 Gbps defaults, client --ports 3 --port-receivers 3 --client-max 200 --gbps 0; server --ports 3 --port-threads 3. 5 x 30 s.
  4. Reload homa.ko + run cloudlab/bin/config homa <ko> nic power rps between every trial. Without that, server-side state accumulates and contaminates loaded numbers (tested it; the variance is wild without per-trial reset).
  5. Read /proc/net/homa_metrics after each trial, divide.

The probe is one INC_METRIC call at the top of homa_rpc_handoff plus a u64 field. ~16 LoC across 3 files + a 10-line shell helper. See fix-handoff-twice-reproduce.

For single-packet messages, each RPC should see exactly one handoff, so a ratio > 1 is the race signal. Loaded ratios aren't reported below: for messages larger than one MTU, each packet that lands after the receiver has drained the queue legitimately fires its own wake-up, so the metric stops measuring the race.

Unloaded 64 B

|          | kops  | P50 µs | P99 µs | ratio |
|----------|-------|--------|--------|-------|
| baseline | 50.27 | 17.96  | 38.81  | 1.148 |
| fix      | 49.99 | 18.46  | 37.91  | 1.000 |

Race closed. Latency / throughput delta is within the per-trial jitter (~0.5 µs, ~3% kops trial-to-trial).

Loaded, cperf 25 Gbps defaults

| workload | baseline kops | fix kops | Δ kops | baseline P50 | fix P50 | baseline P99 | fix P99 |
|----------|---------------|----------|--------|--------------|---------|--------------|---------|
| w1 | 404.50 | 386.56 | -4.4 % | 449     | 492     | 540    | 581    |
| w2 | 378.76 | 386.57 | +2.1 % | 476     | 468     | 612    | 608    |
| w3 | 309.07 | 319.97 | +3.5 % | 568     | 529     | 866    | 812    |
| w4 | 32.94  | 32.45  | -1.5 % | 104     | 104     | 197 ms | 203 ms |
| w5 | 4.81   | 4.68   | -2.7 % | 23.8 ms | 23.7 ms | 136 ms | 150 ms |

(P50/P99 in µs unless marked.) The Δ swings from -4.4 % to +3.5 % with no consistent sign across workloads; that's noise. Within-variant variance is comparable: baseline-w1's 5 trials span 385.93-413.60 kops (~7%), fix-w1 spans 370.52-411.32 (~11%). The cross-variant deltas are smaller than that.
