feat(net): optimize disconnectRandom by tracking block receive time #6704
xxo1shine wants to merge 1 commit into tronprotocol:develop from
Conversation
@Setter
@Getter
private volatile long blockRcvTimeCmp;
[QUESTION] Why does blockRcvTimeCmp live in PeerConnection?
Its only use is inside ResilienceService.getRandomDisconnectionPeers() to snapshot blockRcvTime before sorting, so the comparator reads a stable value even if another thread updates blockRcvTime mid-sort. That's a valid concern, but it's an implementation detail of one method in one service — it doesn't describe any property of the peer itself.
A local snapshot map achieves the same safety without polluting the peer domain object:
private List<PeerConnection> getRandomDisconnectionPeers(List<PeerConnection> peers) {
  Map<PeerConnection, Long> snapshot = new IdentityHashMap<>();
  peers.forEach(p -> snapshot.put(p, p.getBlockRcvTime()));
  List<PeerConnection> sorted = new ArrayList<>(peers);
  sorted.sort(Comparator.comparingLong(snapshot::get));
  return sorted.subList(0, sorted.size() / 2);
}

Is there a reason this needs to be a field rather than a method-local variable?
The reason blockRcvTimeCmp was introduced as a field was to solve a concrete problem: blockRcvTime is updated concurrently by network threads during the sort, which lets the Comparator observe inconsistent values across comparisons and triggers IllegalArgumentException: Comparison method violates its general contract, failing the sort. Making it a field on the peer was the approach I chose at the time to give the comparator a stable snapshot value to read.
That said, your direction is right — a stable snapshot is purely an implementation detail of getRandomDisconnectionPeers and shouldn't leak onto PeerConnection. Switching to a method-local IdentityHashMap<PeerConnection, Long> populated once before the sort and read-only during the sort gives the same stability guarantee without polluting the domain model.
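The method-local snapshot being discussed can be sketched as follows. This is an illustrative standalone version, not java-tron code: `Peer` stands in for `PeerConnection`, keeping only the `blockRcvTime` field that matters here. The key property is that the comparator reads only from the frozen map, so concurrent updates to `blockRcvTime` mid-sort cannot violate the sort contract.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SnapshotSort {

  // Minimal stand-in for PeerConnection.
  static class Peer {
    volatile long blockRcvTime; // updated concurrently by network threads
    Peer(long t) { this.blockRcvTime = t; }
  }

  // Copy each peer's blockRcvTime once, then sort against the frozen copy.
  // Even if blockRcvTime changes during the sort, the comparator keeps
  // seeing the same snapshot value, so its ordering stays consistent.
  static List<Peer> oldestHalf(List<Peer> peers) {
    Map<Peer, Long> snapshot = new IdentityHashMap<>();
    peers.forEach(p -> snapshot.put(p, p.blockRcvTime));
    List<Peer> sorted = new ArrayList<>(peers);
    sorted.sort(Comparator.comparingLong(snapshot::get));
    return sorted.subList(0, sorted.size() / 2);
  }

  public static void main(String[] args) {
    Peer a = new Peer(100), b = new Peer(200), c = new Peer(300), d = new Peer(400);
    List<Peer> half = oldestHalf(List.of(d, b, a, c));
    // The two peers with the oldest receive times remain as candidates.
    System.out.println(half.get(0) == a && half.get(1) == b); // true
  }
}
```

An `IdentityHashMap` is used so that two distinct peer objects are never collapsed into one key, regardless of how `equals` is defined on the real class.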
I've optimized it according to your suggestions.
…er peer

- Add blockRcvTime/blockRcvTimeCmp fields to PeerConnection to track when a peer last delivered a valid block
- Set blockRcvTime in BlockMsgHandler after each block is received
- Fix lastInteractiveTime update in InventoryMsgHandler: only update for block inventories above the current head block num, preventing attackers from forging activity via stale block hashes
- Add getRandomDisconnectionPeers() to ResilienceService: narrows the disconnect candidate pool to the oldest half by blockRcvTime, so peers that recently delivered blocks are protected from random eviction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from cdf1047 to a912a49
if (type.equals(InventoryType.BLOCK) && peer.getAdvInvSpread().getIfPresent(item) == null) {
  peer.setLastInteractiveTime(System.currentTimeMillis());
  long headNum = tronNetDelegate.getHeadBlockId().getNum();
  if (new BlockId(id).getNum() > headNum) {
[QUESTION] Small wording suggestion. The PR description says this prevents "attackers from forging activity via stale block hashes" — the stale-replay part is correct and a real improvement, but I don't think arbitrary forgery is prevented:
BlockId(Sha256Hash) reinterprets the first 8 bytes of the hash as a big-endian long (BlockCapsule.java:347-352) without verifying the hash matches any real block. An adversary can craft a 32-byte hash whose high bytes encode headNum + 1 and still pass the check.
The check still has clear value: this is the only path that refreshes lastInteractiveTime on INV (P2pEventHandlerImpl.updateLastInteractiveTime doesn't include INVENTORY in its switch), so without this tightening, any historical block hash would refresh the inactivity clock. I'd just suggest narrowing the description to "prevent stale block hashes from refreshing lastInteractiveTime" rather than the broader "forging activity" framing, to keep expectations accurate.
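The decode step at issue can be illustrated in isolation. This is a sketch of the concern, not java-tron's actual `BlockCapsule.BlockId` code: the block number is recovered by reinterpreting the first 8 bytes of the 32-byte hash as a big-endian long, and nothing ties those bytes to any real block, so a crafted hash can encode any number.

```java
import java.nio.ByteBuffer;

public class CraftedBlockId {

  // Mirrors the decode: read the first 8 bytes of the hash as a big-endian long.
  static long numFromHash(byte[] hash32) {
    return ByteBuffer.wrap(hash32, 0, 8).getLong();
  }

  // An adversary simply writes the desired number into the high bytes;
  // the remaining 24 bytes can be anything.
  static byte[] craftHash(long desiredNum) {
    byte[] hash = new byte[32];
    ByteBuffer.wrap(hash).putLong(desiredNum);
    return hash;
  }

  public static void main(String[] args) {
    long headNum = 72_000_000L;
    byte[] fake = craftHash(headNum + 1);
    // Passes a "num > headNum" check even though no such block exists.
    System.out.println(numFromHash(fake) > headNum); // true
  }
}
```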
This modification doesn't completely solve the update problem; rather, it raises the cost of malicious activity by preventing attackers from refreshing the timer with expired block hashes, which is an attack with very low cost.
The attack you mention does exist, but the attacker would need to construct malicious blocks, which significantly increases the cost of such an attack.
long headNum = tronNetDelegate.getHeadBlockId().getNum();
if (new BlockId(id).getNum() > headNum) {
  peer.setLastInteractiveTime(System.currentTimeMillis());
}
[SHOULD] Good comparison. If new BlockId(id).getNum() < solidityNum, we should set lastInteractiveTime to a very old time.
If the time isn't refreshed, it will fall behind anyway, so the added logic doesn't gain much. There is also a risk that a peer which is slightly behind due to network latency would be mistakenly penalized.
Let's improve it: suppose we use new BlockId(id).getNum() < solidityNum - THRESHOLD, which accounts for the impact of network latency under extreme conditions. Since you've already modified this part, it's recommended to improve the logic here.
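The policy suggested above could be sketched as a small pure function. All names here are hypothetical (not from java-tron): it refreshes the activity clock only for inventories ahead of the head block, ages out peers advertising blocks far below the solidified block, and leaves a `THRESHOLD` band untouched to tolerate network latency.

```java
public class StaleInvPolicy {

  // Hypothetical latency allowance, in blocks.
  static final long THRESHOLD = 100;

  // Returns the new lastInteractiveTime given a block inventory's number.
  static long onBlockInv(long invNum, long headNum, long solidityNum,
                         long lastInteractiveTime, long now) {
    if (invNum > headNum) {
      return now;               // ahead of our head: genuine activity, refresh
    }
    if (invNum < solidityNum - THRESHOLD) {
      return 0L;                // far-stale block: treat the peer as long idle
    }
    return lastInteractiveTime; // within tolerance: leave the clock untouched
  }

  public static void main(String[] args) {
    long now = 1_700_000_000_000L;
    System.out.println(onBlockInv(5000, 4999, 4900, 42, now) == now); // true
    System.out.println(onBlockInv(4700, 4999, 4900, 42, now) == 0L);  // true
    System.out.println(onBlockInv(4950, 4999, 4900, 42, now) == 42);  // true
  }
}
```

Returning a sentinel "very old" value (0 here) is one way to realize "set lastInteractiveTime to a very old time" so the inactivity check evicts the peer soon after.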
peers.forEach(p -> snapshot.put(p, p.getBlockRcvTime()));
List<PeerConnection> sorted = new ArrayList<>(peers);
sorted.sort(Comparator.comparingLong(snapshot::get));
return sorted.subList(0, sorted.size() / 2);
[SHOULD] There are two problems:
- Excluding the latest peer is enough; there is no need to exclude half of the peers.
- Is it necessary to create a new Map instead of sorting by the peer's blockRcvTime directly?
The main purpose is to increase randomness; behavior that is too deterministic is more vulnerable to attack. If we don't sort by blockRcvTime, what should be used for sorting?
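The trade-off described in this thread can be shown with a tiny sketch (illustrative names, not java-tron code): restricting eviction candidates to the oldest-by-blockRcvTime half protects recently useful peers, while picking randomly inside that half avoids a fully predictable victim that an attacker could game.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class RandomEviction {

  // Assumes the list is already sorted by blockRcvTime ascending and already
  // narrowed to the oldest half; any element may be chosen with equal chance.
  static <T> T pickVictim(List<T> oldestHalfSorted) {
    int i = ThreadLocalRandom.current().nextInt(oldestHalfSorted.size());
    return oldestHalfSorted.get(i);
  }

  public static void main(String[] args) {
    List<String> candidates = List.of("peerA", "peerB", "peerC");
    String victim = pickVictim(candidates);
    System.out.println(candidates.contains(victim)); // always true
  }
}
```

Excluding only the single most recent peer, as the reviewer suggests, would be the degenerate case of the same idea with a larger, more random candidate pool.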
What does this PR do?
Optimize disconnectRandom by tracking block receive time per peer
Fixes #6504
Why are these changes required?
This PR has been tested by:
Follow up
Extra details