Skip to content

Conversation

@ptlrs
Copy link
Contributor

@ptlrs ptlrs commented Feb 6, 2026

What changes were proposed in this pull request?

This PR originates from https://issues.apache.org/jira/browse/HDDS-14454 refactors the code to use ConcurrentHashMaps and remove the synchronized methods that are used in XceiverClientGrpc

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14571

How was this patch tested?

CI: https://github.com/ptlrs/ozone/actions/runs/21737324867

}
LOG.debug("Connecting to server : {}; nodes in pipeline : {}, ", dn, pipeline.getNodes());

ManagedChannel channel = createChannel(dn, port).build();
Copy link

@yandrey321 yandrey321 Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather do something like
ManagedChannel channel = channels.computeIfAbsent(dn.getId(), createChannel(dn, port).build());

XceiverClientProtocolServiceGrpc.newStub(channel);
asyncStubs.put(dn.getID(), asyncStub);
channels.put(dn.getID(), channel);
XceiverClientProtocolServiceStub asyncStub = XceiverClientProtocolServiceGrpc.newStub(channel);
Copy link

@yandrey321 yandrey321 Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And
asyncStubs.computeIfAbsent(dn.getID(), XceiverClientProtocolServiceGrpc.newStub(channel));

asyncStubs.put(dn.getID(), asyncStub);
channels.put(dn.getID(), channel);
XceiverClientProtocolServiceStub asyncStub = XceiverClientProtocolServiceGrpc.newStub(channel);
synchronized (this) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block would not be needed if we use atomic map operation in comments above,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be atomic if we want to use it to check the state without synchronized block

@adoroszlai
Copy link
Contributor

@yandrey321 please try to avoid "Add single comment", rather "Start review" and submit all comments in a single batch via "Review changes"

ptlrs added 2 commits February 9, 2026 18:39
…o improve channel management and reduce synchronization.
@ptlrs
Copy link
Contributor Author

ptlrs commented Feb 10, 2026

Thanks for the review @yandrey321. I have updated the PR. Could you please take a look again?

@ptlrs ptlrs requested a review from yandrey321 February 10, 2026 03:16
asyncStubs.put(dn.getID(), asyncStub);
channels.put(dn.getID(), channel);

asyncStubs.computeIfAbsent(dn.getID(), dnId -> XceiverClientProtocolServiceGrpc.newStub(channel));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to remove stub if associated channel was terminated otherwise here the old stub would be reused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The async stub is being cleared above. Won't that be sufficient?

channels.computeIfPresent(dn.getID(), (dnId, channel) -> {
      if (channel.isTerminated() || channel.isShutdown()) {
        asyncStubs.remove(dnId);                    // clearing the stubs here
        return null; // removes from channels map
      }

      return channel;
    });

https://github.com/apache/ozone/pull/9718/changes#diff-bb4105305530e6076ed3af7c2463508e69d9bb02d146762dbc505d920734134bR184-R191

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed that part

.collect(Collectors.toSet());
LOG.warn("Channels {} did not terminate within timeout.", failedChannels);

channels.keySet().removeIf(e -> !failedChannels.contains(e));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to wipe channels and stubs from the map even if they timed out on shutdown. We dont have retry mechanism to close them afterwards and we cannot reuse channels after calling shutdown on them.

Copy link

@yandrey321 yandrey321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ptlrs ptlrs marked this pull request as ready for review February 10, 2026 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants