Skip to content

IGNITE-27947 Fix rollback for client piggyback tx#7844

Open
tmgodinho wants to merge 11 commits intoapache:mainfrom
tmgodinho:ignite-27947
Open

IGNITE-27947 Fix rollback for client piggyback tx#7844
tmgodinho wants to merge 11 commits intoapache:mainfrom
tmgodinho:ignite-27947

Conversation

@tmgodinho
Copy link
Copy Markdown
Contributor

Opened a new commit following #7730 with a much simpler approach.

https://issues.apache.org/jira/browse/IGNITE-27947

What was done:

  • Implement error handling for unset transaction in ClientSql
  • Server-side:
    ** TX_ROLLBACK accpets a request id of the first request of a direct mapped TX. Request Id is encoded in the negative range of resourceId.
    ** First Request Ids per TX are tracked using a mapping from reqId to resourceId. Mappings are created when the TX is created and removed on rollback and on response sent.
    ** Update all the operations. RO ops have the same parameters just for consistency.
  • Client Side:
    ** Allow multiple onSent callbacks on the payload output object.
    ** Added information to ClientLazyTransaction about the first request in the TX. Updated via PayloadOutputChannel on successful request.
    ** Implemented TX_ROLLBACK based on firstReqId

* The first request is tracked for direct transactions.
* The first request id is passed on the Rollback message using the negative range of the resourceId field.
@tmgodinho
Copy link
Copy Markdown
Contributor Author

This should be a simpler implementation than the one on the previous approach.

@tmgodinho tmgodinho marked this pull request as ready for review March 20, 2026 19:21
metrics.requestsFailedIncrement();
}

firstReqToTxResMap.remove(requestId);
Copy link
Copy Markdown
Contributor

@ptupitsyn ptupitsyn Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we remove the mapping once the response is sent, but the client might have requested cancellation just before it received our response:

client: send request
server: firstReqToTxResMap.add
server: send response, `firstReqToTxResMap.remove`
client: cancel operation
server: returns an error?

Can we remove the firstReqToTxResMap later? For example, when the tx is cleaned up (removed from the resource registry)?

This should simplify client-side logic too, we can always rollback using the first request id.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Pavel,
Yeah, you are right.
The rollback message using the firstReqId may already be waiting to be processed when we remove here.
In this scenario, the server would respond with an error, but we are not interested in that response.
That's why we also attach to the actual transaction future from the server and execute the rollback using the normal method if necessary. If we received the first request response concurrently, we just resend the rollback using the normal way, even if the first rollback was already sent.
But I forgot to add this scenario to the tests. I had it in the first version but forgot to port it to this one.

Delaying the removal of the mapping is also possible, at the expense of slightly more complexity on the server-side. I thought about it during the impl. On the client-side it would probably be simpler as you said, since we would rely on the server response for the first rollback message.

I'll add the test and implement this to see how it looks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you recheck @ptupitsyn ??

** Postpone removal of firstReqToTx resource to after the transaction is commited/rolled back.
** Added test
tx.recordOperationFailure(err);
}

// We should reconcile this code with ClientTable. Should be the same.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do that as part of current PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also think it would be best.

public CompletableFuture<Void> rollbackAsync() {
var tx0 = tx;

// This is really fishy. It will probably let you reuse a transaction after calling a rollback :(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Let's create a ticket to address this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmgodinho please share the ticket.

if (!tx0.isDone() && cancelled.compareAndSet(false, true)) {
return requestInfoFuture
.thenCompose(reqInfo ->
reqInfo.ch.serviceAsync(ClientOp.TX_ROLLBACK, w -> w.out().packLong(-reqInfo.firstReqId), r -> null)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protocol compat: what will happen with an older server that does not understand negative resource ids?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, it will experience the old "blocking" behaviour.
The request will be sent to the server, the server will panic at:
https://github.com/apache/ignite-3/pull/7844/changes/BASE..f649f764f18e572f830208127faddbc566fc8a6f#diff-6e5ede08174517c940134dafdcb9270727ed3738dcb475582b48c017344896f2L59
The client will pick the error, and try to rollback using the old method once the tx information is known.

I assumed it was ok just to fix it for newer versions without breaking the previous ones.

PS: The error is something like, resource not found, nothing very relevant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a protocol feature flag for this. Avoid sending the request if the server does not understand it.

** Fixed type
** Updated javadoc comment on ClientInboundMessageHandler
** Added IGNITE-28405 ticket mention
…andling logic. (For the most part anyway)

** The code is still very hard, its just a little less duplicated.
@ptupitsyn ptupitsyn changed the title IGNITE-27947 Impossible to rollback client first transaction request IGNITE-27947 Fix rollback for client piggyback tx Apr 2, 2026
# Conflicts:
#	modules/client-common/src/main/java/org/apache/ignite/internal/client/proto/ProtocolBitmaskFeature.java
#	modules/client-handler/src/main/java/org/apache/ignite/client/handler/ClientHandlerModule.java
#	modules/client/src/main/java/org/apache/ignite/internal/client/TcpClientChannel.java
if (ch.protocolContext().isFeatureSupported(TX_ROLLBACK_USING_FIRST_REQUEST)) {
return ch.serviceAsync(ClientOp.TX_ROLLBACK, w -> w.out().packLong(-reqInfo.firstReqId), r -> null);
} else {
return NOT_SUPPORTED_FUTURE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean for the user?

}

/**
* If the current request is the first request of a direct translation, add a listener to the {@link PayloadWriter}.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If the current request is the first request of a direct translation, add a listener to the {@link PayloadWriter}.
* If the current request is the first request of a direct tx, add a listener to the {@link PayloadWriter}.

kvView.put(tx1, key, val);
@ParameterizedTest
@MethodSource("org.apache.ignite.internal.client.ItThinClientTransactionsTest#killTestContextFactory")
public void testRollbackDoesNotBlock(KillTestContext ctx) throws InterruptedException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All those new tests pass on the main branch, looks like we don't test the new functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants