-
Notifications
You must be signed in to change notification settings - Fork 36
Open
Description
As it says in https://docs.datastax.com/en/developer/java-driver/3.2/manual/retries/#retries-and-idempotence
- retrying in onReadTimeout is always safe, since by definition this error indicates that the query was a read, which didn’t mutate any data;
- similarly, onUnavailable is safe: the coordinator is telling us that it didn’t find enough replicas, so we know that it didn’t try to apply the query.
- onWriteTimeout is not safe: some replicas failed to reply to the coordinator in time, but they might still have applied the mutation;
- onRequestError is not safe either: the query might have been applied before the error occurred. In particular, an OperationTimedOutException could be caused by a network issue that prevented a successful response to come back to the client.
But looking into a code I see that it won't retry onUnavailable but in the same time it will on onWriteTimeout which is not safe:
https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L727-L764
https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/policies/retry/simple.lua#L38-L48
Also, it is not pretty clear where those timeouts come https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L752
So, I have a few questions:
- Could it be possible that retry logic is broken or I'm wrong?
- Is it safe to retry request when those unknown timeouts occur and from where they might come?
Currently, we switched off the retry mechanism by setting retry_on_timeout to false and max_retries to one.
vtupe
Metadata
Metadata
Assignees
Labels
No labels