MDEV-38624: prevent parallel slave from skipping ER_CONNECTION_KILLED error#4658
Conversation
andrelkin
left a comment
There was a problem hiding this comment.
Hi!
Thanks for the patch, please check my two notes.
More to that, find out what possible tests cover these changes, and if none, please add them too.
Cheers,
Andrei
sql/log_event_server.cc
Outdated
| inline int ignored_error_code(int err_code) | ||
| { | ||
| // Force slave to stop on this error even if in --slave-skip-errors | ||
| if (err_code == ER_CONNECTION_KILLED) |
There was a problem hiding this comment.
While this is a possible place to mask ER_CONNECTION_KILLED error, it is less specific than
concurrency_error_code() to which ER_CONNECTION_KILLED belong just naturally.
Please consider that.
There was a problem hiding this comment.
At any rate the error-to-skip init_slave_skip_errors() function needs a warning about the supplied ER_CONNECTION_KILLED will be ignored.
There was a problem hiding this comment.
I checked concurrency_error_code(), and I think it is better than ignored_error_code(), but I need to ask about the function name, the ER_CONNECTION_KILLED, is not a concurrency error, right?
Is it good to add a connection error to concurrency errors?
There was a problem hiding this comment.
And I will check the tests and add a warning message.
Thanks a lot :)
There was a problem hiding this comment.
ER_CONNECTION_KILLED is a sort of concurrency error in the slave parallel setup. Yet its semantic is not about error, at facing it the parallel slave won't stop, rather it will retry.
However, let me adjust my yesterday point and by that to simplify this work.
Along with the init_slave_skip_errors() warning let's simply skip setting its bit in slave_error_mask.
Notice the single-threaded "legacy" slave is fine with this. There is nothing wrong with a requirement your patch would impose: if its lonely thread is connection-killed, it then must exit.
There was a problem hiding this comment.
I made the fix in init_slave_skip_errors() and added a warning in all option -where I cleared the bit map from connection error- and another warning in skip arg if it had the connection error.
Also, I made 2 tests for each senario please check them
gkodinov
left a comment
There was a problem hiding this comment.
This is a preliminary review. Please make sure about the following:
- you have buildbot tests running fine: right now you have compilation errors
- you have some extra commits showing up. Please re-clone your branch from the latest main branch, then cherry pick the diff you need (preferably in a single commit) and override push this into your tree.
- get a commit message that conforms to the coding standards please.
I will leave the technical review to Andrei. He has already started.
|
I did not mention last time, but the PR contains unrelated commits. Those are better be stripped off sooner. |
|
The preferred way to update the base branch around here is with rebase. See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch. This guarantees that only your changes will be visible in the PR. I would suggest the following to make a clean single-commit PR:
This should ensure that your branch is based on the latest main from the upstream repo and that it has a single commit delta that contains only your changes. |
gkodinov
left a comment
There was a problem hiding this comment.
testing my script. please ignore.
af749ce to
1f1a08e
Compare
|
I cleared my branch, and it has only my commit now, but I couldn't clear the other 9 commits. I need to do |
gkodinov
left a comment
There was a problem hiding this comment.
According to https://github.com/MariaDB/server/blob/main/CODING_STANDARDS.md "Bug fixes should be based against the earliest maintained branch in which the bug can be reproduced". This is a bug fix. I'd say go for 10.6 if you can reproduce it there.
On the PR content: I'm not the best one to ask, but I believe gh will just display whatever extra commits you have in your branch. This is why I believe dropping the branch and re-creating it with a single commit containing only your change will do the trick.
The important outcome should be that your branch (as pushed in your cloned tree) should not contain more than 1 commit compared to its base branch.
andrelkin
left a comment
There was a problem hiding this comment.
Hi Nada!
The patch is close to be perfect. I have few suggestions for the test part.
Cheers,
Andrei
There was a problem hiding this comment.
--slave-skip-errors=all is actually tested with suite/rpl/t/rpl_temporary_error2_skip_all.test.
I suggest to incorporate the steps 1,2 into it.
As above the 3rd steps is apparently covered by the existing test.
It makes sense to wrap 1+2 into something like suite/rpl/include/check_slave_skip_errors.inc to invoke this macro by the two main tests.
There was a problem hiding this comment.
The test is good, but it feels it's too much to have two files dedicate to a rather simple issue.
Especially it possible to combine yours in some existing one.
I think suite/rpl/t/rpl_skip_error.test is a good candidate. Just let us append our error to its current
--slave-skip-errors=1062.
The steps 1,2 would have to be relocated into rpl_skip_error.test, to its beginning I think it'd a good location. Step 3 would be clearly covered by the existing test.
There was a problem hiding this comment.
Thank you for your helpful remark. I applied, please check it.
83d168b to
f308a94
Compare
f308a94 to
ad82034
Compare
No description provided.