Skip to content
/ server Public

Comments

MDEV-38624: prevent parallel slave from skipping ER_CONNECTION_KILLED error#4658

Open
nadaelsayed11 wants to merge 1 commit intoMariaDB:10.6from
nadaelsayed11:fix/MDEV-38624-never-skip-connection-killed
Open

MDEV-38624: prevent parallel slave from skipping ER_CONNECTION_KILLED error#4658
nadaelsayed11 wants to merge 1 commit intoMariaDB:10.6from
nadaelsayed11:fix/MDEV-38624-never-skip-connection-killed

Conversation

@nadaelsayed11
Copy link
Contributor

No description provided.

@CLAassistant
Copy link

CLAassistant commented Feb 16, 2026

CLA assistant check
All committers have signed the CLA.

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 16, 2026
@gkodinov gkodinov changed the title Fix/MDEV-38624: prevent parallel slave from skipping ER_CONNECTION_KILLED error MDEV-38624: prevent parallel slave from skipping ER_CONNECTION_KILLED error Feb 16, 2026
Copy link
Contributor

@andrelkin andrelkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

Thanks for the patch, please check my two notes.
More to that, find out what possible tests cover these changes, and if none, please add them too.

Cheers,

Andrei

inline int ignored_error_code(int err_code)
{
// Force slave to stop on this error even if in --slave-skip-errors
if (err_code == ER_CONNECTION_KILLED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is a possible place to mask ER_CONNECTION_KILLED error, it is less specific than
concurrency_error_code() to which ER_CONNECTION_KILLED belong just naturally.
Please consider that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At any rate the error-to-skip init_slave_skip_errors() function needs a warning about the supplied ER_CONNECTION_KILLED will be ignored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked concurrency_error_code(), and I think it is better than ignored_error_code(), but I need to ask about the function name, the ER_CONNECTION_KILLED, is not a concurrency error, right?

Is it good to add a connection error to concurrency errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I will check the tests and add a warning message.

Thanks a lot :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ER_CONNECTION_KILLED is a sort of concurrency error in the slave parallel setup. Yet its semantic is not about error, at facing it the parallel slave won't stop, rather it will retry.

However, let me adjust my yesterday point and by that to simplify this work.
Along with the init_slave_skip_errors() warning let's simply skip setting its bit in slave_error_mask.
Notice the single-threaded "legacy" slave is fine with this. There is nothing wrong with a requirement your patch would impose: if its lonely thread is connection-killed, it then must exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the fix in init_slave_skip_errors() and added a warning in all option -where I cleared the bit map from connection error- and another warning in skip arg if it had the connection error.

Also, I made 2 tests for each senario please check them

@gkodinov gkodinov self-assigned this Feb 16, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a preliminary review. Please make sure about the following:

  • you have buildbot tests running fine: right now you have compilation errors
  • you have some extra commits showing up. Please re-clone your branch from the latest main branch, then cherry pick the diff you need (preferably in a single commit) and override push this into your tree.
  • get a commit message that conforms to the coding standards please.

I will leave the technical review to Andrei. He has already started.

@andrelkin
Copy link
Contributor

I did not mention last time, but the PR contains unrelated commits. Those are better be stripped off sooner.

@gkodinov
Copy link
Member

The preferred way to update the base branch around here is with rebase. See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch. This guarantees that only your changes will be visible in the PR.

I would suggest the following to make a clean single-commit PR:

  • export only your changes from your existing branch. I'd do this by git diff origin/main...<your branch> > diff.diff.
  • inspect diff.diff and ensure it contains only your changes
  • drop your local branch: git branch -d <your local branch>
  • re-create your local branch based on the latest main: git checkout upstream/main; git branch <your branch>; git checkout <your branch>
  • apply your changes and resolve any conflicts : git apply diff.diff
  • commit: git commit ....
  • push your new branch to your forked tree: git push --force origin/<your branch> <your branch>

This should ensure that your branch is based on the latest main from the upstream repo and that it has a single commit delta that contains only your changes.

Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing my script. please ignore.

@nadaelsayed11 nadaelsayed11 force-pushed the fix/MDEV-38624-never-skip-connection-killed branch from af749ce to 1f1a08e Compare February 18, 2026 00:32
@nadaelsayed11 nadaelsayed11 changed the base branch from main to 10.11 February 18, 2026 00:50
@nadaelsayed11 nadaelsayed11 changed the base branch from 10.11 to main February 18, 2026 00:51
@nadaelsayed11 nadaelsayed11 changed the base branch from main to 10.11 February 18, 2026 00:53
@nadaelsayed11 nadaelsayed11 changed the base branch from 10.11 to main February 18, 2026 00:54
@nadaelsayed11
Copy link
Contributor Author

I cleared my branch, and it has only my commit now, but I couldn't clear the other 9 commits. I need to do cherry pick now, but I need to know if I should make my base main or 10.11?

Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://github.com/MariaDB/server/blob/main/CODING_STANDARDS.md "Bug fixes should be based against the earliest maintained branch in which the bug can be reproduced". This is a bug fix. I'd say go for 10.6 if you can reproduce it there.

On the PR content: I'm not the best one to ask, but I believe gh will just display whatever extra commits you have in your branch. This is why I believe dropping the branch and re-creating it with a single commit containing only your change will do the trick.

The important outcome should be that your branch (as pushed in your cloned tree) should not contain more than 1 commit compared to its base branch.

Copy link
Contributor

@andrelkin andrelkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Nada!

The patch is close to be perfect. I have few suggestions for the test part.

Cheers,

Andrei

Copy link
Contributor

@andrelkin andrelkin Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--slave-skip-errors=all is actually tested with suite/rpl/t/rpl_temporary_error2_skip_all.test.

I suggest to incorporate the steps 1,2 into it.
As above the 3rd steps is apparently covered by the existing test.
It makes sense to wrap 1+2 into something like suite/rpl/include/check_slave_skip_errors.inc to invoke this macro by the two main tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is good, but it feels it's too much to have two files dedicate to a rather simple issue.
Especially it possible to combine yours in some existing one.
I think suite/rpl/t/rpl_skip_error.test is a good candidate. Just let us append our error to its current
--slave-skip-errors=1062.

The steps 1,2 would have to be relocated into rpl_skip_error.test, to its beginning I think it'd a good location. Step 3 would be clearly covered by the existing test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your helpful remark. I applied, please check it.

@nadaelsayed11 nadaelsayed11 changed the base branch from main to 10.6 February 20, 2026 22:20
@nadaelsayed11 nadaelsayed11 force-pushed the fix/MDEV-38624-never-skip-connection-killed branch 2 times, most recently from 83d168b to f308a94 Compare February 21, 2026 00:33
@nadaelsayed11 nadaelsayed11 force-pushed the fix/MDEV-38624-never-skip-connection-killed branch from f308a94 to ad82034 Compare February 21, 2026 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

4 participants