Skip to content

Conversation

@auniyal61
Copy link
Contributor

@auniyal61 auniyal61 commented Nov 27, 2024

Right now when multiple cells gets deleted if any one the cell deletion fails, the control exits with error msg.
This change stores the errors in a list and let next cells deleted.

Closes #OSPRH-10550

@auniyal61
Copy link
Contributor Author

The jira report says, we need to do this for cellCreation as well, but I think its already taken care of at here https://github.com/openstack-k8s-operators/nova-operator/blob/main/controllers/nova_controller.go#L328-L356

Copy link
Contributor

@mrkisaolamb mrkisaolamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some func tests for cell deletion to assert if deletion is working and returning errors for all cells. Also it will be nice to assert what are conditions of nova instance

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/55d93ec55a534c64a74ed0e2e7f7761a

✔️ openstack-meta-content-provider SUCCESS in 47m 42s
nova-operator-kuttl RETRY_LIMIT in 17m 30s
nova-operator-tempest-multinode FAILURE in 22m 11s
nova-operator-tempest-multinode-ceph FAILURE in 23m 05s

@auniyal61 auniyal61 force-pushed the cell-deletion branch 4 times, most recently from ea9aefb to fe73ce1 Compare December 19, 2024 09:56
Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deletion logic looks good with a small request about ordering. I have couple of comments on the testing side.

Comment on lines 206 to 207
delete(nova.Spec.CellTemplates, "cell2")
delete(nova.Spec.CellTemplates, "cell3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know that the deletion of cell2 is always before cell3 in novaCellList.Items at L605 in the controller? If cell3 can be before cell2 then the test below does not prove that the error in cell2 was just collected but not stopped the loop there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the way to make it testable is to sort the cells be cell name when iterated in the controller. We do that for cell creation anyhow in #903 to avoid the unstable ordering to cause unstable condition messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes novaCellList is already sorted and cell2 always comes before cell3 in reconciler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted

g.Expect(nova.Status.RegisteredCells).NotTo(HaveKey(cell3.CellCRName.Name))
}, timeout, interval).Should(Succeed())

// NovaCellNotExists(cell3.CellCRName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yepp you can check if k8s returns not found for cell3 to prove that it is deleted. You can also check that the RegisteredCells list does not contain cell3 any more but still contains cell2.

@gibizer
Copy link
Contributor

gibizer commented Dec 20, 2024

Running the test locally shows that cell2 is also successfully deleted so the current test does not recreate the necessary scenario to prove that failure in one cell does not prevent the deletion of the other.

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5839fe00e7f1456ba553d6adfcd1e5ee

✔️ openstack-meta-content-provider SUCCESS in 2h 43m 17s
nova-operator-kuttl RETRY_LIMIT in 29m 32s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 06m 12s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 26m 19s

@auniyal61 auniyal61 force-pushed the cell-deletion branch 2 times, most recently from ddcb2bf to d046279 Compare January 23, 2025 05:54
@auniyal61 auniyal61 force-pushed the cell-deletion branch 3 times, most recently from 4a961b5 to 20729e2 Compare January 23, 2025 10:07
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/6472550c14a1482fae1096456327090c

✔️ openstack-meta-content-provider SUCCESS in 3h 02m 18s
nova-operator-kuttl FAILURE in 36m 33s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 02m 18s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 30m 11s

@gibizer
Copy link
Contributor

gibizer commented Jan 23, 2025

@auniyal61 so you pinged me on slack about the DeferCleanup issue.

so I am deleting a cell, but cell-resource (like secret ) is defined in setup (Before each or another funciton).
now once tests is finished deferecleanup runs and look for secret, but as we already deleted cell, its not present in DB, so defercleanup complains,
how to resolve this

The DeferCleanup doc says:
https://onsi.github.io/ginkgo/#cleaning-up-our-cleanup-code-defercleanup

DeferCleanup can be passed a function that takes no arguments and returns no value. You can also pass a function that returns values. DeferCleanup ignores all these return value except for the last. If the last return value is a non-nil error - a common go pattern - DeferCleanup will fail the spec.

So I guess the situation is that

  1. we create cell3 during the BeforeEach() section and put a DeferCleanup() call to the all the pieces we created
  2. then delete the cell during the normal It() section
  3. then the test runner executes the deferred cleanups where the client.Delete call returns a non nil error as the last return value because the resource is not found and that causes the test cleanup to fail as defined by the doc.

This means we have multiple options:

  • a) we wrap the client.Delete call and ignore the NotFound error. So cleanup still tries to delete the resources but if they do not exists then the cleanup still succeeds.
  • b) we don't create the cell3 in the shared CreateNovaWith3CellsAndEnsureReady() helper. But create it just for our test case (in our BeforeEach or in the It ) and we don't put DeferCleanup for the pieces of cell3 as we expect that the our test will delete them anyhow.

I would do option b) as it is more explicit about why we do thing differently.

(I need to get back to your other comments above a bit later)

@auniyal61
Copy link
Contributor Author

So I guess the situation is that

  1. we create cell3 during the BeforeEach() section and put a DeferCleanup() call to the all the pieces we created
  2. then delete the cell during the normal It() section
  3. then the test runner executes the deferred cleanups where the client.Delete call returns a non nil error as the last return value because the resource is not found and that causes the test cleanup to fail as defined by the doc.

Yes

This means we have multiple options:

  • a) we wrap the client.Delete call and ignore the NotFound error. So cleanup still tries to delete the resources but if they do not exists then the cleanup still succeeds.
  • b) we don't create the cell3 in the shared CreateNovaWith3CellsAndEnsureReady() helper. But create it just for our test case (in our BeforeEach or in the It ) and we don't put DeferCleanup for the pieces of cell3 as we expect that the our test will delete them anyhow.

I would do option b) as it is more explicit about why we do thing differently.

we have to create new-cell in CreateNovaWith3CellsAndEnsureReady, because that's where CR-spec is defined and saved, we can't pass and update spec later in our Describe("Nova multi cell deletion")

the oher way would be to redefine all cells again, which is just a duplicate code of CreateNovaWith3CellsAndEnsureReady.

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b81fac0e1fef40fe95499e5f784d313c

✔️ openstack-meta-content-provider SUCCESS in 3h 22m 00s
nova-operator-kuttl FAILURE in 33m 33s
nova-operator-tempest-multinode FAILURE in 2h 40m 04s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 32m 21s

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4b5237bf070745c18ac143e6d94bb68e

✔️ openstack-meta-content-provider SUCCESS in 2h 57m 35s
nova-operator-kuttl FAILURE in 35m 03s
nova-operator-tempest-multinode FAILURE in 2h 24m 44s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 34m 21s

@auniyal61
Copy link
Contributor Author

/retest

2 similar comments
@auniyal61
Copy link
Contributor Author

/retest

@auniyal61
Copy link
Contributor Author

/retest

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/40aec968396741779ef7ecc17669ea60

openstack-meta-content-provider FAILURE in 10m 05s
⚠️ nova-operator-kuttl SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-meta-content-provider
⚠️ nova-operator-tempest-multinode-ceph SKIPPED Skipped due to failed job openstack-meta-content-provider

Right now when multiple cells gets deleted if any one the cell deletion
fails, the control exits with error msg.
This change stores the errors in a list and let next cells deleted.

Adds a functional test

Closes #OSPRH-10550
Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have couple of follow up requests but the current code and test looks good to me so lets land this and do the refactors in a separate PR

"cell1": cell1Template,
"cell2": cell2Template,
"cell3": cell3Template,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a followup try to de-duplicate logic in CreateNovaWith4CellsAndEnsureReady and CreateNovaWith3CellsAndEnsureReady via something like the following:

spec := GetDefaultNovaSpec()
spec["cellTemplates"] = map[string]interface{}{}
spec["cellTemplates"]["cell0"] = CreateCellTemplate("cell0")
spec["cellTemplates"]["cell1"] = CreateCellTemplate("cell1")
spec["cellTemplates"]["cell2"] = CreateCellTemplateWithoutCleanup("cell2")
spec["cellTemplates"]["cell3"] = CreateCellTemplateWithoutCleanup("cell3")

I guess the logic below the spec definition is hard to deduplicate that way because of ordering requirement of the simulation functions.

@mrkisaolamb mrkisaolamb self-requested a review March 13, 2025 09:38
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: auniyal61, gibizer, mrkisaolamb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [auniyal61,gibizer,mrkisaolamb]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 2e6d3cd into openstack-k8s-operators:main Mar 13, 2025
7 checks passed
@auniyal61 auniyal61 deleted the cell-deletion branch June 10, 2025 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants