synaxg plugin dev #332

einsteinXue · 2025-03-17T06:30:49Z

This PR is only for code review; the Makefile has not been modified yet.

openshift-ci · 2025-03-17T06:31:19Z

Hi @einsteinXue. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

thom311 · 2025-03-17T08:45:34Z

The CR has two fields, ManualReboot and ManualUpgradeSDK.

From their meaning and usage, these fields sound imperative (the user sets the flag, and then an action happens) rather than describing the end state we want to reach.

For ManualUpgradeSDK, shouldn't we instead configure the desired FirmwareVersion (which might be set to "latest" to always upgrade to the newest release)? The operator would then compare the DPU's firmware version with the desired one and trigger an upgrade as requested. This requires that the operator can read the current firmware version, and do so cheaply.

For ManualReboot, that approach wouldn't work. In that case, maybe the user should create a DpuReboot custom resource, which represents a one-shot job, similar to a Kubernetes Job. The operator would watch the CR, trigger the reboot, and update the CR's status. The operator could even implement this by starting a Kubernetes Job (which already implements concepts like retries).

Maybe what I said is not that different from what is implemented (at least for ManualReboot). However, the "Manual" prefix in the names makes these read as imperative commands, when the API should aim for a more declarative feel. For ManualReboot, this seems to be mostly about naming.

wizhaoredhat · 2025-03-17T21:49:24Z

I agree. This is how the Bare Metal Operator does this:
https://book.metal3.io/bmo/reboot_annotation

Do you see something wrong with this approach?
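A rough Go sketch of the annotation-triggered pattern from the linked page, applied to a DPU: the operator looks for reboot annotations on the object, reboots, then removes them. The key reboot.dpu.example.io is a made-up placeholder, and accepting an optional "/<suffix>" so several requesters can each add their own key is an assumption about how the pattern generalizes, not a description of the Bare Metal Operator's exact semantics.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rebootAnnotation is a hypothetical key, chosen only for illustration.
const rebootAnnotation = "reboot.dpu.example.io"

// rebootRequests returns the annotation keys that request a reboot: the bare
// key itself, or the key followed by "/" and a requester-specific name.
// Keys are sorted so the result is deterministic.
func rebootRequests(annotations map[string]string) []string {
	var keys []string
	for k := range annotations {
		if k == rebootAnnotation || strings.HasPrefix(k, rebootAnnotation+"/") {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// clearRebootRequests deletes the given keys once the reboot has completed,
// so the next reconcile sees no pending request.
func clearRebootRequests(annotations map[string]string, keys []string) {
	for _, k := range keys {
		delete(annotations, k)
	}
}

func main() {
	ann := map[string]string{
		rebootAnnotation:                 "",
		rebootAnnotation + "/upgrade-sdk": "",
		"unrelated.io/owner":             "team-a",
	}
	keys := rebootRequests(ann)
	fmt.Println(keys)
	// ... trigger the DPU reboot here ...
	clearRebootRequests(ann, keys)
	fmt.Println(len(ann)) // only the unrelated annotation is left
}
```

The placeholder key uses a DNS-style prefix with at most one "/", since Kubernetes annotation keys allow only a single prefix/name separator.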

wizhaoredhat · 2025-03-17T22:04:17Z

@einsteinXue Is this tested to be working in your setup?

einsteinXue · 2025-03-20T07:27:50Z

> @einsteinXue Is this tested to be working in your setup?

Not tested yet. What I would like to confirm first is whether the current implementation is feasible.

einsteinXue · 2025-03-20T07:30:24Z

Thanks for the comments; I will take them into account.

einsteinXue · 2025-03-20T07:31:26Z

> https://book.metal3.io/bmo/reboot_annotation

Let me read this. Thank you!

bn222 · 2025-04-17T06:42:15Z

> For ManualUpgradeSDK, shouldn't we instead configure the desired FirmwareVersion (which might be set to "latest" to always upgrade to the newest release)? The operator would then compare the DPU's firmware version with the desired one and trigger an upgrade as requested. This requires that the operator can read the current firmware version, and do so cheaply.

I agree: if the user wants to bring the system into a state where the firmware is upgraded, they should just specify the desired state, and the upgrade should then happen as part of reconciliation.
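A minimal Go sketch of that declarative approach: the user sets a desired version (or "latest"), the operator records the version it observed on the DPU, and every reconcile compares the two. FirmwareSpec, FirmwareStatus and upgradeTarget are hypothetical names, and how the current version and the latest published version are obtained is left open, as it is in the thread.

```go
package main

import "fmt"

// FirmwareSpec is a hypothetical user-facing field: an exact version or "latest".
type FirmwareSpec struct {
	FirmwareVersion string
}

// FirmwareStatus is a hypothetical operator-written field holding the version
// actually running on the DPU.
type FirmwareStatus struct {
	CurrentVersion string
}

const latest = "latest"

// upgradeTarget compares desired and observed state on every reconcile. It does
// not track whether the user "asked" for an upgrade, so calling it repeatedly is
// safe. An empty desired version means "leave the firmware alone".
func upgradeTarget(spec FirmwareSpec, status FirmwareStatus, latestAvailable string) (string, bool) {
	want := spec.FirmwareVersion
	if want == "" {
		return "", false
	}
	if want == latest {
		want = latestAvailable
	}
	if want == "" || want == status.CurrentVersion {
		return "", false // converged, or nothing published to upgrade to
	}
	return want, true
}

func main() {
	spec := FirmwareSpec{FirmwareVersion: latest}
	target, ok := upgradeTarget(spec, FirmwareStatus{CurrentVersion: "1.2.0"}, "1.3.0")
	fmt.Println(target, ok) // 1.3.0 true
	_, ok = upgradeTarget(spec, FirmwareStatus{CurrentVersion: "1.3.0"}, "1.3.0")
	fmt.Println(ok) // false
}
```

Because the check is plain equality against the desired value, pinning an older version would also be acted on (a downgrade); whether the operator should allow that is a policy decision this sketch does not make.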

> ManualReboot

We considered using a Job for this, but we ended up with something similar to what this PR implements. I see how this resembles a Job, but I don't see how such a Job would be easy to create. I prefer a ManualRebootRequested field that the user sets to true; the operator then performs the reboot and reconciles the field back to false, just as it's done here: 5a05067#diff-10032ffdd17d4bb235e7916d1a7ce1514ffea6d6a0bbc19b27e580cca4ee54f2R55

@thom311 ^

@einsteinXue: In general, I agree with what you're proposing here, although these fields will move to another CR, which is being added by another PR. That PR is currently blocked by other PRs, so we will have to wait. Still, some of the changes in the current PR are sound, and it will be a matter of rebasing and moving the fields into the CR that will be introduced.

einsteinXue · 2025-04-18T02:46:47Z

> I agree: if the user wants to bring the system into a state where the firmware is upgraded, they should just specify the desired state, and the upgrade should then happen as part of reconciliation.

This is under development. Thanks to @wizhaoredhat's suggestion, I can already pull the desired firmware version for the upgrade from quay.io. However, how to read the currently installed firmware version is still an open question.
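When "latest" has to be resolved against the tags available in the registry, comparing versions as plain strings goes wrong ("1.10.0" sorts before "1.9.0" lexically). Below is a small Go sketch of numeric comparison, assuming the SDK image tags are purely dotted numbers; tags such as "latest" or release candidates are simply skipped, and how the tag list is fetched from quay.io is not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.10.2" into [1 10 2]. It rejects tags that are not
// purely dotted decimal numbers (e.g. "latest", "2.0-rc1", "1.+2").
func parseVersion(s string) ([]int, bool) {
	parts := strings.Split(s, ".")
	out := make([]int, len(parts))
	for i, p := range parts {
		if p == "" || strings.Trim(p, "0123456789") != "" {
			return nil, false
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, false
		}
		out[i] = n
	}
	return out, true
}

// compareVersions returns -1, 0 or 1. Missing components count as 0,
// so "1.2" equals "1.2.0".
func compareVersions(a, b []int) int {
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// latestTag picks the highest numeric version among tags, skipping the rest.
func latestTag(tags []string) (string, bool) {
	best, bestV := "", []int(nil)
	for _, t := range tags {
		v, ok := parseVersion(t)
		if !ok {
			continue
		}
		if bestV == nil || compareVersions(v, bestV) > 0 {
			best, bestV = t, v
		}
	}
	return best, bestV != nil
}

func main() {
	fmt.Println(latestTag([]string{"1.9.0", "latest", "1.10.0", "1.2"})) // 1.10.0 true
}
```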

> In general, I agree with what you're proposing here, although these fields will move to another CR, which is being added by another PR. That PR is currently blocked by other PRs, so we will have to wait. Still, some of the changes in the current PR are sound, and it will be a matter of rebasing and moving the fields into the CR that will be introduced.

OK, please notify me when the dependent PR is merged, and I will rebase.

@thom311 or @bn222, could you please share some information about how to create VFs within dpu-operator? As you can see in this PR, we use gRPC to transmit the SDK package to the DPU, and gRPC relies on VFs.

bn222 · 2025-04-18T07:01:25Z

Currently, it is hardcoded to a predefined number of VFs. Once we have the DpuConfig CR added, the users will be able to change the number.

synaxgcom · 2025-07-10T03:01:37Z

> @einsteinXue: In general, I agree with what you're proposing here, although these fields will move to another CR, which is being added by another PR. That PR is currently blocked by other PRs, so we will have to wait. Still, some of the changes in the current PR are sound, and it will be a matter of rebasing and moving the fields into the CR that will be introduced.

@bn222 Hi Balazs, what is the status of this dependent PR? Has it been merged yet? Can I start rebasing my SynaXG plugin code?

bn222 · 2025-07-10T10:37:30Z

Hi @synaxgcom, we are very close to getting it merged. All the preparatory work has been finished. We are going to continue work on the DPU CRs next sprint and merge it.

I recommend you start rebasing today, because the final piece will not change much compared to what's here today.

bn222 · 2025-09-25T15:16:46Z

Rebase on top of #574

Add a reboot requested field in that struct and reconcile it

einsteinXue · 2025-09-26T00:47:12Z

> Rebase on top of #574

> Add a reboot requested field in that struct and reconcile it

OK, will do. Thanks!

openshift-ci · 2025-12-02T06:35:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: einsteinXue
Once this PR has been reviewed and has the lgtm label, please assign wizhaoredhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2025

openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 17, 2025

einsteinXue force-pushed the synaxg_plugin_dev branch 5 times, most recently from e038c1c to f962c6d on March 20, 2025 07:22

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2025

einsteinXue force-pushed the synaxg_plugin_dev branch from f962c6d to eb65b23 on July 28, 2025 03:23

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2025

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 25, 2025

synaxg plugin add

a1e49e3

einsteinXue force-pushed the synaxg_plugin_dev branch from eb65b23 to a1e49e3 on December 2, 2025 06:35

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2025

6 participants
