Skip to content

D2D trigger capability errors#20973

Draft
ettec wants to merge 8 commits intodevelopfrom
don2don-trigger-capability-errors
Draft

D2D trigger capability errors#20973
ettec wants to merge 8 commits intodevelopfrom
don2don-trigger-capability-errors

Conversation

@ettec
Copy link
Collaborator

@ettec ettec commented Jan 30, 2026

Change overview:

This change is to support the reporting of capability errors when a remote trigger is registered across the DON2DON (D2D) layer, no other changes required to handle the errors produced by trigger registration are included in this PR.

Brief overview of the current situation (before this change) for context:

On RegisterTrigger the TriggerSubscriber (TS) adds the registration to a list, this list is then polled at regular intervals and for each registration in the list a TriggerRegistration request is sent to the TriggerPublisher (TP). The TS does not wait for any response from the TP, it assumes the registration was successful, thus there is currently no way to report registration errors to the workflow DON (where the TS sits).

On receipt of a TriggerRegistration request the TriggerPublisher will attempt to register to the underlying trigger (if it hasn't already), if an error occurs it will be logged but no indication of the error is sent back to the TS. Note, the TS must send regular TriggerRegistration requests to the TP to keep the registration alive, failure to do so will result in the TP considering the registration to be dead and unregistering from the underlying trigger.

Summary of the change in this PR

The change is subject to the following constraints:

  • Must be backwards compatible - that is a DON with this change must continue to work when communicating to a DON without this change.

  • The D2D trigger communication mechanism has been proven in production, we want to avoid anything that fundamentally changes the behaviour.

This change introduces the concept of a TriggerRegistrationStatus message (actually it sort of existed already but was unused) which is sent by the TP to the TS whenever the TP receives a TriggerRegistrationRequest from the TS. The TriggerRegistrationStatus message includes the latest state of the registration on the underlying trigger including any registration error if one occurred. As before, if the trigger is not currently successfully registered the receipt of the TriggerRegistrationRequest will cause the TP to attempt a new registration. Also as before, the continued receipt of TriggerRegistrationRequests by the TP is necessary to keep a registration live.

On initial registration the TS can now (optionally) wait for TriggerRegistrationStatus messages from the TP, these can be used to determine if the initial registration attempt was successful. Note, even though the TS will continue to send TriggerRegistrationRequests as part of polling, it does not (currently) have any code that deals with updates to the trigger registration status beyond the initial registration - this could be something to address in a future change, for now we are focussed on getting back the status on initial subscription so we can identify most errors, and in particular user errors.

Note, if the TS does not receive sufficient TriggerRegisrationStatus messages during the initial trigger registration phase it will return the events channel associated with the subscription along with an error indicating that registration status was unable to be determined. Essentially this allows the code to behave in the current way (pre-this change) if running against old DON or for whatever other reason the status messages are not received.

It is expected that in most, if not all, cases initial trigger registration status will be returned, so will be a significant improvement over the current behaviour. At some future point, perhaps after all DONs are migrated, it would be possible to make the wait for initial trigger registration status mandatory, tbd.

@github-actions
Copy link
Contributor

I see you updated files related to core. Please run pnpm changeset in the root directory to add a changeset as well as in the text include at least one of the following tags:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

@trunk-io
Copy link

trunk-io bot commented Jan 30, 2026

Static BadgeStatic BadgeStatic BadgeStatic Badge

View Full Report ↗︎Docs

@ettec ettec force-pushed the don2don-trigger-capability-errors branch 2 times, most recently from 255e693 to da0c8b6 Compare February 4, 2026 15:00
@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2026

CORA - Pending Reviewers

Codeowners Entry Overall Num Files Owners
/core/capabilities/ 💬 20 @smartcontractkit/keystone, @smartcontractkit/capabilities-team
/core/services/workflows 💬 2 @smartcontractkit/keystone
go.md 💬 1 @smartcontractkit/core, @smartcontractkit/foundations
go.mod 💬 6 @smartcontractkit/core, @smartcontractkit/foundations
go.sum 💬 6 @smartcontractkit/core, @smartcontractkit/foundations
integration-tests/go.mod 💬 1 @smartcontractkit/core, @smartcontractkit/devex-tooling, @smartcontractkit/foundations
integration-tests/go.sum 💬 1 @smartcontractkit/core, @smartcontractkit/devex-tooling, @smartcontractkit/foundations

Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown

For more details, see the full review summary.

@ettec ettec force-pushed the don2don-trigger-capability-errors branch from 4b25293 to b08b44b Compare February 4, 2026 17:52
@ettec ettec force-pushed the don2don-trigger-capability-errors branch from b08b44b to cfefc4f Compare February 5, 2026 10:07
@cl-sonarqube-production
Copy link

@@ -444,9 +445,14 @@ func (e *Engine) runTriggerSubscriptionPhase(ctx context.Context) error {
// no Config needed - NoDAG uses Payload
})
if regErr != nil {
Copy link
Collaborator Author

@ettec ettec Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note - now the remote trigger can return an error, this means workflow will fail to start when this happens. In the legacy code as the error from a remote trigger was not reported, it would continue to poll the publisher with registration requests, such that in theory an inactive trigger could become active eventually should the error be caused by a transient issue.

If we wanted to preserve that behaviour we could differentiate between error types that are consider to be recoverable/non-recoverable and only fail workflow in the non-recoverable case. (for example we could consider user errors non-recoverable, system errors recoverable or even introduce a seperate attribute for recoverability on the capability error). However, the workflow would need to know if the trigger capability is being run locally or remotely.

However, we already have workflow retry behaviour, so IMO it's acceptable to fail the workflow where remote trigger returns an error on initial registration and rely on workflow retry to handle transient errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant