Gracefully stopping pilots #8346

@chrisburr

Currently on the LHCb HLT farm the CE is configured to send SIGUSR1 to the Gaudi processes directly.

This has a couple of issues:

  1. If there is more than one job per pilot, the first job exits gracefully, but the pilot isn't aware of the graceful shutdown and starts another job.
  2. Sometimes the job hasn't got far enough to produce any output, despite the situation being "okay".

I think the solution to both items is to have the CE communicate with DIRAC instead of Gaudi.

For item 2, it would be useful to make such cases easier to filter out, e.g. by setting the status to "Killed" and setting a clearer application status.
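
To make the proposal concrete, here is a rough sketch of what pilot-side handling could look like. It is not existing DIRAC or Pilot code: `StopRequest`, `final_status`, `run_pilot`, the `fetch_job`/`run_payload` callables and the application status strings are all hypothetical names chosen for illustration. It assumes the CE sends SIGUSR1 to the pilot process rather than to Gaudi, and that the pilot is responsible for passing the stop request on to the payload.

```python
import signal
import threading


class StopRequest:
    """Hypothetical helper that remembers whether the CE asked the pilot to stop."""

    def __init__(self, signum=signal.SIGUSR1):
        self.signum = signum
        self._event = threading.Event()

    def install(self):
        # signal.signal() must be called from the main thread.
        signal.signal(self.signum, self._handle)

    def _handle(self, signum, frame):
        # Only set a flag here; the pilot loop reacts at a safe point.
        self._event.set()

    @property
    def requested(self):
        return self._event.is_set()


def final_status(produced_output, stop_requested):
    """Map a payload outcome to (status, application status).

    "Killed" is the status suggested in this issue; the application
    status strings and the other branches are illustrative assumptions.
    """
    if stop_requested and not produced_output:
        return "Killed", "Stopped gracefully before producing output"
    if stop_requested:
        return "Done", "Stopped gracefully with partial output"
    return "Done", "Execution complete"


def run_pilot(fetch_job, run_payload, stop):
    """Run payloads one by one until no job is left or a stop is requested.

    fetch_job() returns a job or None; run_payload(job, stop) returns True
    if the payload produced output. Both are placeholders for real pilot code.
    """
    results = []
    while not stop.requested:  # never match a new job after a stop request
        job = fetch_job()
        if job is None:
            break
        produced_output = run_payload(job, stop)
        results.append((job, final_status(produced_output, stop.requested)))
    return results
```

With something like this, the pilot would call `stop.install()` once at startup, and `run_payload` would be where the stop request is forwarded to the Gaudi process. The loop condition addresses item 1 (no new job is started after the stop request), and `final_status` addresses item 2 (jobs stopped before producing output become easy to filter).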
