sfayer commented Nov 12, 2024

Hi,

As discussed at the last BiLD meeting, I've been looking at the problem of the bundled proxy expiring in the middle of the pilot job, which causes various failures (unlike the proxy delegated via the CE, the bundled proxy is no longer renewed).

Here is an implementation of my proposed fix. Instead of bundling the proxy that's used to submit the job to the CE (whose lifetime is capped by the VO's maximum VOMS lifetime), we generate a new vanilla proxy, with its lifetime set to a multiple of the expected maximum runtime, and send that in the bundle. The pilot then uses this new proxy to report back to DIRAC, which doesn't need VOMS. This is the same approach we previously used in the CloudCE (where pilot runtime could easily be 2+ weeks for efficiency reasons).
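To illustrate the lifetime calculation described above, here is a minimal sketch. It is not the code in this PR and does not call any DIRAC API: the function names and the factor of 2 are hypothetical, since the description only says "a multiple of the expected maximum runtime".

```python
from datetime import timedelta

# Hypothetical multiplier: the PR description does not state the actual value.
LIFETIME_FACTOR = 2


def bundle_proxy_lifetime(max_runtime: timedelta, factor: int = LIFETIME_FACTOR) -> timedelta:
    """Lifetime to request for the plain (non-VOMS) proxy sent in the pilot bundle."""
    if max_runtime <= timedelta(0):
        raise ValueError("max_runtime must be positive")
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return max_runtime * factor


def expires_mid_job(proxy_lifetime: timedelta, max_runtime: timedelta) -> bool:
    """True if a proxy with this lifetime could expire before the pilot finishes."""
    return proxy_lifetime < max_runtime


if __name__ == "__main__":
    max_runtime = timedelta(days=3)   # assumed pilot runtime, for illustration only
    voms_cap = timedelta(hours=24)    # assumed VO VOMS lifetime cap, for illustration only

    # A proxy capped by the VOMS lifetime can run out while the pilot is still running.
    print(expires_mid_job(voms_cap, max_runtime))
    # A plain proxy sized from the runtime outlives the pilot.
    print(expires_mid_job(bundle_proxy_lifetime(max_runtime), max_runtime))
```

The point of sizing the lifetime from the runtime rather than from the submission proxy is that a plain proxy is not subject to the VO's VOMS lifetime cap.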

I've done basic testing, and it appears to fix the problems I was initially seeing without creating any new issues.

Regards,
Simon

BEGINRELEASENOTES
*WorkloadManagement
CHANGE: Use plain proxy for the pilot bundle
ENDRELEASENOTES

@DIRACGridBot added the alsoTargeting:integration label (Cherry pick this PR to integration after merge) Nov 12, 2024
@sfayer marked this pull request as draft November 12, 2024 21:45
@sfayer force-pushed the fix_pilotproxylen branch 3 times, most recently from a411e55 to 1651b27 November 12, 2024 23:14
@sfayer marked this pull request as ready for review November 13, 2024 10:11

sfayer commented Nov 13, 2024

I think the sitedirector test is now fixed for this patch. The one remaining failure appears to be a random one.


aldbr commented Nov 13, 2024

Yes I reran the test, it looks good!

@fstagni merged commit 37b625a into DIRACGrid:rel-v8r0 Nov 14, 2024
@DIRACGridBot added the sweep:done (All sweeping actions have been done for this PR) and sweep:failed (Sweeping failed and needs manual intervention) labels Nov 14, 2024
@DIRACGridBot

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/11834680708

Failed:

  • integration
    cherry-pick 37b625a into integration failed
    check merge conflicts on a local copy of this repository
    git fetch upstream
    git checkout upstream/integration -b cherry-pick-2-37b625a44-integration
    git cherry-pick -x -m 1 37b625a44
    # Fix the conflicts
    git cherry-pick --continue
    git commit --amend -m 'sweep: #7886 Use plain proxy for pilot bundle' --author='Simon Fayer <sf105@ic.ac.uk>'
    git push -u origin cherry-pick-2-37b625a44-integration
    
    # If you have the GitHub CLI installed the PR can be made with
    gh pr create \
         --label 'sweep:from rel-v8r0' \
         --base integration \
         --repo DIRACGrid/DIRAC \
         --title '[sweep:integration] Use plain proxy for pilot bundle' \
         --body 'Sweep #7886 `Use plain proxy for pilot bundle` to `integration`.
    
    Adding original author @sfayer as watcher.
    
    BEGINRELEASENOTES
    *WorkloadManagement
    CHANGE: Use plain proxy for the pilot bundle
    ENDRELEASENOTES
    Closes #7889'

sfayer added a commit to sfayer/DIRAC that referenced this pull request Nov 14, 2024
sfayer added a commit to sfayer/DIRAC that referenced this pull request Nov 14, 2024
sfayer added a commit to sfayer/DIRAC that referenced this pull request Nov 14, 2024
sfayer added a commit to sfayer/DIRAC that referenced this pull request Nov 15, 2024