Adopt token for WMAgent stage-in/stage-out #12144

Open
amaltaro opened this issue Oct 14, 2024 · 3 comments

amaltaro commented Oct 14, 2024

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
Similar to ticket #11199, we need to adopt tokens for the WMAgent payload. In other words, instead of X509-based auth/authz for stage-in and stage-out, we should adopt a token-based solution for this storage communication.

Describe the solution you'd like
Support tokens in WMAgent for stage-in / stage-out.

Tokens will only be available in grid jobs once we configure:
a) access to the token on the agent node;
b) management of the token on the agent node;
c) propagation of the token through HTCondor and the WMAgent job description;
d) use of the token by the grid job (stage-in / stage-out).
Until all of this is in place, production jobs should not access tokens at runtime.

As a result, that requires at least the following developments:

  • set up HTCondor to propagate the relevant token to the job's condor shadow
  • update SimpleCondorPlugin to define the token in the job classad
  • define the bearer token in the job environment (to be picked up by CMSSW for stage-in, and read for stage-out)
  • then, improve the debugging output with token-relevant information
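
To make the last two items more concrete, here is a minimal Python sketch of how a job-side helper might locate the bearer token and log non-sensitive information about it. It assumes the lookup order of the WLCG Bearer Token Discovery convention (`BEARER_TOKEN`, then `BEARER_TOKEN_FILE`, then a `bt_u<uid>` file under `XDG_RUNTIME_DIR` or `/tmp`) and that the token is a JWT. The names `find_bearer_token` and `describe_token` are hypothetical and are not existing WMCore functions; the sketch also moves on when a file is missing, which may be more lenient than a strict reading of the convention.

```python
import base64
import json
import os


def find_bearer_token(env=None, uid=None):
    """Return (token, source), or (None, None) if no token is found."""
    env = os.environ if env is None else env
    uid = os.geteuid() if uid is None else uid
    token = env.get("BEARER_TOKEN", "").strip()
    if token:
        return token, "BEARER_TOKEN"
    candidates = []
    if env.get("BEARER_TOKEN_FILE"):
        candidates.append(env["BEARER_TOKEN_FILE"])
    # Per-user default file: under XDG_RUNTIME_DIR if set, otherwise /tmp.
    base_dir = env.get("XDG_RUNTIME_DIR") or "/tmp"
    candidates.append(os.path.join(base_dir, "bt_u%d" % uid))
    for path in candidates:
        try:
            with open(path) as handle:
                token = handle.read().strip()
        except OSError:
            continue  # assumption: a missing/unreadable file is not fatal
        if token:
            return token, path
    return None, None


def describe_token(token):
    """Decode the JWT payload WITHOUT verifying it; for debug logging only."""
    parts = token.split(".")
    if len(parts) != 3:
        return {"error": "not a JWT"}
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64 padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:
        return {"error": "undecodable payload"}
    if not isinstance(claims, dict):
        return {"error": "unexpected payload"}
    # Report only a few standard claims; never print the token itself.
    return {key: claims.get(key) for key in ("iss", "sub", "exp", "scope")}
```

A job wrapper could call `find_bearer_token()` once at startup and log the returned source together with `describe_token(token)`, so that the issuer, expiry and scopes show up in the job log while the token value stays out of it.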

Describe alternatives you've considered
If token-based auth/authz fails, do we want to fall back to X509?

Additional context
None

anpicci commented Oct 16, 2024

@amaltaro I took the liberty of updating the description of the issue, following the discussion in #12081.
@stlammel, feel free to provide additional comments and suggestions here, rather than in the PR linked above, at your convenience.

stlammel commented

I think we want to make stage-out token-safe now, i.e. if a token is in the environment and the token-based transfer doesn't work, the stage-out doesn't fail. (Right now, if HTCondor makes a token available but token-based transfer doesn't work, stage-out may fail.)
Something like:

print date/time
print hostname
print GFAL*, PYTHON*, and LD_* environment
print gfal-copy location
print PFNs

gfal-copy ...
if ( rc == 0 ) done

if ( X509_USER_PROXY is set )
    sleep 3 sec
    in subprocess {
        unsetenv BEARER_TOKEN
        unsetenv BEARER_TOKEN_FILE
        gfal-copy -v ...
        if ( rc == 0 ) done
    }
if ( BEARER_TOKEN or BEARER_TOKEN_FILE is set )
    sleep 3 sec
    in subprocess {
        unsetenv X509_USER_PROXY
        gfal-copy -v ...
        if ( rc == 0 ) done
    }

sleep 15 min
print date/time
voms-proxy-info -all
httokendecode
gfal-copy -vvv ...

(token information may change underneath us, so we should print it just before the debug gfal-copy).
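
The retry sequence above could be sketched in Python roughly as follows. This is an illustration only, not existing WMCore code: `stage_out_with_fallback` and its parameters are hypothetical names, and the `copy` callable stands in for whatever actually runs `gfal-copy` and returns its exit code. The environment variable names are the ones from the pseudocode; the final 15-minute debug attempt is left to the caller.

```python
import os
import time

TOKEN_VARS = ("BEARER_TOKEN", "BEARER_TOKEN_FILE")
PROXY_VAR = "X509_USER_PROXY"


def stage_out_with_fallback(copy, env=None, pause=3, sleep=time.sleep):
    """Try the copy as-is, then with X509 only, then with the token only.

    `copy(env, verbose)` must run the transfer with the given environment
    and return its exit code (0 on success). Returns the name of the
    attempt that succeeded, or None if every attempt failed.
    """
    env = dict(os.environ if env is None else env)
    if copy(env, False) == 0:
        return "default"

    if env.get(PROXY_VAR):
        sleep(pause)
        # Hide the token variables so only the X509 proxy can be used.
        x509_env = {k: v for k, v in env.items() if k not in TOKEN_VARS}
        if copy(x509_env, True) == 0:
            return "x509-only"

    if any(env.get(name) for name in TOKEN_VARS):
        sleep(pause)
        # Hide the proxy so only the token can be used.
        token_env = {k: v for k, v in env.items() if k != PROXY_VAR}
        if copy(token_env, True) == 0:
            return "token-only"

    return None
```

In a job, `copy` could wrap a `subprocess.run([...], env=env).returncode` call around `gfal-copy`, adding `-v` when `verbose` is true. Passing a modified copy of the environment to each attempt plays the role of the `unsetenv` calls in a subprocess, so the job's own environment is never changed.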
Just a thought/suggestion.
Thanks,
- Stephan

amaltaro commented

Right! We will add the necessary safety mechanism once we integrate tokens in the grid jobs. Thanks Stephan.

anpicci self-assigned this Oct 22, 2024
Projects
Status: WMAgent
Status: In Progress