3.8.1.66
- Added default path for the ifconfig command (used to look up IPv6 info) if the command is not found
- Support for OIDC tokens in the urllib-based request function (used for pilot-PanDA server communications)
- Together with a token key, the primary OIDC token is used to download a shorter-lived token, which is used in later communications with the PanDA server
- The pilot refreshes the token immediately after launch, overwriting the original long-lived token
- The short-lived tokens are refreshed periodically (once every 60 minutes)
- Note: OIDC tokens are used by default if found locally, otherwise X509 is used - i.e. there is no corresponding pilot option to activate the mechanism
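The token handling described above can be illustrated with a minimal sketch. It is not the pilot's actual code: the helper names `token_needs_refresh` and `build_request`, and the use of an `Authorization: Bearer` header, are assumptions made for illustration. Only the 60-minute refresh interval and the use of urllib come from the notes.

```python
import time
import urllib.request

# From the notes: short-lived tokens are refreshed once every 60 minutes.
REFRESH_INTERVAL_S = 60 * 60


def token_needs_refresh(last_refresh, now=None, interval=REFRESH_INTERVAL_S):
    """Return True if at least `interval` seconds have passed since the last refresh.

    Hypothetical helper; timestamps are seconds since the epoch.
    """
    if now is None:
        now = time.time()
    return (now - last_refresh) >= interval


def build_request(url, token=None):
    """Build a urllib request, attaching the token if one is available.

    Assumption: the token is sent as a standard bearer token. When no token
    is given, the request is left untouched (the notes say X509 is used then).
    """
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

A caller would check `token_needs_refresh()` before each server call and download a new short-lived token when it returns True.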
- SIGTERM signals received on Kubernetes resources are now reported with new error code 1379, “Job was preempted”
- Requested by R. Walker
- Discussed in JIRA ticket ATLASPANDA-1065
- Added two error codes for arcproxy failures
- 1380: “General arcproxy failure” (was previously reported as 1008: “General pilot error, consult batch log”)
- 1381: “Arcproxy failure while loading shared libraries”
- Note: this (1381) is currently only used internally and does not lead to a failed job
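As a rough illustration of how the new codes relate to each other, the sketch below maps the codes named in these notes to their messages and picks between 1380 and 1381. The function `classify_arcproxy_failure` is hypothetical, and matching on the dynamic loader's "error while loading shared libraries" message is an assumption about how such a failure could be detected, not a description of the pilot's logic.

```python
# Error codes and messages as listed in these release notes.
ERROR_MESSAGES = {
    1008: "General pilot error, consult batch log",
    1379: "Job was preempted",
    1380: "General arcproxy failure",
    1381: "Arcproxy failure while loading shared libraries",
}

# Assumption: the standard message printed when a shared library
# cannot be loaded is what distinguishes 1381 from 1380.
SHARED_LIB_MARKER = "error while loading shared libraries"


def classify_arcproxy_failure(stderr):
    """Return 1381 for shared-library load failures, otherwise 1380.

    Hypothetical helper operating on the stderr text of a failed arcproxy call.
    """
    if SHARED_LIB_MARKER in stderr:
        return 1381
    return 1380
```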
- Remote file open container now using EL9 instead of CentOS7
- Required for latest ROOT release
- Requested by A. De Silva
- Skipping setting RUCIO_ACCOUNT for payload
- Requested by R. Walker
- A time-out was added to the gdb command execution (for producing a core dump file) when a looping job has been discovered
- Requested by R. Walker
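One common way to bound the run time of an external command such as gdb is the `timeout` argument of `subprocess.run`. The sketch below shows that general technique only; `run_with_timeout` is a hypothetical helper, and neither the gdb command line nor the time-out value used by the pilot is given in these notes.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` (a list of arguments) and give up after `timeout_s` seconds.

    Returns (exit_code, stdout, timed_out). On time-out the child process is
    killed by subprocess.run and (None, "", True) is returned.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, "", True
    return result.returncode, result.stdout, False
```

With this pattern, a hanging debugger session cannot block the handling of a looping job for longer than the chosen limit.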
- Real-time logging
- Now possible to specify real-time logging server (type, protocol, URL and port) via pilot argument
- Previously, it only worked via pilot config
- Requested by W. Guan
- Added Loki real-time logging module (Rubin)
- Real-time logging can now be activated for all jobs on a given queue (relevant for pilot logs, not payload stdout)
- Activation currently via PQ.catchall
- Streaming of pilot logs requested by I. Vukotic
- To be tested more widely
- New pilot option --noworkerpilotstatusupdate can be used to switch off worker pilot status updates
- Needed at NERSC
- Requested by T. Maeno
- Added timeout to urlopen() used for pilot-PanDA server communication
- The default timeout is too short and for getjob operations can lead to “jobdispatcher, 102: Sent job didn't receive reply from pilot within 30 min”-errors
- In case of failure, the pilot will currently fall back to curl-based communication
- Timeout is now explicitly set to 30 s
- Reported by Z. Yang (Rubin)
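The pattern of an explicit `urlopen()` timeout with a fallback path can be sketched as follows. This is an illustration under assumptions, not the pilot's implementation: `fetch_with_fallback` and the `fallback` callable (standing in for the curl-based path) are hypothetical names. Only the 30 s value comes from the notes.

```python
import urllib.request

# From the notes: the timeout is now explicitly set to 30 s.
DEFAULT_TIMEOUT_S = 30


def fetch_with_fallback(url, timeout=DEFAULT_TIMEOUT_S, fallback=None):
    """Fetch `url` with an explicit timeout; on failure, try `fallback(url)`.

    `fallback` is a hypothetical callable representing an alternative
    transport (for example a curl-based one). If it is None, the original
    error is re-raised.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # urllib.error.URLError and socket time-outs are both OSError subclasses.
        if fallback is None:
            raise
        return fallback(url)
```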
- Bug fix
- Patch for a bug where the final job completion state was set before log stage-out had completed
- This led to the “ddm, 200: Could not get GUID/LFN/MD5/FSIZE/SURL from pilot XML” error
- Reported by R. Walker, discussed in JIRA ticket ATLASPANDA-1047
- Housekeeping with pylint
- The average pylint score of all pilot modules is 9.56
Contributions from W. Guan, P. Nilsson