What is happening?
asyncssh was recently updated to v2.15.0 (2 days ago at the time of writing), and something in that release appears to have broken the Covalent SLURM plugin. Job submissions fail at around line 524 of slurm.py, right after scp copies the pickle files to the remote server. The pickle files do arrive on the remote server, but no further files are copied and no SLURM jobs are started. The errors are cryptic and vary between runs, from "SSH connection closed" to NoneType errors raised on a failed asyncssh conn object. The stack trace confirms the location and shows that the error originates in the asyncssh library. After setting the log level to debug in Covalent's config and checking error.log for the failed SLURM executor node, this trace appears:
Traceback (most recent call last):
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_dispatcher/_core/runner.py", line 182, in _run_task
    output, stdout, stderr, status = await executor._execute(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent/executor/base.py", line 695, in _execute
    return await self.execute(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent/executor/base.py", line 724, in execute
    result = await self.run(function, args, kwargs, task_metadata)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_slurm_plugin/slurm.py", line 592, in run
    remote_paths = await self._copy_files(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_slurm_plugin/slurm.py", line 537, in _copy_files
    await asyncssh.scp(temp_g.name, (conn, remote_py_script_filename))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/scp.py", line 1041, in scp
    reader, writer = await _start_remote(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/scp.py", line 190, in _start_remote
    writer, reader, _ = await conn.open_session(command, encoding=None)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 4198, in open_session
    chan, session = await self.create_session(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 4173, in create_session
    session = await chan.create(session_factory, command, subsystem,
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 1207, in create
    result = await self._make_request(b'exec', String(command))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 740, in _make_request
    return await waiter
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 1329, in data_received
    while self._inpbuf and self._recv_handler():
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 1594, in _recv_packet
    processed = handler.process_packet(pkttype, seq, packet)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/packet.py", line 237, in process_packet
    self._packet_handlers[pkttype](self, pkttype, pktid, packet)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 656, in _process_request
    self._service_next_request()
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 416, in _service_next_request
    result = cast(Optional[bool], handler(packet))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 1246, in _process_exit_status_request
    self._session.exit_status_received(status)
AttributeError: 'NoneType' object has no attribute 'exit_status_received'
How can we reproduce the issue?
1. Install the latest versions of covalent and the covalent-slurm-plugin.
2. Check that asyncssh is at version 2.15.0.
3. Attempt to run any simple, minimal Covalent job through the SLURM plugin.
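The version check in the steps above can be scripted. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper names (`parse_version`, `is_reported_broken`, `installed_asyncssh_version`) are made up for illustration, and the only version treated as broken is 2.15.0, since that is the only one this report covers.

```python
from importlib import metadata

# The only release this report identifies as broken; other versions are untested here.
REPORTED_BROKEN = (2, 15, 0)


def parse_version(text):
    """Turn a string like '2.15.0' into (2, 15, 0), dropping non-numeric suffixes."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_reported_broken(version_text):
    """True if the version matches the release named in this report."""
    return parse_version(version_text)[:3] == REPORTED_BROKEN


def installed_asyncssh_version():
    """Return the installed asyncssh version string, or None if it is not installed."""
    try:
        return metadata.version("asyncssh")
    except metadata.PackageNotFoundError:
        return None


version = installed_asyncssh_version()
if version is None:
    print("asyncssh is not installed in this environment")
elif is_reported_broken(version):
    print(f"asyncssh {version}: matches the version reported as broken")
else:
    print(f"asyncssh {version}: not the version named in this report")
```

Run it inside the same conda environment that the Covalent server uses, otherwise it reports on a different installation.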
What should happen?
The job should run to completion. Instead, it fails with "SSH connection closed" or with an AttributeError stating that a 'NoneType' object has no attribute 'exit_status_received'.
Any suggestions?
Temporary fix: revert to asyncssh v2.14.0, restarting the Covalent server as needed.
More permanent fix: the plugin's code needs updating to be compatible with the latest version of asyncssh.
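As a sketch of the temporary fix, the snippet below builds a pip command that pins asyncssh to 2.14.0 in whichever environment runs the script. The helper name `downgrade_command` is hypothetical, and using pip inside the conda environment is an assumption; a conda-based pin may suit some setups better.

```python
import sys


def downgrade_command(version="2.14.0"):
    """Build a pip command that pins asyncssh to `version` for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", f"asyncssh=={version}"]


cmd = downgrade_command()
# Only prints the command; nothing is installed by this script.
print(" ".join(cmd))
```

To apply it, pass `cmd` to `subprocess.run(cmd, check=True)` from the environment that hosts Covalent, then restart the Covalent server so it picks up the downgraded library.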
Environment
Installed with conda