SSH connection problems with latest asyncssh version (2.15.0) #97

Open
mlpgwdg opened this issue Jul 5, 2024 · 0 comments

Environment

  • Covalent version: 0.232.0.post1
  • Covalent-Slurm plugin version: 0.18.0
  • Python version: 3.8
  • Operating system: Ubuntu 22.04.4 LTS

Installed with conda

What is happening?

asyncssh was updated to v2.15.0 two days before this report, and something in that release appears to have broken the Covalent SLURM plugin. Job submission fails at around line 524 of slurm.py, right after scp copies the pickle files to the remote server. The pickle files do arrive on the remote server, but no further files are copied and no SLURM jobs are started. The errors are cryptic and vary between runs, from "SSH connection closed" to NoneType errors raised on a failed asyncssh conn object. The stack trace confirms the location and shows that the error originates in the asyncssh library. With the log level set to debug in Covalent's config, error.log for the failed SLURM executor node contains this trace:

Traceback (most recent call last):
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_dispatcher/_core/runner.py", line 182, in _run_task
    output, stdout, stderr, status = await executor._execute(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent/executor/base.py", line 695, in _execute
    return await self.execute(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent/executor/base.py", line 724, in execute
    result = await self.run(function, args, kwargs, task_metadata)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_slurm_plugin/slurm.py", line 592, in run
    remote_paths = await self._copy_files(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/covalent_slurm_plugin/slurm.py", line 537, in _copy_files
    await asyncssh.scp(temp_g.name, (conn, remote_py_script_filename))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/scp.py", line 1041, in scp
    reader, writer = await _start_remote(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/scp.py", line 190, in _start_remote
    writer, reader, _ = await conn.open_session(command, encoding=None)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 4198, in open_session
    chan, session = await self.create_session(
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 4173, in create_session
    session = await chan.create(session_factory, command, subsystem,
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 1207, in create
    result = await self._make_request(b'exec', String(command))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 740, in _make_request
    return await waiter
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 1329, in data_received
    while self._inpbuf and self._recv_handler():
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/connection.py", line 1594, in _recv_packet
    processed = handler.process_packet(pkttype, seq, packet)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/packet.py", line 237, in process_packet
    self._packet_handlers[pkttype](self, pkttype, pktid, packet)
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 656, in _process_request
    self._service_next_request()
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 416, in _service_next_request
    result = cast(Optional[bool], handler(packet))
  File "/home/myusername/miniconda3/envs/covalent_env/lib/python3.8/site-packages/asyncssh/channel.py", line 1246, in _process_exit_status_request
    self._session.exit_status_received(status)
AttributeError: 'NoneType' object has no attribute 'exit_status_received'

How can we reproduce the issue?

  • Install the latest versions of Covalent and the covalent-slurm-plugin
  • Check that the installed asyncssh version is 2.15.0
  • Attempt to run any simple, minimal Covalent job through the SLURM plugin
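
For the version check in the second step, a minimal sketch is shown here. It only reads the installed asyncssh version from package metadata and compares it with the two versions named in this report. The helper names `installed_asyncssh_version` and `classify_asyncssh` are made up for illustration, and any version other than 2.14.0 or 2.15.0 is labelled "unknown" because this report says nothing about it.

```python
from importlib import metadata  # standard library since Python 3.8

# Versions taken from this report; nothing is assumed about other releases.
REPORTED_BROKEN = "2.15.0"
REPORTED_WORKING = "2.14.0"


def installed_asyncssh_version():
    """Return the installed asyncssh version string, or None if it is absent."""
    try:
        return metadata.version("asyncssh")
    except metadata.PackageNotFoundError:
        return None


def classify_asyncssh(version):
    """Map a version string to the status described in this report."""
    if version is None:
        return "not installed"
    if version == REPORTED_BROKEN:
        return "reported broken"
    if version == REPORTED_WORKING:
        return "reported working"
    return "unknown"


if __name__ == "__main__":
    found = installed_asyncssh_version()
    print(f"asyncssh {found}: {classify_asyncssh(found)}")
```

Run it inside the same conda environment as the Covalent server, since that is the environment whose asyncssh the plugin imports.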

What should happen?

The job should run correctly. Instead, it errors out with "SSH connection closed" or with "AttributeError: 'NoneType' object has no attribute 'exit_status_received'".

Any suggestions?

Temporary fix: revert to asyncssh v2.14.0, restarting the Covalent server as needed.

More permanent fix: update the plugin's code to be compatible with the latest version of asyncssh.
