What is happening?
When running a Slurm electron within a base (Dask) sublattice and dispatching the sublattice within a base (Dask) lattice, the dispatch runs on the remote cluster and finishes the job, but then fails when retrieving the job's output. The traceback reported in the GUI is:
Traceback (most recent call last):
File "/Users/jbaker/miniconda3/envs/covalent_slurm/lib/python3.8/site-packages/covalent_dispatcher/_core/runner.py", line 251, in _run_task
output, stdout, stderr, status = await executor._execute(
File "/Users/jbaker/miniconda3/envs/covalent_slurm/lib/python3.8/site-packages/covalent/executor/base.py", line 628, in _execute
return await self.execute(
File "/Users/jbaker/miniconda3/envs/covalent_slurm/lib/python3.8/site-packages/covalent/executor/base.py", line 657, in execute
result = await self.run(function, args, kwargs, task_metadata)
File "/Users/jbaker/code/covalent/covalent-slurm-plugin/covalent_slurm_plugin/slurm.py", line 695, in run
result, stdout, stderr, exception = await self._query_result(
File "/Users/jbaker/code/covalent/covalent-slurm-plugin/covalent_slurm_plugin/slurm.py", line 577, in _query_result
async with aiofiles.open(stderr_file, "r") as f:
File "/Users/jbaker/miniconda3/envs/covalent_slurm/lib/python3.8/site-packages/aiofiles/base.py", line 78, in __aenter__
self._obj = await self._coro
File "/Users/jbaker/miniconda3/envs/covalent_slurm/lib/python3.8/site-packages/aiofiles/threadpool/__init__.py", line 80, in _open
f = yield from loop.run_in_executor(executor, cb)
File "/Users/jbaker/miniconda3/envs/covalent_slurm/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/jbaker/.local/share/covalent/data/ee3b1f1b-b21b-4bbd-bc95-6bbc012c3091/stdout-ee3b1f1b-b21b-4bbd-bc95-6bbc012c3091-0.log'
How can we reproduce the issue?
I am using the sshproxy extra requirement and have prepared my Covalent config file as suggested in the root README.md. Here's a simple workflow to reproduce the above:
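(The original workflow snippet was not captured above; the following is a minimal sketch of the kind of workflow described. The "slurm" executor name is assumed to be fully configured through the Covalent config file, and the function names and arguments are purely illustrative.)

```python
import covalent as ct

# Assumption: the "slurm" executor is configured entirely via the Covalent
# config file (username, address, ssh_key_file, remote_workdir, ...), as
# described in the plugin README.

# Electron that runs on the remote Slurm cluster
@ct.electron(executor="slurm")
def add(a, b):
    return a + b

# Sublattice running on the default (Dask) executor, wrapping the Slurm electron
@ct.electron
@ct.lattice
def sub_workflow(a, b):
    return add(a, b)

# Outer lattice, also on the default (Dask) executor
@ct.lattice
def workflow(a, b):
    return sub_workflow(a, b)

dispatch_id = ct.dispatch(workflow)(1, 2)
result = ct.get_result(dispatch_id, wait=True)
print(result.result)  # expected to print an integer (3)
```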
What should happen?
The code should run to completion, throwing no error in the GUI, and print an integer.
Any suggestions?
It seems to me that the interaction between the Dask and Slurm executors is not quite right. Either way, the file Covalent is looking for exists in the remote working directory at <wdir>/stdout-ee3b1f1b-b21b-4bbd-bc95-6bbc012c3091-0.log but does not exist in the local directory /Users/jbaker/.local/share/covalent/data/ee3b1f1b-b21b-4bbd-bc95-6bbc012c3091/stdout-ee3b1f1b-b21b-4bbd-bc95-6bbc012c3091-0.log. Indeed, in /Users/jbaker/.local/share/covalent/data/ee3b1f1b-b21b-4bbd-bc95-6bbc012c3091/ the stdout files are contained within the node/ subdirectories.
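(As a quick illustration, and not part of the original report, the local results directory can be listed to confirm where the stdout logs actually end up; the dispatch ID is the one from the traceback above.)

```python
from pathlib import Path

# Hypothetical check: list where the stdout logs actually live locally for the
# failing dispatch (dispatch ID taken from the traceback above).
data_dir = Path.home() / ".local/share/covalent/data/ee3b1f1b-b21b-4bbd-bc95-6bbc012c3091"
for log in sorted(data_dir.rglob("stdout-*.log")):
    # The logs appear under per-node subdirectories rather than at the top
    # level of the dispatch's data directory, which is where Covalent looks.
    print(log.relative_to(data_dir))
```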