Job list is empty on Slurm 18.08 #17
Comments
It just parses the output of squeue. So if squeue works, so should turm. Not sure what's going on. |
Can you give me a hint on how to best debug this from turm? I triple-checked that squeue without args gives me a job list, but turm does not. Since I only have a CLI available I tried rust-gdb, but its output interferes with the turm TUI. |
I am not intending to sound cheeky, but if there were automated tests shipped with turm, I could try running these on the SLURM system... |
Would love to have tests, but then you have to somehow set up a clean Slurm environment and start dummy jobs there. Not sure how to best do that. This is the part you need to debug: Lines 53 to 133 in f104c7c
|
True, maybe this could provide a clean environment for testing? https://hub.docker.com/r/hpcnow/slurm_simulator Failing that I could also see a set of test job definitions maintained here to be run against an existing production Slurm installation that could be used for very basic testing, e.g. a few sleep jobs that print to stdout so that at least parts of the UI are tested. Regarding the part to debug: I do not yet have a CLI debugging setup for Rust. Another idea that came to mind: there is a feature of other Slurm TUIs to use SSH to connect to a Slurm host so the TUI would run locally and could then be more easily debugged, e.g. visual debugger in VS Code. Did you think about remote Slurm access? Do you have experience with SSH in Rust? |
You can use the remote SSH VS Code extension for running and debugging on the slurm host. |
Thanks for mentioning, a good idea! I tried debugging in VS Code which tells me to install LLDB extensions. After that LLDB fails with version `GLIBC_2.18' not found. Slurm is running on CentOS 7 which only has glibc 2.17. I think also other Rust dev tools need at least glibc 2.18? See also rust-lang/rust-analyzer#4706. |
In the meantime, I would try "printf-debugging", but written to a file, because stdout is already being drawn over by the TUI main loop. I have this template:
let path = "results.txt";
let mut output = File::create(path)?;
let job_command = ...
write!(output, "{}", job_command)?;
Could you provide guidance on what to insert at |
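The file-based "printf-debugging" idea above could look roughly like the following. This is only a sketch: `debug_log` is a hypothetical helper, not part of turm, and the log path is up to the caller. It appends rather than truncates, so repeated calls from the TUI refresh loop accumulate in one file.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one debug line to a log file instead of printing to stdout,
/// so the terminal the TUI is drawing on is left alone.
/// Hypothetical helper for illustration; not taken from turm.
fn debug_log(path: &Path, message: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true) // create the file on first use
        .append(true) // keep earlier lines from previous calls
        .open(path)?;
    writeln!(file, "{}", message)
}
```

From a second shell, `tail -f` on the chosen file then shows the messages while the TUI keeps running.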
Just debug print the command:
let cmd = Command::new("squeue")
.args(&self.squeue_args)
.arg("--array")
.arg("--noheader")
.arg("--Format")
.arg(&output_format)
println!("{:?}", cmd); |
For me it only works with:
let cmd = Command::new("squeue")
.args(&self.squeue_args)
.arg("--array")
.arg("--noheader")
.arg("--Format")
.arg(&output_format)
.output();
println!("{:?}", cmd);
This prints a string with the expected comma-separated fields. |
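A likely reason the first snippet did not work as written: the builder methods on `std::process::Command` (`arg`, `args`) return `&mut Command`, so a `let` bound directly to the chained expression ends up borrowing a temporary that is dropped at the end of the statement. `.output()` returns an owned result, which is why that variant compiles, but it then prints the result of running squeue rather than the command line. A minimal sketch that keeps the command in its own binding so it can be inspected without being run; `build_squeue_command`, `describe`, and their parameters are hypothetical stand-ins for turm's own values:

```rust
use std::process::Command;

/// Build the squeue invocation in an owned binding so it can be inspected.
/// `extra_args` and `output_format` stand in for turm's own values (assumption).
fn build_squeue_command(extra_args: &[String], output_format: &str) -> Command {
    let mut cmd = Command::new("squeue");
    // The builder methods mutate `cmd` in place; the returned reference is unused.
    cmd.args(extra_args)
        .arg("--array")
        .arg("--noheader")
        .arg("--Format")
        .arg(output_format);
    cmd
}

/// Render the command line as text (for a log file) without executing it.
fn describe(cmd: &Command) -> String {
    format!("{:?}", cmd)
}
```

The string from `describe` can be written to a file instead of stdout, so the TUI is not disturbed.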
Ok, I could further narrow it down to this check: Line 67 in f104c7c
which always evaluates to true, so the parser always returns None and never a Job.
|
And the actual cause, I think, is that: Line 65 in f104c7c
does not split at ###turm### because that separator is not included in the output of squeue.
It seems that the expectation with respect to Slurm output is not met, i.e.:
prints only the
@@ -1 +1 @@
-The format of each field is "type[:[.][size][suffix]]"
+The format of each field is "type[:[.][size]]"
|
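The failure mode described above can be reproduced in isolation. The following is a reduced sketch, not turm's actual parser: the `Job` struct, its three fields, and `parse_line` are hypothetical, and only the `###turm###` separator comes from the thread. When squeue does not emit the separator, the split yields a single field, the field-count check fails, and every line is dropped, which matches an empty job list.

```rust
/// Separator the parser is reported to expect between fields (from the thread).
const SEPARATOR: &str = "###turm###";

/// Hypothetical, reduced job record; the real one has more fields.
#[derive(Debug, PartialEq)]
struct Job {
    id: String,
    name: String,
    state: String,
}

/// Split one line of squeue output on the separator.
/// Returns None when the number of fields is not the expected three.
fn parse_line(line: &str) -> Option<Job> {
    let parts: Vec<&str> = line.split(SEPARATOR).map(str::trim).collect();
    if parts.len() != 3 {
        // Without the separator in the output, the whole line is one field.
        return None;
    }
    Some(Job {
        id: parts[0].to_string(),
        name: parts[1].to_string(),
        state: parts[2].to_string(),
    })
}
```

Feeding canned lines like these into the parser is also one way to get basic tests without a running Slurm installation.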
So as mentioned in OP, it actually is a compatibility issue with Slurm 18.08. Do you see another way to do the string post-processing? E.g. split on a tab or a certain amount of blanks instead of the
Edit: I see now that the only way to parse the output is to not use the |
Thanks for tracking this down!
If someone implements this in a robust enough way, I would be willing to merge it. I won't have time to do this myself. |
I went ahead and implemented my suggested approach from #17 (comment) in #20 |
As a user, running
turm
shows the TUI with the 3 main panes, but without any jobs. No key press has a visible effect, only q for quitting. I compiled turm myself and we use Slurm 18.08. Is it maybe a compatibility issue?