GUI button to update users' Della htr2hpc installs #44
Comments
@cmroughan I took a stab at this; the changes are in #47. I did some testing locally and think it's generally working (although it currently installs the develop version of htr2hpc). I thought we could try this on the test site once you've finished testing the other changes.
Initial testing feedback from @mnaydan: tested, but couldn't tell anything was happening when pushing the button. After the second or third try, was able to run it and get the success message (probably once the setup script was running more quickly). I've updated the task; it should now send an info notification when it starts, with text indicating that the setup will be slow on the first run. I'm not seeing that notification reliably: sometimes it doesn't show up at all, and at least once it showed up after the success message (although maybe that was for separate runs of the task?).
Copied from comment on PR 47:
@cmroughan I didn't add logic to add messages to a task report because I thought I had to set it up, but when I was demoing the new features I saw that there were task reports showing up in this list. Should I try adding the script output to the task report so it's easier to troubleshoot? I'd forgotten about this coremltools problem; is that still unresolved?
To my knowledge the absence of
If it's easy to quickly add the script output to the task report, that could be helpful! Now that I'm thinking about it, there definitely might be troubleshooting that needs to happen with the setup script as new users sign on for the beta test.
@cmroughan is it possible the coremltools issue is due to anaconda3/2024.2 vs anaconda3/2024.6? I was using different versions inconsistently in the code and made them all 2024.6, but I just tested the setup script and I get an error with 2024.6, while it seems to work with 2024.2. I'm not sure how to duplicate your error, but coremltools is showing as installed and I can run the ketos and kraken scripts with no args without errors (I don't know if that's a sufficient test). When I ran the script with anaconda3/2024.6, this is the error I saw:
ERROR: Could not find a version that satisfies the requirement torch==2.1 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0)
ERROR: No matching distribution found for torch==2.1
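For troubleshooting, the pip error above can be read mechanically: the "from versions" list shows what pip could actually see under that module, and the pin is simply not in it. The following is a minimal sketch, not part of htr2hpc; the function names are hypothetical, and it assumes plain dotted numeric versions like the ones in this error.

```python
import re


def parse_available_versions(pip_error: str) -> list[str]:
    """Extract the candidate versions from pip's 'Could not find a version' line."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if not match:
        return []
    versions = [v.strip() for v in match.group(1).split(",")]
    # pip prints "none" when it found no candidates at all
    return [v for v in versions if v and v != "none"]


def _normalize(version: str) -> tuple[int, ...]:
    """Turn '2.2.0' into (2, 2) so that '2.2' and '2.2.0' compare equal.

    Assumption: versions are purely numeric and dot-separated.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def pin_is_available(pin: str, available: list[str]) -> bool:
    """Check whether an exact pin such as '2.1' matches any listed version."""
    return _normalize(pin) in {_normalize(v) for v in available}
```

Run against the error line quoted above, this reports that 2.2.0 is the oldest version pip could see, so a `torch==2.1` pin cannot be satisfied in that environment.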
Oh, perhaps that was it! I think to properly test it, either I need to wipe everything and do a fresh install on my account, or grab someone with a Della account to run the htr2hpc install and then try to run a train job. (The issue had appeared in Wouter's train job: the training ran successfully but presented those warnings in the task message, and might have hit an error if the script had needed to determine a best model rather than using the one kraken had found. I could ask Wouter to click the button to reinstall htr2hpc and run another train job, to see if the same error appears.)
@cmroughan this feature is ready for testing again. I figured out where/how the escriptorium code creates the task report for me (there are some celery signal handlers that run before and after the task) and was able to hook into that to find the task report and add messages with the script command and script output. The test site is updated with these changes. I've also switched the setup script and slurm code to use the anaconda3/2024.2 module, which I think resolves the setup problem. If you want to test/experiment manually, you can copy the new version of the user setup script to della and then change the
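To illustrate the idea of recording the script command and its output on a task report, here is a minimal sketch. It is not the code from the PR: `TaskReport` is a hypothetical stand-in rather than the escriptorium model, `run_and_report` is an invented name, and the command runs locally instead of on Della.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class TaskReport:
    """Hypothetical stand-in for a task report; only collects messages."""

    label: str
    messages: list[str] = field(default_factory=list)

    def append(self, text: str) -> None:
        self.messages.append(text)


def run_and_report(command: list[str], report: TaskReport) -> int:
    """Run a command, then log the command line, its output, and its exit code."""
    report.append("Running: " + shlex.join(command))
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so pip errors land in the report
        text=True,
    )
    report.append(result.stdout)
    report.append(f"Exit code: {result.returncode}")
    return result.returncode


if __name__ == "__main__":
    report = TaskReport(label="user setup")
    # Placeholder command; the real setup script is not reproduced here.
    run_and_report([sys.executable, "-c", "print('setup ok')"], report)
    print("\n".join(report.messages))
```

Merging stderr into stdout keeps failures such as the pip install errors in the same message stream as the normal output, which is the part that matters for troubleshooting.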
@cmroughan if Wouter is up for removing his htr2hpc conda env and running the setup task again, that would be helpful! I had this handy from my own testing, documenting in case useful:
I deleted my env and ran a test to try the htr2hpc install with a clean slate. I get different errors than Mary's, but I do get Wouter's error when running a train task. The output of the setup task, which hits errors at the pip install step:
And then, when running a task, I get the same
Also, I don't know for sure, but I suspect that the initial "Running user setup script, on first run this may take a while..." is often failing to appear because clicking the setup button sends a POST that refreshes the page. Perhaps the message tries to appear but gets cleared out by the page refresh?
Requesting a code addition that would create a button in eScr which a user could press to automatically update their local installation of htr2hpc on Della. This would streamline any updates that might need to be made during the beta test, for example updating the logic for dynamic slurm resource allotment, if necessary.