Windows ssh support by ipcluster #837
That would be great, thanks for offering! Without any ability to test, it's likely to end up unmaintained unless you or someone with access to Windows + SSH can volunteer to keep it maintained, because I can't do it. Do you think you can install Windows SSH on the GitHub actions runner? |
You are right, testing is a must! I have zero experience with GitHub action runners, but if there is a corresponding ssh ipcluster test for Linux, I'm 100% sure that the same thing is achievable for Windows. Maybe the existing test could simply be extended with a Windows instance... By the way, I'm also happy to maintain this feature, since I will use it quite often in the future… |
We have Windows tests already, so you can probably start with a PR that duplicates the ssh test run to also run on Windows, then you can try figuring out what needs to be set up. Currently, the SSH tests use docker-compose to launch the cluster, which I guess won't work for testing Windows. |
I will try my best, but as mentioned before I'm new to GitHub action runners. Let's see how far I get... Meanwhile we should start the discussion on how to technically support different OSes and shells (it might be that ipcluster currently doesn't fully work on exotic Linux shells; furthermore, under Windows you can configure OpenSSH to use cmd.exe or powershell.exe, which have totally different syntax). As mentioned in my first post, the differences can be addressed a) on the 'client' side via python scripts or b) on the 'server' side by determining the client OS/shell. Is it necessary to have a broader discussion on this topic? |
If we want to support Windows, the way to go is probably to implement as much as possible in Python (perhaps move all this to |
Sounds good to me and I will go in that direction. I have to admit that Windows tries its best to NOT let you run its OpenSSH server under Docker. Luckily I was able to get around all the issues... However, there is one thing in the Linux ssh cluster I do not understand: the ssh key pair is generated in the docker container, but I'm not able to pinpoint the location where the private key is copied from the docker container to the main runner VM (it is then used by ipcluster_config.py - https://github.com/ipython/ipyparallel/blob/main/ci/ssh/ipcluster_config.py). Can you give me a hint... I have a solution for Windows, but I want to stick to the Linux solution as closely as possible... |
The ssh key is extracted from the running container with |
Thanks very much for the hint! My point is that we require a slightly different ssh section in test.yml for windows and linux anyway, so we could probably remove all docker commands from conftest.py. By the way, what's the point of ssh_dir? Returning the local ci/ssh directory but also checking if the docker container is running - two totally different things in one function? A bit confusing... I suggest splitting that up into two functions, or maybe moving the actual ssh_dir code directly into ssh_key (I haven't found any other reference to ssh_dir). Since my ssh docker test under windows is now fully working, I will proceed with the actual implementation of ipyparallel.cluster.ssh... |
Feel free to reorganize it in a way that makes sense. The main goal is that the ssh tests should be skipped if the ssh context can't be set up. Originally, I wanted the fixture to call docker-compose build/up/etc., but that wasn't working for whatever reason. |
Today I spent quite some time understanding how the parameter setting and the actual function calls work in the app classes like IPCluster, etc. |
I think a simple cli entrypoint with argparse makes the most sense. That ought to be quicker to load. If we find that we have some server-side configuration to do, we should switch it to a traitlets app to be consistent with config loading. |
Thanks for the quick reply. So I will implement an argparse solution... |
I'm getting closer to a fully working implementation. But before actually replacing the shell code in ipyparallel/cluster/launcher.py, may I ask for your feedback regarding the proposed implementation: I changed the name to ipyparallel/cluster/shellcmd because it allows sending commands to any shell. Hence, it's not limited to ssh, which makes it very versatile and also easy to test. I have already established a corresponding test that ensures the code works with all possible shells. I haven't tested the code on linux yet, but I do not expect any surprises since it works under wsl in windows. The actual implementation consists of two classes. During testing I discovered that calling the
Final question: for establishing an SSH IPCluster, the ssh infrastructure needs to be set up on all computers involved. Furthermore, the python and ipyparallel packages are required in advance. But python and ipyparallel wouldn't have to be a prerequisite if IPCluster supported an (engine) initialisation step based on shell scripts. IPCluster could even support different scripts for different systems ( |
I have been working hard to get the shellcmd functionality working on all platforms and shells. Relying only on the Github runners was far too ineffective (much slower compared to a local setup, and hard/impossible to analyze errors/problems). I now have two full development environments: one on Windows and one on Linux (Ubuntu). Although the setup took a while, I now have full debugging functionality on both systems, which was well worth it. As mentioned before, the functionality is split into two classes. Things are much simpler if we send the commands via python, similar to e.g. the existing ssh_waitpid function. In that sense we won't need (or at least won't use) a cli interface, and we can avoid the quoting horror. This makes the code more compact and easier to understand and maintain. If you agree on that, I would change the code accordingly. The pytest |
Thanks for working on this! I don't have time to get into too much detail, but I will say that piping code to Python is a good idea, rather than importing an entrypoint on the other side. That will be a lot more reliable, I think. It does put small limitations on the far side of what code is acceptable, but since it's meant to be very simple RPC shell commands, that shouldn't be a problem. So if you send a Python script to be executed via stdin which listens for commands on lines, you should be able to send simple commands on lines, e.g. serialized as JSON for reasonable behavior without platform-sensitive command-line parsing. BTW, feel free to do your work in a draft pull request. That will make conversations about specific code easier, and there's no need to wait for things to be 'complete' before you open one. |
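To make the "commands on lines" idea concrete, here is a minimal sketch of such a piped command server. This is purely illustrative and not the actual shellcmd implementation: the op names (getenv, exists) and helper names are made up, and for local testing the server script is passed via -c, whereas over ssh it would be piped to a remote python process instead.

```python
import json
import subprocess
import sys

# Minimal command server: one JSON command per line on stdin,
# one JSON reply per line on stdout. The op names are hypothetical.
SERVER = r"""
import json, os, sys
for line in sys.stdin:
    cmd = json.loads(line)
    if cmd["op"] == "getenv":
        result = os.environ.get(cmd["name"], "")
    elif cmd["op"] == "exists":
        result = os.path.exists(cmd["path"])
    else:
        result = None
    sys.stdout.write(json.dumps({"result": result}) + "\n")
    sys.stdout.flush()
"""

def send_command(proc, **cmd):
    # one JSON line per request/reply; no shell ever parses the payload
    proc.stdin.write(json.dumps(cmd) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())["result"]

# Locally we start python directly; over ssh the argv would be
# something like ["ssh", host, "python", "-"] with SERVER piped first.
proc = subprocess.Popen(
    [sys.executable, "-c", SERVER],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
res = send_command(proc, op="exists", path=".")
print(res)
proc.stdin.close()  # EOF ends the server loop
proc.wait()
```

Because each request and reply is a single JSON line, no platform-sensitive quoting is involved; cmd.exe, powershell and POSIX shells never see the command contents.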
Thanks very much for your quick reply. Ok, then I will definitely go for the piping method. I will clean up my code a bit and send a draft pull request in the next few days... :-) |
I haven't received any feedback on my PR yet. Is this because of the reviewers' lack of time, because the PR is not fully working under Github Runners yet, or some other reason? Please get back to me and drop me a short note... |
Sorry, it's just because I've been busy. I clicked the 'approve and run' button, so it should be running on CI now. I will try to get to it when I can. Thank you so much for doing the work! I'll post a review on the PR. |
Thanks, and I didn't want to stress you. Since I have no experience with PRs I wasn't sure if I did everything correctly. By the way, the pre-commit.ci run shows some errors. How can I fix those (the output is somehow truncated)? Is there a way to run this locally? Thanks! |
Yes, you can install pre-commit (pip install pre-commit) and run pre-commit install in the repo (many repos use this tool). Then every time you commit, these checks will be run on the changes about to be committed. You can also run the hooks on already-committed and/or unstaged files with pre-commit run --all-files. |
Thanks very much for the pre-commit explanations! Finally, I almost got the ssh tests with the Windows github runner to work - see my last SSH-Test run. The last 'big' obstacle was a file-name issue in the engine logs (forward/backward slashes), which is why I never managed to retrieve the engine logs. Since it worked in my local environment (no idea why the different slashes work on my machine but not in the github runner), I thought for a long time that the engines never got started. But this wasn't the problem... Now I get the engine logs and can see the final problem: the engines try to register against an incorrect controller IP. At the end of the Set up docker-compose for ssh windows launcher step I output the network addresses (ipconfig):
But the engines try to register against 10.1.54.0:
How do the engines determine the controller ip? Can this somehow be controlled? Under Linux things work correctly:
log:
|
There are two meanings of 'ip'. One is the 'bind' ip address (likely 0.0.0.0 for ssh), set with
or
If they should differ, bind ips can be set separately for engines and clients with The default for the location is to use |
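For reference, both addresses can be set in the controller config. The following fragment is only a sketch based on the documented IPController traits; the concrete values are made-up examples, not the actual CI setup:

```python
# ipcontroller_config.py -- illustrative values only
c = get_config()  # provided by the traitlets config loader

# bind address: listen on all interfaces so remote (ssh'd) engines can connect
c.IPController.ip = '0.0.0.0'

# 'location' is the address written into the connection files, i.e. the
# ip/hostname that engines and clients use to reach the controller
c.IPController.location = '10.1.54.4'  # hypothetical reachable host ip
```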
Thanks again for the infos. By adding a corresponding entry to the container hosts file (of course the docker add host command does NOT support windows...), I was able to 'place' the correct ip address. Although I tried different IP addresses of the different host network adapters, I never got the connection to work. The IPEngine always reports a 'registration timed out'.... I guess the github runner windows firewall blocks the network traffic. This is all so annoying, since there is no issue on a local machine... |
YEEESSS. Turning off the windows firewall was the final missing link. As you can see here, the ssh tests did work (persistence paid off eventually) I will update my PR next week. So stay tuned! |
Finally, everything is working, and the code, github workflows, etc. are cleaned up and (hopefully) properly commented. So from my side the PR is ready "to go..." |
I don't want to be annoying, but did you have time to look at the pull request? If you have any suggestions for improvements/changes I'm happy to adapt the code. I'm also willing to update the documentation regarding the ssh windows support. But I would like to finish this work as soon as possible. Thanks |
Thanks for your feedback on the PR. I was busy this week, but I can hopefully work on the code again next week |
My apologies for not working on the PR for such a long time. There were so many urgent issues in the last six months that I never had a few days in a row to finish up the PR. But now I'm back! After resyncing with the main project, I now have an issue when installing ipyparallel in the docker container for the ssh test. In my local container everything works perfectly fine, but NOT in the github windows runner. The main issue is that the editable pip installation of ipyparallel fails within the container. I have only basic knowledge about the pip installation process and JupyterLab, so I don't understand the problem. Could you point me in the right direction? Please have a look at the end of the step Set up docker-compose for ssh windows launcher (not the failed ssh test step) of
Any idea? As mentioned before, it must be related to the github windows runner (no issue in a local docker container…) |
Just a small update: found the docker-compose issue. Somehow the environment variable IPP_DISABLE_JS=1 is not fully propagated into the docker container. Very strange, since a simple |
Great! No rush, and yeah, it is definitely fine to set IPP_DISABLE_JS in the containers for testing this. I'm not sure why it's not propagating, but setting it explicitly ought to work. |
Thanks for your reply! Setting the environment variable using powershell commands resolved the issue. See
In my PR, pre-commit.ci ruff complains about a lot of files which I haven't touched. Is this a problem, or can I simply ignore it? I have some more questions regarding your code suggestions: I have already replied to about half of them. Is the way I did that ok, or do you expect a separate commit for each comment? Should I resolve the conversation when I incorporate your suggestions, or do you always want to do that? Thanks & cheers |
My guess is most of this will be resolved by merging from main.
Resolving yourself with a link to the commit is a fine way to do it, or just a description of what you changed. Feel free to leave them open with a comment if you feel like I should have a look to decide if it's been addressed or not. It definitely doesn't need to be a dedicated commit for each comment. Don't worry about the git history, I think we'll use squash & merge for this one. |
Thanks again for your comments. I have resolved all code comments except one (always (re-)opening the debug log file handle). Have a look at my comment and feel free to persist with your suggestion... Apart from that, I think my PR is ready to "go". Just a final note: I have added test_shellcmd.py also to the ssh github runner test. Otherwise the tests |
Yesterday I found and fixed a small issue in my breakaway support detection. It was working in github runners but not in the "real world" (the workaround was always used...) |
I'll have a look at the macos issue in #886 |
Perfect & thanks! There is obviously an issue with the hostname-ip resolution (I remember a similar problem in the windows docker container) |
I don't want to be annoying, but is there a chance that we get my PR into a committable state within the next two weeks (afterwards I'm on holidays for 3 weeks...)? The new feature would also require some adaptations of the ipcluster docs. Should I write those? |
Thanks for your huge amount of work! I'm also going on holiday. If you could remove all the temporary skips of tests and things, and make the appropriate changes to the docs, I can probably take it from there when I have a chance. |
You are very welcome! Your comments and the reviewing process were a positive experience for me. The only thing left is the 'SSHProxy' support/test under Windows. I vaguely remember that it was basically working on my machine, but not reliably. Since my knowledge about the network communication of ipp is still limited, I didn't investigate the issue further. I'm willing to assist here as well, but in a later, separate PR (since a normal ssh cluster works perfectly fine). I hope you find time to merge the PR soon. Enjoy your holidays! |
Finally merged, thank you! |
Thanks for improving and merging my code parts! Coming from C++ I can still see that I haven't fully adapted to all possibilities of dynamic programming languages ;-) I also have a completely different question: Is there a possibility to submit (a small amount of) auxiliary/custom data along with a task that will be stored in the central task database? Is there a better place to ask such questions? Thanks & Cheers |
I've been cleaning up a few things and fixing compatibility with the latest version of ipykernel. The only thing I have left is a docs PR that I'm working on, then I'll make a release. Hopefully this week or next. As for the other question, it would be better to open a new issue with more details. I'd need more information about what you want to do with the centrally stored data. |
Many thanks for the quick reply and the infos. |
@ottointhesky ipyparallel 9.0 is out with your PR. Thanks again! |
Perfect and thanks for the info! |
Thanks to the standard OpenSSH Windows package, it is straightforward to set up ssh servers on Windows 10 and above (https://learn.microsoft.com/en-us/windows-server/administration/openssh/openssh_install_firstuse?tabs=gui). Therefore, ipcluster should also support Windows (or, even better, mixed-OS environments incl. MacOS) if possible.
So I decided to give it a try and set up a minimal cluster of two Windows computers using ipcluster based on the ssh profile. And I was successful :-)
I'm happy to contribute the necessary code changes if there is interest from the community, but it would require some discussion to achieve a proper implementation. As a start, some details on my solution:
Besides some obvious command/script changes (e.g. export -> set, rm -> del, test -e -> if exist ...), the trickiest part was starting the engines in a way that they are not closed by Windows when the ssh connection is closed (for Linux the nohup command is used; cygwin nohup didn't work for me). The trick was to start the process with the 'CREATE_BREAKAWAY_FROM_JOB' flag (as mentioned in https://stackoverflow.com/questions/52449997/how-to-detach-python-child-process-on-windows-without-setsid). I have currently solved this with a separate nohup.py script, but maybe there are better and easier ways to achieve that.
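For illustration, the breakaway trick can also be done directly with Python's subprocess module rather than a separate script. This is only a sketch; launch_detached is a hypothetical helper name, not code from the PR:

```python
import subprocess
import sys

def launch_detached(cmd):
    """Start cmd so it survives the end of the (ssh) session. Sketch only."""
    if sys.platform == "win32":
        # CREATE_BREAKAWAY_FROM_JOB detaches the child from the job object
        # the SSH service runs it in, so closing the connection won't kill it
        flags = (
            subprocess.CREATE_BREAKAWAY_FROM_JOB
            | subprocess.CREATE_NEW_PROCESS_GROUP
            | subprocess.DETACHED_PROCESS
        )
        return subprocess.Popen(
            cmd, creationflags=flags,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    # POSIX equivalent of nohup: new session plus detached standard streams
    return subprocess.Popen(
        cmd, start_new_session=True,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

proc = launch_detached([sys.executable, "-c", "print('engine would run here')"])
```

Note that CREATE_BREAKAWAY_FROM_JOB only works if the enclosing job object permits breakaway, which is why a runtime detection with a fallback (as discussed later in the thread) is still needed in practice.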
Shutting down was also more difficult than expected. 'taskkill' returned an 'Access denied' error message (for whatever reason), but using PowerShell and the Stop-Process command did work eventually.
Since shell commands will always (slightly) differ from OS to OS, there are basically two options to address the differences: either all shell commands are changed to (small) python scripts with an OS-independent calling pattern, or the OS on the other side of the ssh connection is known, so that the correct command sequence can be sent. Both options have pros and cons: switching to python from the very start requires a working python instance on the engine computers, which might make trouble-shooting/error reporting more difficult if python is not working (properly). Using OS-dependent shell commands requires knowledge of the OS/shell in the first place (this can be worked out by a command that works on all platforms, e.g.: echo "OS-CMD=%OS%;OS-PW=$env:OS;OS-LINUX=$OSTYPE;SHELL=$SHELL") and then storing this information (is an adaptation of the internal data structures necessary?).
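The client-side classification of that probe's output could look like the following sketch (the function name and return labels are made up; it assumes cmd.exe expands %OS% to Windows_NT, PowerShell expands $env:OS, and POSIX shells expand $OSTYPE):

```python
# The probe command sent over the ssh connection; each shell expands
# only its own variable syntax, so the filled-in field identifies it.
PROBE = 'echo "OS-CMD=%OS%;OS-PW=$env:OS;OS-LINUX=$OSTYPE;SHELL=$SHELL"'

def classify_shell(probe_output: str) -> str:
    """Guess the remote shell family from the echoed probe line."""
    fields = {}
    # cmd.exe echoes the surrounding quotes literally, so strip them first
    for part in probe_output.strip().strip('"').split(";"):
        key, _, value = part.partition("=")
        fields[key] = value
    if fields.get("OS-CMD", "").startswith("Windows"):
        return "cmd"          # cmd.exe expanded %OS% -> Windows_NT
    if fields.get("OS-PW", "").startswith("Windows"):
        return "powershell"   # PowerShell expanded $env:OS
    if fields.get("OS-LINUX", "") not in ("", "$OSTYPE"):
        return "posix"        # bash/zsh expanded $OSTYPE, e.g. linux-gnu
    return "unknown"
```

This keeps the per-shell knowledge in one place on the client, which matters for option b) where the correct command sequence has to be chosen before anything else is sent.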
Looking forward to your comments!