Add entries for DRAC cluster during mila init [MT-61] (#54)
* Add SSH entries for the DRAC clusters

Signed-off-by: Fabrice Normandin <[email protected]>

* Update tests for mila init to add DRAC setup

Signed-off-by: Fabrice Normandin <[email protected]>

* Also setup the passwordless authentication to DRAC

Signed-off-by: Fabrice Normandin <[email protected]>

* Bug: Unable to check passwordless SSH on DRAC

Signed-off-by: Fabrice Normandin <[email protected]>

* Re-add the "User" in DRAC entries

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix isort issue

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix test broken because of additional prompts

Signed-off-by: Fabrice Normandin <[email protected]>

* Simplify the code a bit, show entries in dict

Signed-off-by: Fabrice Normandin <[email protected]>

* `mila init` sets up SSH access to DRAC

Signed-off-by: Fabrice Normandin <[email protected]>

* Add explicit `timeout` to `run` method, fix tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix ordering of entries in windows config from WSL

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix pre-commit issues

Signed-off-by: Fabrice Normandin <[email protected]>

* Tweak ssh-keygen call and test timeout

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix bug in `create_ssh_keypair`

Signed-off-by: Fabrice Normandin <[email protected]>

* Don't add SSH multiplexing for DRAC compute nodes

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue with create_ssh_keypair on Windows

Signed-off-by: Fabrice Normandin <[email protected]>

* Try to make the ssh-keygen work on Windows in CI

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue(?) with ssh-keygen on Windows

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix weird issues with test_local.py on Windows

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix test for create_ssh_keypair on Windows

Signed-off-by: Fabrice <[email protected]>

* Simplify redundant use of xfail mark

Signed-off-by: Fabrice Normandin <[email protected]>

* Move test_check_passwordless to test_local.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Move init command steps to init_command.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Simplify check_passwordless and add more tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add link to DRAC website for passwordless SSH

Signed-off-by: Fabrice Normandin <[email protected]>

* Greatly simplify check_passwordless

Signed-off-by: Fabrice Normandin <[email protected]>

* Update `mila init`, set up passwordless SSH to DRAC

Signed-off-by: Fabrice Normandin <[email protected]>

* Add failing test stub

Signed-off-by: Fabrice Normandin <[email protected]>

* Add tests for setup_passwordless_ssh_to_cluster

Signed-off-by: Fabrice Normandin <[email protected]>

* Move common fixtures to common.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Add a test for _get_drac_username

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue with shutil.copytree for py3.7

Signed-off-by: Fabrice Normandin <[email protected]>

* Add integration test for setup_passwordless_ssh

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove outdated comments

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue with check_passwordless, fix test

Signed-off-by: Fabrice Normandin <[email protected]>

* Add comments in test

Signed-off-by: Fabrice Normandin <[email protected]>

* Replace computecanada.ca with alliancecan.ca

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix bug in test_setup_passwordless_ssh_access

Signed-off-by: Fabrice Normandin <[email protected]>

* Increase timeout value to try to help Win issue

Signed-off-by: Fabrice Normandin <[email protected]>

* Increase timeout value for a test

Signed-off-by: Fabrice Normandin <[email protected]>

* Increase timeout for test_create_ssh_keypair

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue with setup_passwordless_ssh on Windows

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove the check for ssh access to niagara

Signed-off-by: Fabrice Normandin <[email protected]>

* Make an ssh key for Mila and Drac clusters

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix tests for setup_passwordless_ssh

Signed-off-by: Fabrice Normandin <[email protected]>

* Revert "Fix tests for setup_passwordless_ssh"

This reverts commit 6ea36d4.

* Revert "Make an ssh key for Mila and Drac clusters"

This reverts commit 61ec17e.

* Increase the timeout value in test_init_command.py

Signed-off-by: Fabrice Normandin <[email protected]>

---------

Signed-off-by: Fabrice Normandin <[email protected]>
Signed-off-by: Fabrice Normandin <[email protected]>
Signed-off-by: Fabrice <[email protected]>
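Several of the commits above concern checking passwordless SSH access and giving the `run` method an explicit `timeout`. A minimal sketch of that idea follows. It is an illustration only, not the milatools implementation: `build_check_command`, the injectable `runner` parameter, and the default timeouts are assumptions introduced here. It relies on OpenSSH's `BatchMode=yes` option, which makes `ssh` fail instead of prompting for a password.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def build_check_command(host: str, connect_timeout: int = 10) -> list[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never prompt; fail if key auth is not possible
        "-o", f"ConnectTimeout={connect_timeout}",
        host,
        "true",  # trivial remote command; only the exit code matters
    ]


def check_passwordless(
    host: str,
    timeout: float = 15.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `ssh <host> true` succeeds without any interaction.

    `runner` is a hypothetical injection point so the check can be tested
    without a network connection.
    """
    try:
        result = runner(
            build_check_command(host),
            capture_output=True,
            timeout=timeout,  # explicit timeout so a hung connection cannot block forever
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the ssh executable could not be started.
        return False
    return result.returncode == 0
```

Calling `check_passwordless("mila")` would then report whether key-based login to the `mila` host entry already works, treating a timeout the same as a failed login.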
lebrice authored Jan 23, 2024
1 parent 9c4f213 commit 4adedbc
Showing 49 changed files with 3,131 additions and 916 deletions.
120 changes: 9 additions & 111 deletions milatools/cli/commands.py
@@ -24,12 +24,13 @@
from urllib.parse import urlencode

import questionary as qn
from invoke.exceptions import UnexpectedExit
from typing_extensions import TypedDict

from ..version import version as mversion
from .init_command import (
create_ssh_keypair,
print_welcome_message,
setup_keys_on_login_node,
setup_passwordless_ssh_access,
setup_ssh_config,
setup_vscode_settings,
setup_windows_ssh_config_from_wsl,
@@ -47,13 +48,13 @@
randname,
running_inside_WSL,
with_control_file,
yn,
)

logger = get_logger(__name__)
if typing.TYPE_CHECKING:
from typing_extensions import Unpack

logger = get_logger(__name__)


def main():
if sys.platform != "win32" and get_fully_qualified_name().endswith(
@@ -139,12 +140,12 @@ def mila():
intranet_parser.set_defaults(function=intranet)

# ----- mila init ------

init_parser = subparsers.add_parser(
"init",
help="Set up your configuration and credentials.",
formatter_class=SortingHelpFormatter,
)

init_parser.set_defaults(function=init)

# ----- mila forward ------
@@ -396,7 +397,6 @@ def init():
print("Checking ssh config")

ssh_config = setup_ssh_config()
print("# OK")

# if we're running on WSL, we actually just copy the id_rsa + id_rsa.pub and the
# ~/.ssh/config to the Windows ssh directory (taking care to remove the
@@ -405,116 +405,14 @@ def init():
if running_inside_WSL():
setup_windows_ssh_config_from_wsl(linux_ssh_config=ssh_config)

setup_passwordless_ssh_access()
success = setup_passwordless_ssh_access(ssh_config=ssh_config)
if not success:
exit()
setup_keys_on_login_node()
setup_vscode_settings()
print_welcome_message()


def setup_passwordless_ssh_access():
print("Checking passwordless authentication")

here = Local()

# Check that there is an id file
ssh_private_key_path = Path.home() / ".ssh" / "id_rsa"

sshdir = os.path.expanduser("~/.ssh")
if not any(
entry.startswith("id") and entry.endswith(".pub")
for entry in os.listdir(sshdir)
):
if yn("You have no public keys. Generate one?"):
# Run ssh-keygen with the given location and no passphrase.
create_ssh_keypair(ssh_private_key_path, here)
else:
exit("No public keys.")

# Check that it is possible to connect using the key

if not here.check_passwordless("mila"):
if yn(
"Your public key does not appear be registered on the cluster. Register it?"
):
# NOTE: If we're on a Windows machine, we do something different here:
if sys.platform == "win32":
command = (
"powershell.exe type $env:USERPROFILE\\.ssh\\id_rsa.pub | ssh mila "
'"cat >> ~/.ssh/authorized_keys"'
)
here.run(command)
else:
here.run("ssh-copy-id", "mila")
if not here.check_passwordless("mila"):
exit("ssh-copy-id appears to have failed")
else:
exit("No passwordless login.")


def setup_keys_on_login_node():
print("Checking connection to compute nodes")

remote = Remote("mila")
try:
pubkeys = remote.get_lines("ls -t ~/.ssh/id*.pub")
print("# OK")
except UnexpectedExit:
print("# MISSING")
if yn("You have no public keys on the login node. Generate them?"):
# print("(Note: You can just press Enter 3x to accept the defaults)")
# _, keyfile = remote.extract(
# "ssh-keygen",
# pattern="Your public key has been saved in ([^ ]+)",
# wait=True,
# )
private_file = "~/.ssh/id_rsa"
remote.run(f'ssh-keygen -q -t rsa -N "" -f {private_file}')
pubkeys = [f"{private_file}.pub"]
else:
exit("Cannot proceed because there is no public key")

common = remote.with_bash().get_output(
"comm -12 <(sort ~/.ssh/authorized_keys) <(sort ~/.ssh/*.pub)"
)
if common:
print("# OK")
else:
print("# MISSING")
if yn(
"To connect to a compute node from a login node you need one id_*.pub to "
"be in authorized_keys. Do it?"
):
pubkey = pubkeys[0]
remote.run(f"cat {pubkey} >> ~/.ssh/authorized_keys")
else:
exit("You will not be able to SSH to a compute node")


def print_welcome_message():
print(T.bold_cyan("=" * 60))
print(T.bold_cyan("Congrats! You are now ready to start working on the cluster!"))
print(T.bold_cyan("=" * 60))
print(T.bold("To connect to a login node:"))
print(" ssh mila")
print(T.bold("To allocate and connect to a compute node:"))
print(" ssh mila-cpu")
print(T.bold("To open a directory on the cluster with VSCode:"))
print(" mila code path/to/code/on/cluster")
print(T.bold("Same as above, but allocate 1 GPU, 4 CPUs, 32G of RAM:"))
print(" mila code path/to/code/on/cluster --alloc --gres=gpu:1 --mem=32G -c 4")
print()
print(
"For more information, read the milatools documentation at",
T.bold_cyan("https://github.com/mila-iqia/milatools"),
"or run `mila --help`.",
"Also make sure you read the Mila cluster documentation at",
T.bold_cyan("https://docs.mila.quebec/"),
"and join the",
T.bold_green("#mila-cluster"),
"channel on Slack.",
)


def forward(
remote: str,
page: str | None,
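
The `setup_passwordless_ssh_access` body removed from this file registered the public key with `ssh-copy-id` on POSIX systems and with a PowerShell pipeline on Windows, always targeting the `mila` host. The sketch below shows how that choice could be parameterized by host so the same logic might also serve the DRAC clusters. It is an assumption-laden illustration: `key_registration_command` and its `cluster` and `platform` parameters are hypothetical names, and this is not the code that was moved to `init_command.py`.

```python
from __future__ import annotations

import sys


def key_registration_command(cluster: str, platform: str = sys.platform) -> str:
    """Return the shell command that appends the local public key to the
    cluster's ~/.ssh/authorized_keys (hypothetical helper)."""
    if platform == "win32":
        # ssh-copy-id is not available on Windows, so pipe the key over ssh,
        # as the removed code did for the `mila` host.
        return (
            "powershell.exe type $env:USERPROFILE\\.ssh\\id_rsa.pub | "
            f'ssh {cluster} "cat >> ~/.ssh/authorized_keys"'
        )
    return f"ssh-copy-id {cluster}"
```

Passing `platform` explicitly keeps the branch testable on any operating system; by default it follows `sys.platform`, the same value the removed code inspected.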