[WIP] Enable K8 Job Execution #320

Closed
wants to merge 8 commits into from

jmchilton

#313 enables running SLURM jobs in Kubernetes using this project and Kompose. This work builds on that but runs the jobs using the Kubernetes Jobs API if containers are available (and uses the local job runner otherwise). This leverages the Kubernetes job runner from @pcm32 (xref galaxyproject/galaxy#2314).
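
The dispatch rule described above (Kubernetes Jobs when a container is available, local runner otherwise) can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `choose_runner` and the destination ids `"k8s"` and `"local"` are hypothetical.

```python
def choose_runner(tool_has_container: bool) -> str:
    """Pick a destination id for a job.

    Hypothetical rule mirroring the description: jobs whose tool has a
    container go to the Kubernetes runner, everything else runs locally.
    """
    # "k8s" and "local" are placeholder ids, not names taken from the PR.
    return "k8s" if tool_has_container else "local"


if __name__ == "__main__":
    for has_container in (True, False):
        print(has_container, "->", choose_runner(has_container))
```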

I'm opening this WIP PR mostly as documentation that this is possible. We don't yet have a way to extend the compose syntax for all the different things we want to compose, so I don't actually want to change the docker-compose.yml file that is changed in this PR. The default compose setup should remain Condor for now, I think. I am convinced at least that reuse in building containers using ansible-galaxy-extras and such is possible, and that we may even want to move the level of reuse up and figure out how to utilize docker-compose in different ways.

- Use volumes_from instead of volumes so kompose can match up paths properly.
- Mount pgadmin4 without volumes when using kompose for now.
- Annotate the services that should be exposed externally with kompose labels in the source compose files.

Relevant pieces of SLURM docs:

```
Node names can have up to three name specifications: NodeName is the name used by all Slurm tools when referring to the node, NodeAddr is the name or IP address Slurm uses to communicate with the node, and NodeHostname is the name returned by the command /bin/hostname -s. Only NodeName is required (the others default to the same name), although supporting all three parameters provides complete control over naming and addressing the nodes. See the slurm.conf man page for details on all configuration parameters.
```

```
ControlAddr
Name that ControlMachine should be referred to in establishing a communications path. This name will be used as an argument to the gethostbyname() function for identification. For example, "elx0000" might be used to designate the Ethernet address for node "lx0000". By default the ControlAddr will be identical in value to ControlMachine.
```
Add TODO to tweak NodeAddr in the Kubernetes case.
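
As a rough illustration of the three node name parameters quoted above, the sketch below builds a slurm.conf-style node line in which only `NodeName` is required and the other two are emitted only when set. The helper `render_node_line` and the example values are hypothetical and are not part of this PR.

```python
from typing import Optional


def render_node_line(
    node_name: str,
    node_addr: Optional[str] = None,
    node_hostname: Optional[str] = None,
) -> str:
    """Build a slurm.conf-style node definition line.

    Only NodeName is mandatory; per the quoted docs, NodeAddr and
    NodeHostname default to the same name when they are left out.
    """
    parts = [f"NodeName={node_name}"]
    if node_addr is not None:
        parts.append(f"NodeAddr={node_addr}")
    if node_hostname is not None:
        parts.append(f"NodeHostname={node_hostname}")
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder values: in a Kubernetes setup NodeAddr might be a service name.
    print(render_node_line("worker1", node_addr="worker1-service"))
```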

bgruening commented May 10, 2017

@jmchilton let me know if you want me to take this over. I think this work is awesome and we should get it in and announce it.

I have done some work in dev on compose, htcondor and slurm and started some initial documentation:

https://github.com/bgruening/docker-galaxy-stable/blob/condor_tests/compose/README.md

Please have a look especially at these env files. My current plan is to provide such files for different recommended deployments. It would be great if we could create such a file for the case where Galaxy submits into a k8s cluster, very similar to the SLURM and HTCondor cases.

Does this seem like a good idea to you? Changing a deployment would become
`ln -sf .env_htcondor_docker .env`.
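
For scripted setups, the same symlink switch could be done from Python. This is a minimal sketch, assuming a POSIX filesystem; the helper name `switch_env` and its parameters are hypothetical.

```python
import os


def switch_env(target: str, link: str = ".env", directory: str = ".") -> str:
    """Point `link` at `target`, replacing any existing link (like `ln -sf`).

    Returns the path the link now points to.
    """
    link_path = os.path.join(directory, link)
    tmp_path = link_path + ".tmp"
    # os.symlink refuses to overwrite, so build the link under a temporary
    # name and rename it over the old one.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target, tmp_path)
    os.replace(tmp_path, link_path)
    return os.readlink(link_path)
```

Calling `switch_env(".env_htcondor_docker")` in the compose directory would then select the HTCondor deployment file named above.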

Travis should also have a k8s cluster running already. I have no clue if this thing is working :) But it was intended for you to test this PR :)
