Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Registry Entry that Points to a Local SIF File #519

Closed
georgiastuart opened this issue Mar 17, 2022 · 19 comments · Fixed by #520
Closed

Registry Entry that Points to a Local SIF File #519

georgiastuart opened this issue Mar 17, 2022 · 19 comments · Fixed by #520

Comments

@georgiastuart
Copy link
Contributor

I've got a few SIFs that I want to make available to our cluster users via shpc. However, these SIFs are locally stored only (for the time being, hopefully we'll have a gitlab docker registry up and running soon!).

I'm reading the registry entry writing documentation and I'd like to be able to define aliases, GPU resources, etc for these local sifs as you could in a registry recipe for a remote image. Is that possible?

I've messed with shpc add and it seems like it creates just a default setup / module file without the finer control that you get from a registry recipe, is that correct?

@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

Love this idea! Okay so let me walk through my thinking about design before jumping into an implementation. So a current container.yaml might look like this:

docker: vanessa/salad
url: https://hub.docker.com/r/vanessa/salad
maintainer: '@vsoch'
description: A container all about fork and spoon puns.
latest:
  latest: sha256:e8302da47e3200915c1d3a9406d9446f04da7244e4995b7135afd2b79d4f63db
tags:
  latest: sha256:e8302da47e3200915c1d3a9406d9446f04da7244e4995b7135afd2b79d4f63db
aliases:
  salad: /code/salad

I would normally have this stored under registry/vanessa/salad/container.yaml and install as a module like:

$ shpc install vanessa/salad

Now let's pretend that I've instead pulled this container, and I have it as a sif, salad_latest.sif. And it's also not in a registry so I cannot pull again (your use case). Right now, if you do:

$ shpc add salad_latest.sif vanessa/salad:latest

You'd get a module, but not with customization (e.g., just basic commands to shell / exec.)

Proposal

So my proposal is to update shpc add to instead of doing:

container -> add -> module

to instead do:

container -> add -> registry -> install -> module

This means that shpc add <container.sif> <namespace> would actually generate for you a container.yaml in your registry (and thus help you honor maintaining one namespace to tell you about name conflicts) and it would provide a base container.yaml that you could tweak to your liking, with instruction to install from a path. So for example you might first generate:

path: sha256:e8302da47e3200915c1d3a9406d9446f04da7244e4995b7135afd2b79d4f63db.sif
url: https://singularity-hpc.readthedocs.io
maintainer: '@vsoch'
description: A custom container for X. Add your description here for container users to see.
latest:
  latest: sha256:e8302da47e3200915c1d3a9406d9446f04da7244e4995b7135afd2b79d4f63db
tags:
  latest: sha256:e8302da47e3200915c1d3a9406d9446f04da7244e4995b7135afd2b79d4f63db
# aliases:
#  echo: /usr/bin/echo

The above is nice because shpc can still calculate the digest for your container from the SIF directly, and then plop that into a latest (that you can either keep or change the tag for) and the container is named to match that. You'll also notice the path is the sha, and we do that to easily match it to latest. If you ask to install an old tag that isn't the current path, it will look for it in the same install directory as the container, and only error if it cannot find it. So if the container sif gets deleted from your registry folder, you wouldn't be able to install that exact container again (logically), and this would be up to you to keep/remove and tidy up the recipe. You'd also likely edit the url / description and uncomment variables that you want (e.g., the list of aliases or gpu for your use case). And then when you are happy with the recipe and want to make the module:

$ shpc install vanessa/salad

Just as you would another standard recipe! And then let's say you have a new version of your container, you'd again do:

$ shpc add salad_latest.sif vanessa/salad:2.1

And it would either add the new SIF and version if it's not present, or if you are specifying to overwrite an existing version, it would ask you to confirm first. There are two things I'm not sure about (but I have an opinion, and let me know what you think)!

  1. should we always have the latest tag be updated to be the last one you added (e.g., the newest). My thinking is yes because that's sort of what the automated updater does.
  2. storing the container TWICE - once with the module and once with the container directory - is kind of redundant. But the issue is that not storing the container somewhere that can be easily deleted might risk losing them. So we could either maintain a separate install tree for these custom containers, OR assume the installing user has write to the install directory and put them alongside the registry, OR perhaps do a link (and I'm not sure if this would lead to funky issues with the modules) OR something entirely different I'm not thinking of!

Let me know your thoughts! Pinging @marcodelapierre because he always has good ideas and feedback too :)

  1. And of course, you'd do shpc install again to create the new module.

Let me know your thoughts on the above and if I'm missing anything!

@georgiastuart
Copy link
Contributor Author

Yes! That's exactly the sort of behavior I'd want. I think having shpc add ... create a registry document that can be edited before shpc install ... would be super helpful, especially since I've got things I want to tweak in the registry for these sifs before our users load them to reduce friction (like adding the GPU specifier to GPU-expecting containers, adding aliases, etc).

We've got several sifs that were built specifically for our systems (mostly with MPI stacks that behave nicely with our MPI stacks, etc).

For your two questions/thoughts at the end:

  1. Yes, I think latest ought to be updated. I think this is most in line with a typical registry.
  2. For my use case, I think having the container alongside the registry would be fine. For our cluster, only sysadmins will be able to install containers to the "central" location and users can use the ones we provide (or set up their own shpc system / use containers in the traditional manner!)

@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

Thank you! Will start on this asap.

@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

okay all set! #520 Please take your time to review, I only work on personal projects in evenings and on weekends so we have a good 24 hours at least. 😆

@georgiastuart
Copy link
Contributor Author

This is amazing! I'll get it on one of our clusters today to try out the workflow and let you know any pain points I encounter. I'm excited to get it up and running!

@georgiastuart
Copy link
Contributor Author

Love it!!

I did run into an issue I think:

When you initially create the .yml with shpc add, this is what's created:

path: sha256:d252a813426a261e14a2a0d48e9313acd9bee2c75da01ad22c301c4e068d0714.sif

# change this to a URL to describe your container, or to get help
url: https://singularity-hpc.readthedocs.io

# change this to your GitHub alias (or another contact or name)
maintainer: Dinosaur

# A description to describe your container
description: A custom container to do X.
latest:
  1.2.gpu: sha256:d252a813426a261e14a2a0d48e9313acd9bee2c75da01ad22c301c4e068d0714
tags:
  latest:

# Any custom features?
# features:
#  gpu: true

# Put custom aliases here
# aliases:
#  echo: /usr/bin/echo

# custom environment variables
# env:
#  breakfast: pancakes
  1.2.gpu: sha256:d252a813426a261e14a2a0d48e9313acd9bee2c75da01ad22c301c4e068d0714

The tag gets pushed to the bottom rather than remaining with tags and the latest tag does not have a sha. I consulted the docs and updated the yml to

path: sha256:d252a813426a261e14a2a0d48e9313acd9bee2c75da01ad22c301c4e068d0714.sif

# change this to a URL to describe your container, or to get help
url: <URL>

# change this to your GitHub alias (or another contact or name)
maintainer: <maintainer>

# A description to describe your container
description: <description>
latest:
  1.2.gpu: sha256:d252a813426a261e14a2a0d48e9313acd9bee2c75da01ad22c301c4e068d0714
tags:
  1.2.gpu: sha256:d252a813426a261e14a2a0d48e9313acd9bee2c75da01ad22c301c4e068d0714
# Any custom features?
features:
  gpu: true

# Put custom aliases here
# aliases:
#  echo: /usr/bin/echo

# custom environment variables
# env:
#  breakfast: pancakes

(moved the tag to the correct location and removed the latest: field from tags).

Otherwise, it works like a charm! Figuring out MPI stuff now with wrappers :)

@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

Oh that's so weird! It must be the yaml parser I use, which honors preserving comments. I'll take a look this evening and fix it up - for the time being your fix is totally spot on!

@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

okay I think I've fixed the bug!

@marcodelapierre
Copy link
Contributor

marcodelapierre commented Mar 18, 2022

Hi @vsoch , @georgiastuart , thanks for pinging, and sorry for the delay.
I have a couple of thoughts, for your use/edit/non-use, as you prefer.

One thing I was thinking myself is related to this issue: the ability to create starting container.yaml for remote containers, just to save some typing and monkey job. Now...

One of the many interesting things I see in the proposal above is that in the yaml you can now choose to specify path instead of url, to get a local container.
How about enabling the SHPC client to always take the option of a local path for a container? In other words, the minimal edit compared to the current functionalities would be to enable people to write a new container.yaml from scratch, but...for a local SIF.

And then, for the 2nd part, that is already described above. Now that the SHPC client, and crucially shpc install, is able to take a local path for a SIF, why not changing the purpose of shpc add completely, to turn it into a command that will generate a starting container.yaml for you? So, what I am saying is, in addition to the case above, where it does it for a local SIF, it could also do it for a remote one.

Then, for any containers, regardless whether remote or local, one has two options:
A) if container.yaml is already written (either with url or path), just go with shpc install
B) if no container.yaml exists yet, run shpc add first, to generate a convenient starting version for it

Apologies if this adds nothing to the above, or if it makes little sense....it's already Friday evening here :-D
Happy to chat further on this anyway, just ping me!

@vsoch
Copy link
Member

vsoch commented Mar 18, 2022

@marcodelapierre I love that! So we could do either:

$ shpc add salad_latest.sif vanessa/salad:2.1

OR

$ shpc add docker://vanessa/salad:latest

To add vanessa/salad on Docker Hub. Should the user be able to control the namespace for the latter? My git says no - this gets messy, but theoretically there could be reasons someone would want to change it (albeit it would get confusing). It could be we allow either

$ shpc add docker://vanessa/salad:latest
$ shpc add docker://vanessa/salad:latest dinosaur/salad:latest

@georgiastuart
Copy link
Contributor Author

I would love to be able to change the namespace on pulled containers, but I can also see the argument for not doing it.

My vote is for the options

$ shpc add docker://vanessa/salad:latest
$ shpc add docker://vanessa/salad:latest dinosaur/salad:latest

I'll pull your bug fix today and try it out on another container module install!

@vsoch
Copy link
Member

vsoch commented Mar 18, 2022

Sounds good! Likely I’ll do the larger update to the PR at the end of the day.

Adding this functionality will also require getting tags and digests from a registry, so it sets us up nicely to do some kind of update command (which we were discussing in #501 I think).

@vsoch
Copy link
Member

vsoch commented Mar 18, 2022

okay creation from docker:// URI is added to the PR! #520. Review at your leisure, I'll be around :) Have a good weekend y'all!

@marcodelapierre
Copy link
Contributor

Amazing, thanks both!

@georgiastuart
Copy link
Contributor Author

When I attempt this command:

shpc add docker://tensorflow/tensorflow:2.7.1-gpu tensorflow:2.7.1-gpu

I get this result:

# change this to a URL to describe your container, or to get help
url: https://singularity-hpc.readthedocs.io

# change this to your GitHub alias (or another contact or name)
maintainer: Dinosaur

# Any custom features?
# features:
#  gpu: true

# Put custom aliases here
# aliases:
#  echo: /usr/bin/echo

# custom environment variables
# env:
#  breakfast: pancakes

# A description to describe your container
description: A custom container to do X.
latest:
  2.7.1-gpu: 'crane digest tensorflow/tensorflow:2.7.1-gpu:2.7.1-gpu: parsing reference
    "tensorflow/tensorflow:2.7.1-gpu:2.7.1-gpu": could not parse reference'
tags:
  2.7.1-gpu: 'crane digest tensorflow/tensorflow:2.7.1-gpu:2.7.1-gpu: parsing reference
    "tensorflow/tensorflow:2.7.1-gpu:2.7.1-gpu": could not parse reference'
docker: tensorflow/tensorflow:2.7.1-gpu

Same for without the namespace change:

shpc add docker://tensorflow/tensorflow:2.7.1-gpu

vsoch added a commit that referenced this issue Mar 21, 2022
@vsoch
Copy link
Member

vsoch commented Mar 21, 2022

okay should be fixed! It's almost like checking for an error in the response is a good idea (oops for me lol)

@georgiastuart
Copy link
Contributor Author

That fixed the add! Now I'm running into the following error with the install:

$ vim registry/tensorflow/container.yaml
$ shpc install tensorflow:2.7.1-gpu
singularity pull --name /opt/ohpc/pub/unpackaged/shpc/containers/tensorflow/tensorflow/2.7.1-gpu/2.7.1-gpu/tensorflow-tensorflow:2.7.1-gpu-2.7.1-gpu-sha256:581575fc3a736398f0dff9e950f57f2e6d808296267ac98325451a0b1d101dd0.sif docker://tensorflow/tensorflow:2.7.1-gpu@sha256:581575fc3a736398f0dff9e950f57f2e6d808296267ac98325451a0b1d101dd0
FATAL:   While making image from oci registry: error fetching image to cache: failed to get checksum for docker://tensorflow/tensorflow:2.7.1-gpu@sha256:581575fc3a736398f0dff9e950f57f2e6d808296267ac98325451a0b1d101dd0: unable to parse image name docker://tensorflow/tensorflow:2.7.1-gpu@sha256:581575fc3a736398f0dff9e950f57f2e6d808296267ac98325451a0b1d101dd0: Docker references with both a tag and digest are currently not supported

It looks like it's duplicating the tag. I think that, in the registry file, the docker identifier should not contain the tag.

Here's what it generates:

docker: tensorflow/tensorflow:2.7.1-gpu

Here's what ended up working:

docker: tensorflow/tensorflow

Hope this helps!

@vsoch
Copy link
Member

vsoch commented Mar 21, 2022

oops! So that was actually the same bug (one level up!) should be fixed now.

@marcodelapierre
Copy link
Contributor

seems like some great progress here! :)
not sure when I get the chance to test it myself, but I understand that Georgia has done it already.

I really like this new feature addition

vsoch added a commit that referenced this issue Mar 22, 2022
)

* refactor of "add" to generate a container.yaml first to close #519
* this set of changes re-organizes the container template to be under the container
module, and also moves around some code in modules and container (base.py and config.py)
because in development I find the current locations not intuititve enough.
* add should only support maintaining the namespace for the docker:// uri
Signed-off-by: vsoch <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants