Enhance user experience around cloud storage via better integration with jupyterlab and notebook server environment #10

Open
4 tasks
rabernat opened this issue Apr 26, 2022 · 5 comments

Comments

@rabernat

rabernat commented Apr 26, 2022

Context

In 2i2c-org/docs#138 we started to write some user-facing documentation on how to work with storage in the cloud. It now appears at https://docs.2i2c.org/en/latest/user/storage.html. Some relevant excerpts:

Your hub lives in the cloud. The preferred way to store data in the cloud is using cloud object storage, such as Amazon S3 or Google Cloud Storage.
...
From a user perspective, the main challenge of working with object storage is the need to use more specialized tools, rather than just simple files / filenames, to manage data.
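
As a rough illustration of that difference (a sketch, not taken from the linked docs): an object in cloud storage is addressed by a bucket name plus a key, not by a path on a local filesystem. The helper name `split_object_url` and the bucket names below are made up for the example; only the standard library is used.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_object_url(url: str) -> Tuple[str, str, str]:
    """Split an object-store URL into (scheme, bucket, key).

    Hypothetical helper for illustration only. It accepts the two URL
    schemes named in the quoted docs: s3:// and gs://.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("s3", "gs"):
        raise ValueError(f"not an object-store URL: {url!r}")
    # The "host" part of the URL is the bucket; the rest is the object key.
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    return parsed.scheme, bucket, key


# Placeholder bucket and key, not real resources.
scheme, bucket, key = split_object_url("s3://example-bucket/data/sst.zarr")
print(scheme, bucket, key)
```

Because the key is just a string inside a bucket, ordinary calls such as `open("data/sst.zarr")` do not apply, which is why users end up needing object-storage-aware tools.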

In #9 we are tracking the idea that hub admins should be able to create cloud storage buckets for hub users, possibly with group-level credentials.

In this issue, I am proposing several UI / UX enhancements that will empower users to take better advantage of cloud storage. The impact of this will be to make our users more effective "cloud native" data scientists.

Proposal

We should do the following:

Updates and actions

No response

@rabernat
Author

rabernat commented Jun 9, 2022

Now that pangeo-data/pangeo-docker-images#310 is merged, we could be using Pangeo images with the s3 browser installed.

@sgibson91
Member

@rabernat what's the tag for that image? I just enabled an action that will auto-bump the pangeo images for the three pangeo-like hubs. It creates PRs like this: 2i2c-org/infrastructure#1407

@rabernat
Author

rabernat commented Jun 9, 2022

what's the tag for that image?

There hasn't been a release yet. They usually happen about once a week.

I just enabled an action that will auto-bump the pangeo images for the three pangeo-like hubs

It's great to see work happening on this important topic! 🚀 However, I have mixed feelings about the idea of automatically updating the image. The stack changes fast enough that this can break user code, leading to serious frustration. In my experience, users absolutely hate it when code that was working one day stops working the next, for reasons that are not the user's fault. This has definitely happened in the past when I manually updated the image.

To balance the desire to be able to use the latest image with the need to keep code reproducible, I think it is crucial that the Pangeo hubs have the ability to allow the users to select any of the past images from the spawner. This would mitigate the problem of breaking user code. Without such a feature, I would have to vote NO on automatically updating the images. Even better would be moving in the "binder for everything" direction, where we completely decouple the image from the profiles, and force the user to always explicitly specify an image.
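
To make the "select any past image from the spawner" idea concrete, here is a minimal sketch. It assumes the hub uses KubeSpawner and its `profile_list` option, where each entry has a `display_name` and a `kubespawner_override` dict. The helper `build_profile_list` is hypothetical, and the tags are placeholders rather than actual releases; the sketch also assumes date-style tags that sort correctly as strings.

```python
from typing import Dict, List


def build_profile_list(image_name: str, tags: List[str]) -> List[Dict]:
    """Build one spawner profile per image tag, newest tag first.

    Hypothetical helper: assumes date-style tags (e.g. "2022.06.01")
    so that plain string sorting orders them chronologically.
    """
    profiles = []
    for position, tag in enumerate(sorted(tags, reverse=True)):
        profiles.append(
            {
                "display_name": f"{image_name}:{tag}",
                # Only the newest image is pre-selected in the spawner form.
                "default": position == 0,
                "kubespawner_override": {"image": f"{image_name}:{tag}"},
            }
        )
    return profiles


# Placeholder tags for illustration, not real release tags.
profiles = build_profile_list(
    "pangeo/pangeo-notebook", ["2022.05.01", "2022.06.01"]
)
for profile in profiles:
    print(profile["display_name"], profile["default"])
```

In a hub configuration the resulting list could be assigned to `c.KubeSpawner.profile_list`. An auto-bump action would then add a new entry instead of replacing the only image, so older environments stay selectable.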

Is there an issue where we can discuss this specifically?

@sgibson91
Member

sgibson91 commented Jun 9, 2022

I think the work on the list is ongoing in 2i2c-org/infrastructure#1253, but this action workflow will be useful for things beyond Pangeo images. For example, it can also replace manual PRs like 2i2c-org/infrastructure#1403 (for minor releases that don't need as much babysitting). We will also use it to keep the version of repo2docker that a BinderHub uses up to date.

Status: Needs Shaping / Refinement
2 participants