[EVENT] Openscapes workshop February 2023 #2092

Closed
9 tasks
jmunroe opened this issue Jan 25, 2023 · 8 comments

@jmunroe
Contributor

jmunroe commented Jan 25, 2023

Summary

FreshDesk ticket: https://2i2c.freshdesk.com/a/tickets/382

We use the Openscapes 2i2c instance and expect to host 150-300 people during a workshop this February. We wanted to reach out in advance to check whether there is anything specific we need to do to accommodate that many people on large Python instances (~17 GB) for two hours on both February 10 and February 17, from 1 to 3 pm Eastern. Specifically, we are wondering whether there will be enough quota for everyone.

Event Info

  • Community Representative: Brianna Lind
  • Event begin: February 10 and February 17
    • In your timezone: 1 pm ET
  • Event end: February 10 and February 17
    • In your timezone: 3 pm ET
  • Active times:
    • In your timezone:
  • Number of attendees: 150-300 people
  • Hub Events Calendar

Hub info

Task List

Before the event

  • Dates confirmed with the community representative and added to Hub Events Calendar.
  • Quotas from the cloud provider are high-enough to handle expected usage.
  • One week before event Hub is running.
  • Confirm with Community Representative that their workflows function as expected.
    • 👉Template message to send to community representative
      Hey {{ COMMUNITY REPRESENTATIVE }}, the date of your event is getting close!
      
      Could you please confirm that your hub environment is ready-to-go, and matches your hub's infrastructure setup, by ensuring the following things:
      - [ ] Confirm that the "Event Info" above is correct
      - [ ] On your hub: log-in and authentication works as-expected
      - [ ] `nbgitpuller` links you intend to use resolve properly
      - [ ] Your notebooks and content run as-expected
      
  • 1 day before event, either a separate nodegroup is provisioned for the event or the cluster is scaled up.

During and after event

  • Confirm event is finished.
  • Nodegroup created for the hub is decommissioned / cluster is scaled down.
  • Hub decommissioned (if needed).
  • Debrief with community representative.
    • 👉Template debrief to send to community representative
      Hey {{ COMMUNITY REPRESENTATIVE }}, your event appears to be over 🎉
      
      We hope that your hub worked out well for you! We are trying to understand where we can improve our hub infrastructure and setup around events, and would love any feedback that you're willing to give. Would you mind answering the following questions? If not, just let us know and that is no problem!
      
      - Did the infrastructure behave as expected?
      - Anything that was confusing or could be improved?
      - Any extra functionality you wish you would have had?
      - Could you share a story about how you used the hub?
      
      - Any other feedback that you'd like to share?
      
      
@jmunroe
Contributor Author

jmunroe commented Jan 25, 2023

Would someone from @2i2c-org/engineering confirm that sufficient quota will be available for this event?

@consideRatio
Member

consideRatio commented Feb 1, 2023

The quota for openscapes is 640 CPUs currently. I suggest the following, in line with discussions in #2121.

  • We provide a new machine type, the r5.16xlarge, with 64 CPU cores and 512 GB of memory.
  • We configure a new choice among the server options, make it the default, and list it at the top during this event.
  • For each of these machines, we can fit:
    • 30 users guaranteed 17 GB of memory
    • 40 users guaranteed 12.8 GB of memory
    • 50 users guaranteed 10.2 GB of memory
  • Given current quotas, I think we can get 9 of these machines started.
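
The packing figures above follow from dividing node memory evenly among users. A minimal sketch of that arithmetic, assuming the full 512 GB is available to user pods (in practice some memory is reserved for the operating system and system pods) and using hypothetical helper names:

```python
# Figures taken from the comment above; helper names are illustrative only.
NODE_MEMORY_GB = 512   # r5.16xlarge memory
NODE_CPUS = 64         # r5.16xlarge CPU cores
CPU_QUOTA = 640        # current CPU quota for the openscapes account


def memory_per_user_gb(users_per_node: int) -> float:
    """Memory each user can be guaranteed if the node is split evenly."""
    return NODE_MEMORY_GB / users_per_node


def max_nodes_by_quota() -> int:
    """Upper bound on node count if the whole CPU quota went to these nodes."""
    return CPU_QUOTA // NODE_CPUS


for users in (30, 40, 50):
    print(f"{users} users per node: {memory_per_user_gb(users):.1f} GB each")
print(f"Nodes allowed by CPU quota alone: {max_nodes_by_quota()}")
```

The quota alone would allow 10 such nodes; the estimate of 9 in the comment presumably leaves CPU headroom for other nodes in the cluster, though the thread does not say so explicitly.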

I made a practical suggestion in https://2i2c.freshdesk.com/a/tickets/382: 32 users per node on 64-core / 512 GB r5.16xlarge machines, giving 2 CPUs and 16 GB of memory per user.
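
As a rough check of this suggestion against the expected attendance of 150-300 people, the sketch below computes the per-user share and the number of nodes needed. It assumes users are packed purely by count at 32 per node; the function names are hypothetical.

```python
import math

NODE_CPUS = 64
NODE_MEMORY_GB = 512
USERS_PER_NODE = 32  # the packing suggested in the comment above


def per_user_share() -> tuple[float, float]:
    """Return (CPUs, GB of memory) available to each user on a full node."""
    return NODE_CPUS / USERS_PER_NODE, NODE_MEMORY_GB / USERS_PER_NODE


def nodes_for(attendees: int) -> int:
    """Number of nodes needed to host the given number of attendees."""
    return math.ceil(attendees / USERS_PER_NODE)


cpus, mem_gb = per_user_share()
print(f"Per user: {cpus:g} CPUs, {mem_gb:g} GB")
for attendees in (150, 300):
    print(f"{attendees} attendees: {nodes_for(attendees)} nodes")
```

Under these assumptions, 9 nodes would hold 288 users, so the upper end of the attendance estimate (300) would need a tenth node, which would consume the entire 640-CPU quota.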

@BriannaLind
Contributor

So as things are now, we know that the script/data gets up to 17 GB, so 16 GB does seem to be cutting it a bit close.

@damianavila
Contributor

Hey @BriannaLind, your event appears to be over 🎉

We hope that your hub worked out well for you! We are trying to understand where we can improve our hub infrastructure and setup around events, and would love any feedback that you're willing to give. Would you mind answering the following questions? If not, just let us know and that is no problem!

  • Did the infrastructure behave as expected?
  • Anything that was confusing or could be improved?
  • Any extra functionality you wish you would have had?
  • Could you share a story about how you used the hub?
  • Any other feedback that you'd like to share?

Thanks!

@colliand
Contributor

I write to follow up on the request from @damianavila above. Hi @BriannaLind! Can you please share feedback on how the hub worked for the Openscapes event in February? Thanks in advance.

@BriannaLind
Contributor

Hiya @colliand - sorry for the delay. Yes! We were glad to have started coordinating early with you guys to plan our multiday workshop in advance. That really helped make sure the resources we needed were available.

Did the infrastructure behave as expected?

  • Day 1: Yes, we were thrilled! We were able to add roughly 100 or more participants in ~10 minutes!
  • Day 2: No. We had substantial troubleshooting challenges to overcome with the instance.

Anything that was confusing or could be improved?

  • It was fairly challenging to monitor resources as they were being consumed.
  • While 2i2c provided active support on Day 1, when we ran into trouble on Day 2 we had to rely on professional connections to resolve the problem as best we could.
  • Although all of the specs worked perfectly on Day 1, something had been altered by the time of the Day 2 workshop (I don't know how, when, or by whom), and instances were spinning up with 1% of the requested GB, which resulted in nearly immediate failure of scripts for most people in the workshop.

Any extra functionality you wish you would have had?

  • It would be nice to see exactly how much processing power is being used as a script is being run. Even better if this information was displayed as a portion of total available.
  • It would be nice to have more 'direct' elastic capacity. I thought one of the major benefits of working in a cloud environment was automatic resource scaling, but that doesn't seem to be as easy as I expected. If I have the capacity (that is, the funds available), I would like my environment to scale instantly to the resource capacity I need to execute a script.
  • I would like an automated way to understand how many resources a single script will require and the number of nodes needed to support the use of a script by X number of people simultaneously.
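
One way the last wish could be approximated is to run the script once, record its peak memory, and derive a node count from that. The sketch below is only an illustration under stated assumptions: it uses Python's standard `resource` module, assumes Linux (where `ru_maxrss` is reported in KiB), considers memory only, and uses hypothetical function names. It is not something the hub provides.

```python
import math
import resource
import subprocess
import sys


def peak_memory_gb(cmd: list[str]) -> float:
    """Run cmd to completion and return peak resident memory in GB.

    RUSAGE_CHILDREN reports the largest peak among all child processes
    waited for so far, so measure one script per Python session.
    Assumes Linux, where ru_maxrss is expressed in KiB.
    """
    subprocess.run(cmd, check=True)
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return peak_kib * 1024 / 1e9


def nodes_needed(peak_gb: float, users: int, node_memory_gb: float = 512) -> int:
    """Nodes required if every user runs the script at the same time."""
    users_per_node = math.floor(node_memory_gb / peak_gb)
    if users_per_node < 1:
        raise ValueError("script does not fit on a single node")
    return math.ceil(users / users_per_node)


if __name__ == "__main__":
    # Stand-in workload; replace with the real workshop script.
    peak = peak_memory_gb([sys.executable, "-c", "x = bytearray(50_000_000)"])
    print(f"Peak memory: {peak:.3f} GB")
    print(f"Nodes for 300 users at 17 GB each: {nodes_needed(17, 300)}")
```

A measurement like this gives a single data point for one input; peak usage can vary with the data being processed, so some margin above the measured figure is advisable.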

Could you share a story about how you used the hub?

  • Absolutely! Despite the challenges we had, there is incredible value in avoiding environment setup hassles and diving right into scientific content together.

Any other feedback that you'd like to share?

  • Thank you for your and your colleagues' efforts to make our workshop series a success!

@BriannaLind
Contributor

@amfriesz anything else to add?

@damianavila
Contributor

Closing this one, thanks for the feedback!!
