create page on basic code benchmarking, job optimization, and job scheduling (why job not run) #32

Open
craigsteffen opened this issue Mar 28, 2024 · 9 comments
Assignees
Labels
enhancement New documentation needed for an existing feature

Comments

@craigsteffen
Collaborator

Create a landing page to talk new users through the general ideas of:

  • how to scale their codes up from one core to many on a node
  • how to scale their single-node jobs up to multi-node
  • how to benchmark their jobs to find out the optimal runtime configuration
  • how to use the above information to create their job structure
    • how to optimize that configuration to maximize their efficiency
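
As a rough sketch of the benchmarking step in the list above: assuming a user has collected wall-clock timings of the same problem at several core counts, speedup and parallel efficiency relative to the smallest run could be computed as below. The function names, the 0.7 efficiency threshold, and the timing numbers are all hypothetical and made up for illustration; they are not measurements from any real machine.

```python
# Hypothetical strong-scaling timings: core count -> wall-clock seconds.
# These numbers are invented for illustration only.
example_timings = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 150.0, 16: 100.0, 32: 90.0}


def scaling_table(timings):
    """Return (cores, seconds, speedup, efficiency) rows sorted by core count.

    Speedup and efficiency are measured relative to the smallest core
    count present in `timings`.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        seconds = timings[cores]
        speedup = base_time / seconds
        # Efficiency is speedup divided by the factor of extra cores used.
        efficiency = speedup * base_cores / cores
        rows.append((cores, seconds, speedup, efficiency))
    return rows


def largest_efficient_config(timings, min_efficiency=0.7):
    """Return the largest core count whose efficiency meets the threshold.

    The baseline run always has efficiency 1.0, so a result always exists
    for thresholds up to 1.0.
    """
    efficient = [cores for cores, _, _, eff in scaling_table(timings)
                 if eff >= min_efficiency]
    return max(efficient)


if __name__ == "__main__":
    print(f"{'cores':>5} {'seconds':>8} {'speedup':>8} {'efficiency':>10}")
    for cores, seconds, speedup, eff in scaling_table(example_timings):
        print(f"{cores:>5} {seconds:>8.1f} {speedup:>8.2f} {eff:>10.2f}")
    print("largest efficient core count:",
          largest_efficient_config(example_timings))
```

With the invented timings above, efficiency stays above the threshold up to 8 cores and drops below it at 16, which is the kind of knee a user would look for when choosing a job configuration. The right threshold is a judgment call and depends on how the allocation is charged.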
@craigsteffen self-assigned this on Mar 28, 2024
@pmenstrom
Collaborator

This would be very helpful.

Definitely look at the documentation for other centers and see what we can glean from them. "Imitation is the sincerest form of flattery."

@craigsteffen
Collaborator Author

Ok, I just realized that we're probably going to want to build a fairly solid page on the documentation hub that's generically about running jobs on Slurm. I would guess a solid percentage of the content of that page will come from the Delta "running jobs" section, which is fairly extensive.

Basically we'll pull out all the Slurm commands and other material that isn't machine-specific, and leave behind just the sample job scripts, which are specific to the machine.

Since I have this task to start creating general job-related pages, I think it might make sense for the first step to be to create the overall running-jobs page on the hub, and then I'll populate the new pages, then we can start bringing the other pages into it as it's convenient.

I'll do it in a separate branch, because it's going to take time.

Any objections, @lhelms2 ?

@lhelms2
Collaborator

lhelms2 commented Mar 29, 2024

@craigsteffen I already have a Slurm page that is basically done; I was just waiting for other change reviews to be completed before sending it in for review/approval. I would propose that my Slurm page be implemented first, and then you can update it in a separate branch after that. If you'd like to discuss in a call, let me know.

@craigsteffen
Collaborator Author

craigsteffen commented Mar 29, 2024

Status note: We had a talk about this. @lhelms2 is going to implement the new directory structure for a broken-up Slurm/running-jobs page, but keep it hidden, and then merge that into main. That will take a week or so.

Then I can grab that, open a new branch in which I populate more pages, and then we'll work on getting that put into the public page, as we transfer stuff to it from the Delta page (and others).

@lhelms2 You can just ping me here when you're done with the invisible structure.

@craigsteffen
Collaborator Author

@lhelms2 Peter and I talked today about getting a page up about how fair-share works, so that we can point users at it.

From the discussion here, it looks like I was waiting for you to implement some changes, and then I was going to start composing the "why my job not run?" page. You may have already made those structural changes? On the other hand, I see that the proposed_changes branch has a bunch of commits yesterday and today.

So we should chat before I start to work on that page. I'm away at a hackathon but I'll be back on Thursday.

cc @pmenstrom

@lhelms2
Collaborator

lhelms2 commented May 14, 2024

@craigsteffen yes, those structural changes were made (I pinged you in Slack instead of here when I did that last month).

Yes, we should probably chat again before you start working on a new page, to make sure it doesn't conflict with what I've been working on.

(cross-ref issue #25 )

@lhelms2
Collaborator

lhelms2 commented May 14, 2024

@craigsteffen I sent you a calendar invite for Thursday. If that time doesn't work, just let me know and I'll move it.

@lhelms2
Collaborator

lhelms2 commented May 14, 2024

@craigsteffen I merged proposed_changes into main, so you can go ahead and pull a new branch (please don't start in proposed_changes) and make the changes to common/slurm/... that you want. I'll stay out of the Slurm pages until they're ready for review to avoid any merge conflicts. (We can still meet on Thursday if you want, or just decline the invite.)

@craigsteffen
Collaborator Author

@lhelms2 Right. For a big set of changes, I wouldn't start dropping them directly into proposed_changes (unless they were to concealed files, like ones renamed to .txt or whatever). Yeah, I'll do my work in another branch and then we can review and merge later.

I accepted the Thursday meeting. Let's keep it, even if it's two minutes of "yep, that's what I was thinking too, ok, cool". That's valuable.

@lhelms2 added the improvement and enhancement labels and removed the improvement label on Sep 12, 2024

3 participants