Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate manifests #3547

Merged
merged 3 commits into from
Jan 29, 2025
Merged

Consolidate manifests #3547

merged 3 commits into from
Jan 29, 2025

Conversation

Sbozzolo
Copy link
Member

@Sbozzolo Sbozzolo commented Jan 27, 2025

Manifests.toml are a burden to maintain and add a lot of git diff noise. The one benefit of Manifests.toml is that they help with tracking version dependencies.

However:

  • For docs, the precise version of dependencies is not important, there is pretty much no version-dependent code being executed there
  • perf and examples can be consolidated into a single .buildkite environment. This environment only has to have one manifest checked, the one for which buildkite is run (and not one for each version of Julia).
  • The perf environment can be removed all together. It is a more niche environment: if reproducing a specific result is required, developers can use the .buildkite Manifest. If not, developers can also consider keeping the relevant tools (mainly JET) in the base environment.

This is the same pattern we have in other repos such as ClimaCore.

(Builds off #3546)

Closes #3548

@Sbozzolo Sbozzolo changed the title Gb/manifests Consolidate manifests Jan 27, 2025
@Sbozzolo Sbozzolo force-pushed the gb/manifests branch 2 times, most recently from 636928b to 206c33e Compare January 27, 2025 20:06
@Sbozzolo Sbozzolo marked this pull request as ready for review January 27, 2025 23:24
Copy link
Member

@charleskawczynski charleskawczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry about the merge conflict, I thought I had merged this before #3551, but I was mistaken.

This looks good to me, feel free to merge after rebasing, and thanks for updating this!

Manifests.toml are a burden to maintain and add a lot of git diff noise.
The one benefit of Manifests.toml is that they help with tracking
version dependencies.

However:
- For `docs`, the precise version of dependencies is not important,
there is pretty much no version-dependent code being executed there
- `perf` and `examples` can be consolidated into a single `.buildkite`
environment. This environment only has to have one manifest checked, the
one for which buildkite is run (and not one for each version of Julia).
- The `perf` environment can be removed all together. It is a more niche
environment: if reproducing a specific result is required, developers
can use the `.buildkite` Manifest. If not, developers can also consider
keeping the relevant tools (mainly JET) in the base environment.

This is the same pattern we have in other repos such as ClimaCore.
@charleskawczynski
Copy link
Member

I went ahead and rebased!

@Sbozzolo
Copy link
Member Author

Includes #3548

@Sbozzolo Sbozzolo force-pushed the gb/manifests branch 3 times, most recently from 3529619 to 3932f19 Compare January 28, 2025 18:25
@Sbozzolo Sbozzolo force-pushed the gb/manifests branch 3 times, most recently from 5bb811c to e97cee6 Compare January 28, 2025 22:29
@Sbozzolo Sbozzolo mentioned this pull request Jan 28, 2025
1 task
@Sbozzolo Sbozzolo force-pushed the gb/manifests branch 2 times, most recently from d2646d2 to 499f419 Compare January 29, 2025 00:40
@@ -1,16 +1,17 @@
moist: "equil"
apply_limiter: true
z_elem: 25
z_elem: 11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because this job failed: https://buildkite.com/clima/climaatmos-ci/builds/22496#0194af81-d116-4343-92f6-9b753e5763aa, which is likely related to the climatimestepper update.

You changed the wrong config file though I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I think it might be better to just increase the mem request

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using low resolution is not really meaningful for performance monitoring jobs, so I think it'd be better to remove it instead of lowering the resolution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine with me. I prefer to make it soft fail though so we will remember to come back to fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I think it might be better to just increase the mem request

The memory is GPU memory, so we would need to request a different type of GPU, which would increase the queue time significantly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to a soft fail

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I think there is something wrong, though. The fieldvector is being allocated here, and it's definitely not that large, so I don't know why we're OOMing. Maybe the GPU is not freeing memory as it should be.

@charleskawczynski charleskawczynski added this pull request to the merge queue Jan 29, 2025
@szy21
Copy link
Member

szy21 commented Jan 29, 2025

It passed everything in the merge queue, but somehow the format check didn't start.

@Sbozzolo
Copy link
Member Author

It passed everything in the merge queue, but somehow the format check didn't start.

I have an idea of what it might be. Let me start by merging this manually

@Sbozzolo Sbozzolo removed this pull request from the merge queue due to a manual request Jan 29, 2025
@Sbozzolo Sbozzolo merged commit 6cdeb57 into main Jan 29, 2025
18 of 20 checks passed
@Sbozzolo Sbozzolo deleted the gb/manifests branch January 29, 2025 18:55
@juliasloan25
Copy link
Member

@Sbozzolo why is there a Manifest in the .buildkite/ directory but no Project?

@charleskawczynski
Copy link
Member

@Sbozzolo why is there a Manifest in the .buildkite/ directory but no Project?

It was probably deleted by accident. We do need one becuase:

  • We'll need compats
  • users running instantiates will add .buildkite/Project.toml

I'll open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants