Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI with Julia v1.11 #3836

Open
wants to merge 36 commits into
base: main
Choose a base branch
from
Open

CI with Julia v1.11 #3836

wants to merge 36 commits into from

Conversation

navidcy
Copy link
Collaborator

@navidcy navidcy commented Oct 8, 2024

This PR switches the CI to use Julia v1.11.
It also adds a Manifest with v1.11 ending so that there is still compatibility with previous versions.

Note the the distributed CI still does not have Julia v1.11 (right @Sbozzolo?) so there Julia v1.10 is used. This is possible because there are two Manifests.

@navidcy navidcy added testing 🧪 Tests get priority in case of emergency evacuation package 📦 Quite meta labels Oct 8, 2024
@Sbozzolo
Copy link
Member

Sbozzolo commented Oct 8, 2024

Note the the distributed CI still does not have Julia v1.11 (right @Sbozzolo?) so there Julia v1.10 is used. This is possible because there are two Manifests.

I installed julia 1.11 on the Caltech cluster, but we haven't made a module yet (but it's coming today or so)

@glwagner
Copy link
Member

glwagner commented Oct 9, 2024

This is possible because there are two Manifests.

There's only one manifest?

@ali-ramadhan
Copy link
Member

This is possible because there are two Manifests.

There's only one manifest?

I assume he's referring to this new feature of Julia 1.11: https://julialang.org/blog/2024/10/julia-1.11-highlights/#manifest_versioning

@navidcy
Copy link
Collaborator Author

navidcy commented Oct 9, 2024

There's only one manifest?

The PR adds another Manifest specifically for v1.11 and keeps the older Manifest that works for v1.10. Or we can just have one Manifest (the one for v1.11) and drop the one for v1.10

@glwagner
Copy link
Member

glwagner commented Oct 9, 2024

keeps the older Manifest that works for v1.10

The Manifest was deleted in #3783

@navidcy
Copy link
Collaborator Author

navidcy commented Oct 9, 2024

oh great! I missed that!
I wanted to delete it long now but I got resistance doing that before... Great! So I'll delete the new Manifest then as well!

@glwagner
Copy link
Member

glwagner commented Oct 9, 2024

oh great! I missed that! I wanted to delete it long now but I got resistance doing that before... Great! So I'll delete the new Manifest then as well!

Deleting it seemed to help increase the likelihood that CI passed. Although, it did not fully solve the problem (and note a few other changes were also made on #3783).

@navidcy
Copy link
Collaborator Author

navidcy commented Oct 11, 2024

Noting that internal_tide.jl gives NaN with Julia v1.11 while all is OK with Julia v1.10; something with immersed boundaries....? I'm looking into it.

@simone-silvestri
Copy link
Collaborator

I think it's a plotting issue. We are filling up the immersed boundaries with NaN and, apparently, we cannot plot NaNs anymore? The error says:

ERROR: LoadError: On worker 2:
  | Looking up a non-finite or NaN value in a colormap is undefined.

@navidcy
Copy link
Collaborator Author

navidcy commented Oct 11, 2024

I ran the script and the actual simulation NaN-ed.

@glwagner
Copy link
Member

I ran the script and the actual simulation NaN-ed.

That means Oceanangians isn't compatible with julia 1.11.

Do any other tests catch the issue? We can use this opportunity to add more tests.

@navidcy
Copy link
Collaborator Author

navidcy commented Oct 11, 2024

I’m trying to make an mwe

@navidcy
Copy link
Collaborator Author

navidcy commented Jan 9, 2025

The random number generator doesn’t guarantee the same results even with a seed across versions

Good point! 👍🏼 Good to keep that in mind!

But the differences we see in the internal_tide.jl example shouldn't be due to random number generator.

@glwagner
Copy link
Member

glwagner commented Jan 9, 2025

One possibility is that syntax changed for something in a subtle way, so the code still runs but some function is being called incorrectly. Not sure what that could be though

@glwagner
Copy link
Member

glwagner commented Jan 9, 2025

Here a list of changes:

https://docs.julialang.org/en/v1/NEWS/#:~:text=New%20language%20features,-New%20Memory%20type&text=Most%20of%20the%20Array%20type,public%20are%20considered%20public%20API.

the change to zero stands out the most to me since that is called extensively; then again I don't think the described feature applies to us.

We should check if there are changes only on CPU, or on both CPU and GPU. @ali-ramadhan I'm assuming your test was on CPU.

@navidcy
Copy link
Collaborator Author

navidcy commented Jan 9, 2025

Good find. Is there a way to redefine/import zero(::AbstractArray) to be how it was in Julia v1.10 to check if problem goes away? That would give us a very good hint!

@glwagner
Copy link
Member

glwagner commented Jan 9, 2025

Good find. Is there a way to redefine/import zero(::AbstractArray) to be how it was in Julia v1.10 to check if problem goes away? That would give us a very good hint!

I don't think we use zero(::AbstractArray). I was more wondering if changes related to that update might have affected us (which are not written in the notes).

@glwagner
Copy link
Member

I can confirm that Julia 1.11 is changing how the immersed boundary (and free surface?) works. Here's the difference (including halos) in the internal tide u field after 1 time step (difference is Julia 1.10 - Julia 1.11 with + = red). Color range is ±0.003 m/s so it's a pretty significant for just 1 time step.

Probably time to pull out Debugger.jl and step through a time step to find what causes the difference!

display

I can't reproduce this

@ali-ramadhan
Copy link
Member

ali-ramadhan commented Jan 15, 2025

@glwagner Does the internal_tide.jl example still blow up for you with Julia 1.11? And does the Julia 1.10 - Julia 1.11 difference show a different pattern or is there no difference?

I can try to reproduce on a different machine to confirm.

Debugger.jl probably won't get fixed soon so if I find some time this week I can step through with Julia 1.10 and just do some good old print debugging between the two versions.

@glwagner
Copy link
Member

In the test I did, the difference was identically 0. However, I just noticed that I was not using the latest main. I was on 0.95.5.

On 0.95.5, it looks like internal_tide.jl runs fine. This is what I get

internal_tide.mp4

I have an Mac M1 Max and using 1.11.2:

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_PRECOMPILE_TASKS = 6
  LD_LIBRARY_PATH = /Users/gregorywagner/Software/hdf5-1.14.2/lib:
  JULIA_EDITOR = vim

I'll redo these tests on the current main and also try to reproduce on tartarus...

@giordano
Copy link
Contributor

This does not seem to be a syntax issue but rather a numerical calculation issue.

Is there a small-ish reproducer to look at?

@simone-silvestri
Copy link
Collaborator

Also on my side, on my mac laptop, the internal_tide.jl example runs on julia 1.11.2 without crashing. The results are actually identical on julia 1.10.7 and 1.11.2 (on the current main branch, aka version 0.95.7)

internal_tide.mp4

@navidcy
Copy link
Collaborator Author

navidcy commented Jan 19, 2025

That sounds good! Let's see if the docs built now!

@glwagner
Copy link
Member

This does not seem to be a syntax issue but rather a numerical calculation issue.

Is there a small-ish reproducer to look at?

We need to make one and something relatively small should be possible. It looks like its not an issue on Mac so we have to try to reproduce the docs failure on the machine we use for CI. I can try in a few days.

@glwagner
Copy link
Member

@giordano
Copy link
Contributor

That's fixed by #4093.

@glwagner
Copy link
Member

glwagner commented Feb 16, 2025

Ok, looks like the error has progressed. Here's the CPU enzyme error now:

oceananigans_build_20621_cpu-enzyme-extension-tests.log

the top is

�_bk;t=1739585525736�Enzyme unit tests: �[91m�[1mError During Test�[22m�[39m at �[39m�[1m/var/lib/buildkite-agent/builds/tartarus-2/clima/oceananigans/test/test_enzyme.jl:71�[22m

�_bk;t=1739585525927�  Got exception outside of a @test

�_bk;t=1739585525936�  Enzyme compilation failed due to an internal error.

�_bk;t=1739585525936�   Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl

�_bk;t=1739585525936�   To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)

�_bk;t=1739585525936�  Current scope: 

�_bk;t=1739585525936�  ; Function Attrs: mustprogress willreturn

�_bk;t=1739585525936�  define internal fastcc noalias nonnull "enzyme_type"="{[-1]:Pointer}" { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* } @fakeaugmented_julia_Field_49272([3 x {} addrspace(10)*] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,16]:Pointer}" "enzymejl_parmtype"="140375159748096" "enzymejl_parmtype_ref"="1" %0, { i64, i64, i64, i64, i64, i64, double, double, double, double, double, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, double, double, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, double, double } } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(456) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer, [-1,40]:Integer, [-1,41]:Integer, [-1,42]:Integer, [-1,43]:Integer, [-1,44]:Integer, [-1,45]:Integer, [-1,46]:Integer, [-1,47]:Integer, [-1,48]:Float@double, [-1,56]:Float@double, [-1,64]:Float@double, [-1,72]:Float@double, [-1,80]:Float@double, [-1,88]:Float@double, [-1,96]:Float@double, [-1,104]:Float@double, [-1,112]:Float@double, [-1,120]:Integer, [-1,121]:Integer, [-1,122]:Integer, [-1,123]:Integer, [-1,124]:Integer, [-1,125]:Integer, [-1,126]:Integer, [-1,127]:Integer, [-1,128]:Integer, [-1,129]:Integer, [-1,130]:Integer, [-1,131]:Integer, [-1,132]:Integer, [-1,133]:Integer, [-1,134]:Integer, [-1,135]:Integer, [-1,136]:Integer, [-1,137]:Integer, [-1,138]:Integer, [-1,139]:Integer, [-1,140]:Integer, [-1,141]:Integer, [-1,142]:Integer, [-1,143]:Integer, [-1,144]:Float@double, [-1,152]:Float@double, [-1,160]:Float@double, [-1,168]:Float@double, [-1,176]:Integer, [-1,177]:Integer, [-1,178]:Integer, [-1,179]:Integer, [-1,180]:Integer, [-1,181]:Integer, [-1,182]:Integer, [-1,183]:Integer, [-1,184]:Integer, [-1,185]:Integer, [-1,186]:Integer, [-1,187]:Integer, [-1,188]:Integer, [-1,189]:Integer, [-1,190]:Integer, [-1,191]:Integer, [-1,192]:Integer, [-1,193]:Integer, [-1,194]:Integer, [-1,195]:Integer, [-1,196]:Integer, [-1,197]:Integer, [-1,198]:Integer, [-1,199]:Integer, [-1,200]:Float@double, [-1,208]:Float@double, [-1,216]:Float@double, [-1,224]:Float@double, [-1,232]:Float@double, [-1,240]:Float@double, [-1,248]:Integer, [-1,249]:Integer, [-1,250]:Integer, [-1,251]:Integer, [-1,252]:Integer, [-1,253]:Integer, [-1,254]:Integer, [-1,255]:Integer, [-1,256]:Integer, [-1,257]:Integer, [-1,258]:Integer, [-1,259]:Integer, [-1,260]:Integer, [-1,261]:Integer, [-1,262]:Integer, [-1,263]:Integer, [-1,264]:Integer, [-1,265]:Integer, [-1,266]:Integer, [-1,267]:Integer, [-1,268]:Integer, [-1,269]:Integer, [-1,270]:Integer, [-1,271]:Integer, [-1,272]:Float@double, [-1,280]:Float@double, [-1,288]:Float@double, [-1,296]:Float@double, [-1,304]:Integer, [-1,305]:Integer, [-1,306]:Integer, [-1,307]:Integer, [-1,308]:Integer, [-1,309]:Integer, [-1,310]:Integer, [-1,311]:Integer, [-1,312]:Integer, [-1,313]:Integer, [-1,314]:Integer, [-1,315]:Integer, [-1,316]:Integer, [-1,317]:Integer, [-1,318]:Integer, [-1,319]:Integer, [-1,320]:Integer, [-1,321]:Integer, [-1,322]:Integer, [-1,323]:Integer, [-1,324]:Integer, [-1,325]:Integer, [-1,326]:Integer, [-1,327]:Integer, [-1,328]:Float@double, [-1,336]:Float@double, [-1,344]:Float@double, [-1,352]:Float@double, [-1,360]:Integer, [-1,361]:Integer, [-1,362]:Integer, [-1,363]:Integer, [-1,364]:Integer, [-1,365]:Integer, [-1,366]:Integer, [-1,367]:Integer, [-1,368]:Integer, [-1,369]:Integer, [-1,370]:Integer, [-1,371]:Integer, [-1,372]:Integer, [-1,373]:Integer, [-1,374]:Integer, [-1,375]:Integer, [-1,376]:Integer, [-1,377]:Integer, [-1,378]:Integer, [-1,379]:Integer, [-1,380]:Integer, [-1,381]:Integer, [-1,382]:Integer, [-1,383]:Integer, [-1,384]:Float@double, [-1,392]:Float@double, [-1,400]:Float@double, [-1,408]:Float@double, [-1,416]:Integer, [-1,417]:Integer, [-1,418]:Integer, [-1,419]:Integer, [-1,420]:Integer, [-1,421]:Integer, [-1,422]:Integer, [-1,423]:Integer, [-1,424]:Integer, [-1,425]:Integer, [-1,426]:Integer, [-1,427]:Integer, [-1,428]:Integer, [-1,429]:Integer, [-1,430]:Integer, [-1,431]:Integer, [-1,432]:Integer, [-1,433]:Integer, [-1,434]:Integer, [-1,435]:Integer, [-1,436]:Integer, [-1,437]:Integer, [-1,438]:Integer, [-1,439]:Integer, [-1,440]:Float@double, [-1,448]:Float@double}" "enzymejl_parmtype"="140367625533648" "enzymejl_parmtype_ref"="1" %1, { i64, i64, i64, i64, i64, i64, double, double, double, double, double, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, double, double, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, { { [2 x double], [2 x double], i64, i64 }, [1 x i64] }, double, double } } addrspace(11)* nocapture nofree align 8 "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer, [-1,40]:Integer, [-1,41]:Integer, [-1,42]:Integer, [-1,43]:Integer, [-1,44]:Integer, [-1,45]:Integer, [-1,46]:Integer, [-1,47]:Integer, [-1,48]:Float@double, [-1,56]:Float@double, [-1,64]:Float@double, [-1,72]:Float@double, [-1,80]:Float@double, [-1,88]:Float@double, [-1,96]:Float@double, [-1,104]:Float@double, [-1,112]:Float@double, [-1,120]:Integer, [-1,121]:Integer, [-1,122]:Integer, [-1,123]:Integer, [-1,124]:Integer, [-1,125]:Integer, 

@navidcy
Copy link
Collaborator Author

navidcy commented Feb 18, 2025

I suggest we put a compat entry for Julia 1.10. Otherwise people use Julia v1.11 and run into problems!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package 📦 Quite meta testing 🧪 Tests get priority in case of emergency evacuation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants