Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the Julia interface to support both Int32 and Int64 #358

Merged
merged 13 commits into from
Jan 4, 2025

Conversation

amontoison
Copy link
Member

@nimgould @jfowkes
I’ve been working on the Julia interface GALAHAD.jl to make it more generic and capable of handling various integer types (Int32, Int64).
All the code has been updated to support double dispatch, depending on both the precision (Float32, Float64, Float128) and the integer type.

For example, this implementation can handle all six combinations and will automatically load the appropriate shared library (from the six available) for the user.
Nick, I may need your help to update the documentation, but I believe this will be the last major modification to GALAHAD.jl.
It should be stable after this update.

function test_ugo(::Type{T}, ::Type{INT}) where {T,INT}
  # Test problem objective
  function objf(x::T)
    a = 10.0
    res = x * x * cos(a * x)
    return res
  end

  # Test problem first derivative
  function gradf(x::T)
    a = 10.0
    res = -a * x * x * sin(a * x) + 2.0 * x * cos(a * x)
    return res
  end

  # Test problem second derivative
  function hessf(x::T)
    a = 10.0
    res = -a * a * x * x * cos(a * x) - 4.0 * a * x * sin(a * x) + 2.0 * cos(a * x)
    return res
  end

  # Derived types
  data = Ref{Ptr{Cvoid}}()
  control = Ref{ugo_control_type{T,INT}}()
  inform = Ref{ugo_inform_type{T,INT}}()

  # Initialize UGO
  status = Ref{INT}()
  eval_status = Ref{INT}()
  ugo_initialize(T, INT, data, control, status)

  # Set user-defined control options
  @reset control[].print_level = INT(1)

  # Test problem bounds
  x_l = Ref{T}(-1.0)
  x_u = Ref{T}(2.0)

  # Test problem objective, gradient, Hessian values
  x = Ref{T}(0.0)
  f = Ref{T}(objf(x[]))
  g = Ref{T}(gradf(x[]))
  h = Ref{T}(hessf(x[]))

  # import problem data
  ugo_import(T, INT, control, data, status, x_l, x_u)

  # Set for initial entry
  status[] = 1

  # Solve the problem: min f(x), x_l ≤ x ≤ x_u
  terminated = false
  while !terminated
    # Call UGO_solve
    ugo_solve_reverse(T, INT, data, status, eval_status, x, f, g, h)

    # Evaluate f(x) and its derivatives as required
    if (status[]  2)  # need objective
      f[] = objf(x[])
      if (status[]  3)  # need first derivative
        g[] = gradf(x[])
        if (status[]  4) # need second derivative
          h[] = hessf(x[])
        end
      end
    else  # the solution has been found (or an error has occured)
      terminated = true
    end
  end

  # Record solution information
  ugo_information(T, INT, data, inform, status)

  if inform[].status == 0
    @printf("%i evaluations. Optimal objective value = %5.2f status = %1i\n",
            inform[].f_eval, f[], inform[].status)
  else
    @printf("UGO_solve exit status = %1i\n", inform[].status)
  end

  # Delete internal workspace
  ugo_terminate(T, INT, data, control, inform)

  return 0
end

I still need to update CI such that we test the Julia interface with 64-bits integer.

@codecov-commenter
Copy link

codecov-commenter commented Jan 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.40%. Comparing base (d285318) to head (6fa6db9).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #358   +/-   ##
=======================================
  Coverage   43.40%   43.40%           
=======================================
  Files         309      309           
  Lines      161612   161612           
  Branches    55897    55897           
=======================================
+ Hits        70143    70148    +5     
  Misses      73976    73976           
+ Partials    17493    17488    -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nimgould
Copy link
Contributor

nimgould commented Jan 2, 2025

Very good, and this was what I suggested we do a few months ago :)

The docs should be simple to change via sed. Essentially (I think!), all mention of Int32 should become INT, arguments "( T," should become "( T, INT," and the section

parametric real type T

Below, the symbol T refers to a parametric real type that may be Float32
(single precision), Float64 (double precision) or, if supported, Float128
(quadruple precision).

needs to change to say

parametric real type T and integer type INT

Below, the symbol T refers to a parametric real type that may be Float32
(single precision), Float64 (double precision) or, if supported, Float128
(quadruple precision). The symbol INT refers to a parametric integer type that may be Int32 (32-bits) or Int64 (64-bits).

(make sure that the ---- underline is precisely as long as the string above it!).

Since I will inevitably screw all of this up with sed, you can either do it yourself, or wait for me (I'm doing something else for a while now). If you do it, I'll preprocess it when you have committed it. All the relevant files are in the directory $GALAHAD/doc/Julia/rst-dir. The Int32 parts will be everywhere, but the parametric real type T is only in the xxx.rst files where xxx is a package name (these should be done last as otherwise the newly added Int32 will become INT). As you noticed, there are also some Int64s that need to stay as Int64, and not all packages have integer arguments.

@amontoison
Copy link
Member Author

amontoison commented Jan 2, 2025

I can wait Nick. I will not do a new release of the Julia interface until a new release of GALAHAD (5.1.0) that also bring support for quadruple precision.

I also don't know how you extract the Julia structures for the documentation. Do you have a script to do that or is it done by hand?
Example: https://ralna.github.io/galahad_docs/html/Julia/tru.html#tru-control-type-structure

@nimgould Is it normal that we don't a suffix _64 for the routines of the C interface?
https://github.com/ralna/GALAHAD/blob/master/include/galahad_cfunctions.h

@nimgould
Copy link
Contributor

nimgould commented Jan 2, 2025

OK, I'll do it at some stage. The Julia and Python docs evolved by script from the C ones, but of course the Julia and C ones have drifted apart. Any future changes will be by hand, I am afraid.

I suppose we should have _64 for the C packages, but I have run out of energy and will leave this for one of you two. In addition, the C docs need updating as they don't mention the _s and _q name changes either. What happens with the Julia calls of the C routines, do they distinguish the appropriate C somehow for 32 and 64 bit? This is a large amount of work, I suspect :)

@amontoison
Copy link
Member Author

amontoison commented Jan 2, 2025

I have this segmentation fault in SBLS + Int64:

[28605] signal 11 (1): Segmentation fault
in expression starting at /home/runner/work/GALAHAD/GALAHAD/GALAHAD.jl/test/test_sbls.jl:189
sls_solve_ir at /home/runner/work/GALAHAD/GALAHAD/builddir/../src/sls/sls.F90:5672
sbls_solve_explicit at /home/runner/work/GALAHAD/GALAHAD/builddir/../src/sbls/sbls.F90:5397
sbls_solve at /home/runner/work/GALAHAD/GALAHAD/builddir/../src/sbls/sbls.F90:5143
sbls_solve_system at /home/runner/work/GALAHAD/GALAHAD/builddir/../src/sbls/sbls.F90:10267
sbls_solve_system_s at /home/runner/work/GALAHAD/GALAHAD/builddir/../src/sbls/C/sbls_ciface.F90:685
sbls_solve_system at /home/runner/work/GALAHAD/GALAHAD/GALAHAD.jl/src/wrappers/sbls.jl:363
test_sbls at /home/runner/work/GALAHAD/GALAHAD/GALAHAD.jl/test/test_sbls.jl:166

When I don't have a segmentation fault, the status is not 0:

C: SBLS_solve exit status = -11
R: SBLS_solve exit status = -11
D: SBLS_solve exit status = -11
L: SBLS_solve exit status = -11
S: SBLS_solve exit status = -11
I: SBLS_solve exit status = -11
Z: SBLS_solve exit status = -11

@amontoison amontoison force-pushed the julia_interface_upgrade branch from 3b94448 to 89b586e Compare January 2, 2025 22:01
@amontoison
Copy link
Member Author

amontoison commented Jan 2, 2025

OK, I'll do it at some point. The Julia and Python docs evolved from the C ones using a script, but of course, the Julia and C versions have diverged over time. Any future changes will have to be made manually, I'm afraid.

Let's hope this is the last major modification. 🤞
If we only update the documentation for new packages in the future, things should be fine.

I suppose we should have _64 for the C packages, but I’ve run out of energy and will leave this for one of you two. Additionally, the C docs need updating, as they don’t mention the _s and _q name changes either. What happens with the Julia calls to the C routines? Do they distinguish between the appropriate C libraries for 32-bit and 64-bit? This seems like a lot of work. :)

I added the _64 suffix to the C packages tonight (see PR #359).
If we update the C docs, we can explain all the variants at the same time (_s, _s_64, _64, _q, _q_64).

For the Julia calls, I can specify which symbol to call in a library, so it’s fine if some symbols overlap (as long as the shared libraries are not 'dlopened' globally, which Julia does).
Using Clang.jl, I can generate the Julia wrappers for double precision with Int32, then duplicate and adapt the wrappers for various combinations of floating-point and integer types.

Initially, I had the following:

function ugo_initialize(data, control, status)
  @ccall libgalahad_double.ugo_initialize(data::Ptr{Ptr{Cvoid}},
                                          control::Ptr{ugo_control_type},
                                          status::Ptr{Int32})::Cvoid
end

Next, I parameterized the structures and added two arguments to the function to support other variants:

function ugo_initialize(::Type{Float64}, ::Type{Int32}, data, control, status)
  @ccall libgalahad_double.ugo_initialize(data::Ptr{Ptr{Cvoid}},
                                          control::Ptr{ugo_control_type{Float64,Int32}},
                                          status::Ptr{Int32})::Cvoid
end

From this, I can generate as many variants as needed for different floating-point and integer types and rely on the multiple dispatch feature of Julia:

function ugo_initialize(::Type{Float32}, ::Type{Int32}, data, control, status)
  @ccall libgalahad_single.ugo_initialize_s(data::Ptr{Ptr{Cvoid}},
                                            control::Ptr{ugo_control_type{Float32,Int32}},
                                            status::Ptr{Int32})::Cvoid
end

function ugo_initialize(::Type{Float32}, ::Type{Int64}, data, control, status)
  @ccall libgalahad_single_64.ugo_initialize_s(data::Ptr{Ptr{Cvoid}},
                                               control::Ptr{ugo_control_type{Float32,Int64}},
                                               status::Ptr{Int64})::Cvoid
end

function ugo_initialize(::Type{Float64}, ::Type{Int64}, data, control, status)
  @ccall libgalahad_double_64.ugo_initialize(data::Ptr{Ptr{Cvoid}},
                                              control::Ptr{ugo_control_type{Float64,Int64}},
                                              status::Ptr{Int64})::Cvoid
end

function ugo_initialize(::Type{Float128}, ::Type{Int32}, data, control, status)
  @ccall libgalahad_quadruple.ugo_initialize_q(data::Ptr{Ptr{Cvoid}},
                                               control::Ptr{ugo_control_type{Float128,Int32}},
                                               status::Ptr{Int32})::Cvoid
end

function ugo_initialize(::Type{Float128}, ::Type{Int64}, data, control, status)
  @ccall libgalahad_quadruple_64.ugo_initialize_q(data::Ptr{Ptr{Cvoid}},
                                                  control::Ptr{ugo_control_type{Float128,Int64}},
                                                  status::Ptr{Int64})::Cvoid
end

I regenerated the wrappers tonight to incorporate the new symbols, but everything is now automated. For instance:

function ugo_initialize(::Type{Float32}, ::Type{Int64}, data, control, status)
  @ccall libgalahad_single_64.ugo_initialize_s_64(data::Ptr{Ptr{Cvoid}},
                                                  control::Ptr{ugo_control_type{Float32,Int64}},
                                                  status::Ptr{Int64})::Cvoid
end

function ugo_initialize(::Type{Float64}, ::Type{Int64}, data, control, status)
  @ccall libgalahad_double_64.ugo_initialize_64(data::Ptr{Ptr{Cvoid}},
                                                control::Ptr{ugo_control_type{Float64,Int64}},
                                                status::Ptr{Int64})::Cvoid
end

function ugo_initialize(::Type{Float128}, ::Type{Int64}, data, control, status)
  @ccall libgalahad_quadruple_64.ugo_initialize_q_64(data::Ptr{Ptr{Cvoid}},
                                                     control::Ptr{ugo_control_type{Float128,Int64}},
                                                     status::Ptr{Int64})::Cvoid
end

The most challenging part was correctly parameterizing the structures, as they can depend on {T, INT}, {T}, {INT}, or none of these.

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

The segfault seems to occur when julla is calling sbls_solve_system(T, data, status, n, m, sol) (at least here). The equivalent C tests set a non default solver
strcpy(control.symmetric_linear_solver, "sytr ") ;
strcpy(control.definite_linear_solver, "sytr ") ;
before calling sbls_import, and this test works This is maybe the issue; the default solver is ssids which is known to be buggy

The C test works fine with the two lines above commented out, so I can only suppose that the Julia interface has an issue?

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

I am now concerned that we are appending *(*) to all the C functions, but not doing anything for the C structures they rely on. If I want to call both the single and double C functions of (e.g.) cqp, currently any mention of cqp_inform (etc) is ambiguous. I believe that when we originally provided the C interfaces, we only intended that a single instance would be available at one time, and the docimentation reflects this. To change things now to support multiple instances is a massive breaking change. We could do this in the future, but I am not sure now is the time. I vote that we actually disable the galahad_c.h include file for the time being (except for the spral_c_dgemm parts), since @amontoison seems to say that Julia doesn't need separate names. Or did we do this for any other reason (Python doesn't need it as we only support Python's float and int)?

@amontoison
Copy link
Member Author

The segfault seems to occur when julla is calling sbls_solve_system(T, data, status, n, m, sol) (at least here). The equivalent C tests set a non default solver strcpy(control.symmetric_linear_solver, "sytr ") ; strcpy(control.definite_linear_solver, "sytr ") ; before calling sbls_import, and this test works This is maybe the issue; the default solver is ssids which is known to be buggy

The C test works fine with the two lines above commented out, so I can only suppose that the Julia interface has an issue?

I will test with the linear solver sytr.

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

OK. The segfault actually occurs in sls when scale factors are applied. These are calculated by HSL codes, so maybe the issue is there? As always, its hard for me to debug fortran issues from julia calls, I have to rely on the equivalent C tests which are supposed to have the same functionality.

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

On the C side, I get

Compiling sblsct [ OK ]

Exhaustive test of C function interface to sbls
C sparse matrix indexing

basic tests of storage formats

C: residual = 2.2e-16 status = 0
R: residual = 2.2e-16 status = 0
D: residual = 2.2e-16 status = 0
L: residual = 4.4e-16 status = 0
S: residual = 1.1e-16 status = 0
I: residual = 0.0e+00 status = 0
Z: residual = 1.1e-16 status = 0

@amontoison
Copy link
Member Author

amontoison commented Jan 3, 2025

I am now concerned that we are appending *(*) to all the C functions, but not doing anything for the C structures they rely on. If I want to call both the single and double C functions of (e.g.) cqp, currently any mention of cqp_inform (etc) is ambiguous. I believe that when we originally provided the C interfaces, we only intended that a single instance would be available at one time, and the docimentation reflects this. To change things now to support multiple instances is a massive breaking change. We could do this in the future, but I am not sure now is the time. I vote that we actually disable the galahad_c.h include file for the time being (except for the spral_c_dgemm parts), since @amontoison seems to say that Julia doesn't need separate names. Or did we do this for any other reason (Python doesn't need it as we only support Python's float and int)?

With a different suffix, all the libraries of GALAHAD can be linked to the same external library.
For example, a C++ code can have a template function (similar to what we have in Julia) and link to libgalahad_single.so, ... libgalahad_quadruple_64.so without issue.

But you're right that I don't know what will happen if we include only once the header because the structure has the same name for all variants.

What did you do in hsl_subset when you merged the headers? Do you preprocess the name of the structures to ensure that multiple versions of the structure can coexist?

It's fine for me on the Julia side if we decide to disable galahad_cfunctions.h.

@amontoison
Copy link
Member Author

amontoison commented Jan 3, 2025

OK. The segfault actually occurs in sls when scale factors are applied. These are calculated by HSL codes, so maybe the issue is there? As always, its hard for me to debug fortran issues from julia calls, I have to rely on the equivalent C tests which are supposed to have the same functionality.

We use the dummy version of HSL for the tests with CI.
You probably rely on the official version locally.

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

OK. The segfault actually occurs in sls when scale factors are applied. These are calculated by HSL codes, so maybe the issue is there? As always, its hard for me to debug fortran issues from julia calls, I have to rely on the equivalent C tests which are supposed to have the same functionality.

We use the dummy version if HSL for the tests with CI. You probably rely on the official version locally.

I'll try with the dummy HSL version to see what happens. Did it work for you with sytr?

@amontoison
Copy link
Member Author

I just tested with sytr and all tests passed:

julia> include("test/test_sbls.jl")
 Fortran sparse matrix indexing

 basic tests of storage formats

C: residual =   7.5e-08 status = 0
R: residual =   7.5e-08 status = 0
D: residual =   7.5e-08 status = 0
L: residual =   2.4e-07 status = 0
S: residual =   6.0e-08 status = 0
I: residual =   0.0e+00 status = 0
Z: residual =   1.2e-07 status = 0
Test Summary:            | Pass  Total  Time
SBLS -- Float32 -- Int32 |    1      1  1.4s
 Fortran sparse matrix indexing

 basic tests of storage formats

C: residual =   7.5e-08 status = 0
R: residual =   7.5e-08 status = 0
D: residual =   7.5e-08 status = 0
L: residual =   2.4e-07 status = 0
S: residual =   6.0e-08 status = 0
I: residual =   0.0e+00 status = 0
Z: residual =   1.2e-07 status = 0
Test Summary:            | Pass  Total  Time
SBLS -- Float32 -- Int64 |    1      1  0.8s
 Fortran sparse matrix indexing

 basic tests of storage formats

C: residual =   4.4e-16 status = 0
R: residual =   4.4e-16 status = 0
D: residual =   4.4e-16 status = 0
L: residual =   8.9e-16 status = 0
S: residual =   1.1e-16 status = 0
I: residual =   0.0e+00 status = 0
Z: residual =   1.1e-16 status = 0
Test Summary:            | Pass  Total  Time
SBLS -- Float64 -- Int32 |    1      1  1.0s
 Fortran sparse matrix indexing

 basic tests of storage formats

C: residual =   4.4e-16 status = 0
R: residual =   4.4e-16 status = 0
D: residual =   4.4e-16 status = 0
L: residual =   8.9e-16 status = 0
S: residual =   1.1e-16 status = 0
I: residual =   0.0e+00 status = 0
Z: residual =   1.1e-16 status = 0
Test Summary:            | Pass  Total  Time
SBLS -- Float64 -- Int64 |    1      1  0.7s
 Fortran sparse matrix indexing

 basic tests of storage formats

C: residual =   3.9e-34 status = 0
R: residual =   3.9e-34 status = 0
D: residual =   3.9e-34 status = 0
L: residual =   7.7e-34 status = 0
S: residual =   9.6e-35 status = 0
I: residual =   0.0e+00 status = 0
Z: residual =   9.6e-35 status = 0
Test Summary:             | Pass  Total  Time
SBLS -- Float128 -- Int32 |    1      1  1.1s
 Fortran sparse matrix indexing

 basic tests of storage formats

C: residual =   3.9e-34 status = 0
R: residual =   3.9e-34 status = 0
D: residual =   3.9e-34 status = 0
L: residual =   7.7e-34 status = 0
S: residual =   9.6e-35 status = 0
I: residual =   0.0e+00 status = 0
Z: residual =   9.6e-35 status = 0
Test Summary:             | Pass  Total  Time
SBLS -- Float128 -- Int64 |    1      1  0.7s

We probably have a bug in SSIDS with 64-bit integer.

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

Could you do the same with ssids, please. All is fine here with or without HSL

@amontoison amontoison force-pushed the julia_interface_upgrade branch from 89b586e to 68af6be Compare January 3, 2025 12:10
@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

Oh, and also turn on control.print_level = 1

@amontoison
Copy link
Member Author

amontoison commented Jan 3, 2025

Could you do the same with ssids, please. All is fine here with or without HSL

Did you not forgot to compile it Int64?
The output with SSIDS:

julia> include("test/test_sbls.jl")
 Fortran sparse matrix indexing

 basic tests of storage formats


 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = 0, ordering = 7
 K n = 5, nnz(prec,predicted factors) = 10, 15
 SLS: factorization complete: status = 0, pivoting = 1
 K nnz(prec,factors) = 10, 15
 time to form and factorize explicit preconditioner   0.01
C: residual =   2.4e-07 status = 0

 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = 0, ordering = 7
 K n = 5, nnz(prec,predicted factors) = 10, 15
 SLS: factorization complete: status = 0, pivoting = 1
 K nnz(prec,factors) = 10, 15
 time to form and factorize explicit preconditioner   0.01
R: residual =   2.4e-07 status = 0

 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = 0, ordering = 7
 K n = 5, nnz(prec,predicted factors) = 15, 15
 SLS: factorization complete: status = 0, pivoting = 1
 K nnz(prec,factors) = 15, 15
 time to form and factorize explicit preconditioner   0.01
D: residual =   7.5e-08 status = 0

 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = 0, ordering = 7
 K n = 5, nnz(prec,predicted factors) = 11, 15
 SLS: factorization complete: status = 0, pivoting = 1
 K nnz(prec,factors) = 11, 15
 time to form and factorize explicit preconditioner   0.01
L: residual =   2.4e-07 status = 0

 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = 0, ordering = 7
 K n = 5, nnz(prec,predicted factors) = 11, 15
 SLS: factorization complete: status = 0, pivoting = 1
 K nnz(prec,factors) = 11, 15
 time to form and factorize explicit preconditioner   0.01
S: residual =   6.0e-08 status = 0

 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = 0, ordering = 7
 K n = 5, nnz(prec,predicted factors) = 11, 15
 SLS: factorization complete: status = 0, pivoting = 1
 K nnz(prec,factors) = 11, 15
 time to form and factorize explicit preconditioner   0.01
I: residual =   0.0e+00 status = 0

 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = 0, ordering = 7
 K n = 5, nnz(prec,predicted factors) = 9, 15
 SLS: factorization complete: status = 0, pivoting = 1
 K nnz(prec,factors) = 9, 15
 time to form and factorize explicit preconditioner   0.01
Z: residual =   1.2e-07 status = 0
Test Summary:            | Pass  Total  Time
SBLS -- Float32 -- Int32 |    1      1  1.4s
 Fortran sparse matrix indexing

 basic tests of storage formats


 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = -26, ordering = 7
 time to form and factorize explicit preconditioner   0.00

[5351] signal 11 (1): Erreur de segmentation
in expression starting at /home/alexis/Bureau/git/GALAHAD/GALAHAD.jl/test/test_sbls.jl:194
sls_solve_ir at /home/alexis/Bureau/git/GALAHAD/builddir_int64/../src/sls/sls.F90:5672
sbls_solve_explicit at /home/alexis/Bureau/git/GALAHAD/builddir_int64/../src/sbls/sbls.F90:5397
sbls_solve at /home/alexis/Bureau/git/GALAHAD/builddir_int64/../src/sbls/sbls.F90:5143
sbls_solve_system at /home/alexis/Bureau/git/GALAHAD/builddir_int64/../src/sbls/sbls.F90:10267
sbls_solve_system_s_64 at /home/alexis/Bureau/git/GALAHAD/builddir_int64/../src/sbls/C/sbls_ciface.F90:685
sbls_solve_system at /home/alexis/Bureau/git/GALAHAD/GALAHAD.jl/src/wrappers/sbls.jl:370
test_sbls at /home/alexis/Bureau/git/GALAHAD/GALAHAD.jl/test/test_sbls.jl:171
unknown function (ip: 0x7bf580ffa56f)
macro expansion at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/Test/src/Test.jl:676 [inlined]
macro expansion at /home/alexis/Bureau/git/GALAHAD/GALAHAD.jl/test/test_sbls.jl:202 [inlined]
macro expansion at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/Test/src/Test.jl:1704 [inlined]
top-level scope at /home/alexis/Bureau/git/GALAHAD/GALAHAD.jl/test/test_sbls.jl:202
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
include_string at ./loading.jl:2734
_include at ./loading.jl:2794
include at ./sysimg.jl:38
unknown function (ip: 0x7bf580fdf372)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_call at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10104.1 at /home/alexis/julia/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14803.1 at /home/alexis/julia/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73406.1 at /home/alexis/julia/julia-1.11.2/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at julia (unknown line)
unknown function (ip: 0x7bf582429d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 5845798 (Pool: 5845558; Big: 240); GC: 7
Erreur de segmentation (core dumped)

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

No the tests I ran are are with int64

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

You get

SLS: analysis complete: status = -26, ordering = 7

while I have

SLS: analysis complete: status = 0, ordering = 7

This is why the factorization fails.

% galcode -26
GALAHAD status value -26 means:

Error return from GALAHAD -
the requested solver is not known

which means you don't have ssids compiled

@amontoison
Copy link
Member Author

I'm doing a new compilation in local but I checked the CI scripts and SSIDS is always compiled (Int32 or Int64).

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

(Of course this is an issue with the C (and Julia) tests, we should test the status value on return from the factorization is 0 before sol

@amontoison
Copy link
Member Author

We should not allow a linear solver that is not available :)

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

Can you put a check on status after you call sbls_factorize_matrix

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

We should not allow a linear solver that is not available :)

In principle, we don't. The package is supposed to check, and return with a status = -26 if the solver is absent. Is the true ssids compiled with int64 with your tests? If not, a dummy will have been substitued

@amontoison
Copy link
Member Author

We should not allow a linear solver that is not available :)

In principle, we don't. The package is supposed to check, and return with a status = -26 if the solver is absent. Is the true ssids compiled with int64 with your tests? If not, a dummy will have been substitued

I recompiled a local version and all tests passed with SSIDS.
I will try again with CI.

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

I added

    // check that the factorization succeeded
    if(status != 0){
        sbls_information( &data, &inform, &status );
        printf("%c: SBLS_solve factorization exit status = %1" i_ipc_ "\n", 
               st, inform.status);
        continue;
    } 

to the C test programs, you might want to do the equivalent for the Julia?

@amontoison
Copy link
Member Author

@nimgould

C: SBLS_solve factorization exit status = -9

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

GALAHAD status value -9 means:

Error return from GALAHAD -
the analysis phase failed

please set print_level = 1 so that we get a better idea of which solver is used, etc

@amontoison
Copy link
Member Author

I am away from the computer this afternoon.
I will test it tonight.

I suspect an issue with 1-based indexing + Int64 in SSIDS. It's why it's not triggered by the C test.

@nimgould
Copy link
Contributor

nimgould commented Jan 3, 2025

Possibly ... but the interface is in fortran not C, the 0<->1 translation happens there. The C tests provided examine both 0 and 1 based.

No problem to be away from a computer, it is only natural!

@amontoison
Copy link
Member Author

amontoison commented Jan 3, 2025

 n = 3, m = 2
 preconditioner = 2, factorization = 2, solver = ssids
 augmented matrix used 
 preconditioner: G = H 

 Using SLS(ssids) to factorize the augmented matrix
 SLS: analysis complete: status = -1, ordering = 7
 time to form and factorize explicit preconditioner   0.00
C: SBLS_solve factorization exit status = -9

@nimgould
Copy link
Contributor

nimgould commented Jan 4, 2025

OK, next could you set the Julia equivalent of

control.sls_control.print_level = 1;

sls_analyse is returning status = -1 which means an array allocation error somewhere

@nimgould
Copy link
Contributor

nimgould commented Jan 4, 2025

I have updated the Julia rst docs, but need a commit of your Julia examples before I can build and release the final html on the galahad socs site

@amontoison amontoison merged commit a517cb9 into master Jan 4, 2025
13 checks passed
@amontoison amontoison deleted the julia_interface_upgrade branch January 4, 2025 20:54
@amontoison
Copy link
Member Author

amontoison commented Jan 4, 2025

I merged the PR Nick.
If you update the documentation of the Julia interface, the future release will be the version 0.4.0 of GALAHAD.jl.

For the issue with SBLS, I replaced the linear solver by "sytr" such that all tests passed here.

I created another PR where I only compile SBLS with relevant print_level to more easily isolate and fix the problem.

@nimgould
Copy link
Contributor

nimgould commented Jan 5, 2025

Thanks Alexis. I'll update the docs, and push very shortly

@nimgould
Copy link
Contributor

nimgould commented Jan 5, 2025

Good idea to replace ssids by sytr as we are only interested in how sbls manages not sls (where the issue occurs).

@nimgould
Copy link
Contributor

nimgould commented Jan 5, 2025

I did a virgin install of galahad (no extras added), and the sbls C tests all work with gcc-10 to 14. So I have no idea what is causing the issues you are seeing. As I said before, we will not make progress unless you add the control.sls_control.print_level = 1 to the tests with ssids enabled. I'm sorry this is such a pain!

The Windows issue you report is on the line that defines a character string depneding on the length of a previous one. The strings are suppoosed to have been preserved from call to call (they are inout) but maybe there is a bug in the windows compiler that didn't do so?

@nimgould
Copy link
Contributor

nimgould commented Jan 5, 2025

I checked here, and the character string control%prefix is of length 2. Thus
CHARACTER ( LEN = LEN( TRIM( control%prefix ) ) - 2 ) :: prefix
should be a character string on length 0, which is allowed by fortran (and -ve values are considered to be 0). But maybe the LLVM behind the fortran doesn't allow it? This seems like a (fixed) bug to me, particularly if gfortran-14 doesn't fail.

@amontoison
Copy link
Member Author

amontoison commented Jan 5, 2025

I did a virgin install of galahad (no extras added), and the sbls C tests all work with gcc-10 to 14. So I have no idea what is causing the issues you are seeing. As I said before, we will not make progress unless you add the control.sls_control.print_level = 1 to the tests with ssids enabled. I'm sorry this is such a pain!

I activated it but I don't have any additional information with control.sls_control.print_level = 1.

The Windows issue you report is on the line that defines a character string depending on the length of a previous one. The strings are supposed to have been preserved from call to call (they are inout) but maybe there is a bug in the windows compiler that didn't do so?

I will update my CI script to ensure that the C and Fortran structures are properly aligned. If the structures are misaligned by a few bits, it could cause failures in unpredictable ways, which can be challenging to debug. I plan to address this next week.

@nimgould
Copy link
Contributor

nimgould commented Jan 5, 2025

OK, the next thing to check is the value of inform.sls_inform.bad_alloc after the call. This should (!) tell us which array allocation failed

@nimgould
Copy link
Contributor

nimgould commented Jan 5, 2025

No hurry, by the way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants