[question] apparent rarely occurring infinite loop at high energies for certain CPR rootfinder parameters with the combined Kr-C-Morse potential #224

Closed
drobnyjt opened this issue Sep 19, 2023 · 6 comments
Labels
question Further information is requested


@drobnyjt
Collaborator

drobnyjt commented Sep 19, 2023

There seems to be a rare event at high energies (>10 keV) for the Kr-C-Morse combined potential that causes an infinite loop. I suspect a value is going to +/-INF and is not being caught by a NaN check, but more investigation is required.
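
In Rust, `f64::is_nan` returns `false` for both `f64::INFINITY` and `f64::NEG_INFINITY`, so a guard that only checks for NaN lets infinities through. `f64::is_finite` rejects NaN and both infinities. Here is a minimal sketch of a stricter guard. The `check_finite` helper is hypothetical and not part of RustBCA or rcpr.

```rust
/// Hypothetical guard: reject NaN *and* +/-infinity before a value is reused
/// in an iterative loop. A check based only on `is_nan` would let infinities through.
fn check_finite(name: &str, x: f64) -> Result<f64, String> {
    if x.is_finite() {
        Ok(x)
    } else {
        Err(format!("{name} is not finite: {x}"))
    }
}
```

A caller could propagate the error with `?`, or turn it into a panic with `expect`, instead of continuing to iterate on a non-finite value.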

The occurrence rate is somewhere between 1 in 250,000 and 1 in 10,000 ions, but any numerical situation that would cause an infinite loop should panic instead. The questions are: 1) Where is the issue occurring? 2) Can it be ameliorated by changing the CPR rootfinder parameters? 3) What changes should be made to the code?

Example input file that exhibits this issue:

    [options]
    name = "krc_morse_23_default_Es_1_5"
    track_recoils = false
    weak_collision_order = 0
    electronic_stopping_mode = "LOW_ENERGY_NONLOCAL"
    mean_free_path_model = "LIQUID"
    interaction_potential = [[{"KRC_MORSE"={D=6.758838e-20, r0=2.7820000000000003e-10, alpha=14197999999.999998, k=70000000000.0, x0=7.5e-11}}]]
    scattering_integral = [["GAUSS_LEGENDRE"]]
    root_finder = [[{"CPR"={n0=2, nmax=100, epsilon=1E-7, complex_threshold=1E-6, truncation_threshold=1E-9, far_from_zero=1E9, interval_limit=1E-12, derivative_free=true}}]]
    num_threads = 4
    num_chunks = 1

    [particle_parameters]
    length_unit = "ANGSTROM"
    energy_unit = "EV"
    mass_unit = "AMU"
    N = [ 10000 ]
    m = [ 1.008 ]
    Z = [ 1 ]
    E = [ 6189.65818891261 ]
    Ec = [ 0.1 ]
    Es = [ 1.5 ]
    interaction_index = [ 0 ]
    pos = [ [ -4.4, 0.0, 0.0,] ]
    dir = [ [ 0.9999999999984769, 1.7453292519934434e-6, 0.0,] ]

    [geometry_input]
    length_unit = "ANGSTROM"
    electronic_stopping_correction_factor = 1.09
    densities = [ 0.0914 ]

    [material_parameters]
    energy_unit = "EV"
    mass_unit = "AMU"
    Eb = [ 0.0 ]
    Es = [ 5.61 ]
    Ec = [ 3.0 ]
    Z = [ 28 ]
    m = [ 58.69 ]
    interaction_index = [ 0 ]
    surface_binding_model = {"PLANAR"={calculation="INDIVIDUAL"}}
    bulk_binding_model = "AVERAGE"
    
@drobnyjt drobnyjt added the question Further information is requested label Sep 19, 2023
@drobnyjt
Collaborator Author

drobnyjt commented Sep 19, 2023

Potential cause: if the CPR rootfinder truncation threshold is too high, the truncated terms could produce identical Chebyshev polynomials on every iteration. The error then never decreases as n increases, and the interval is subdivided indefinitely. However, the rootfinder should hit either nmax or the subdivision interval limit first. To check whether this is a true infinite loop or just a very, very long one caused by many large matrix inversions, try reducing those two parameters.
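
Here is a sketch of how the `nmax` and interval-limit safeguards bound this kind of adaptive loop. It illustrates the control flow only and is not rcpr's implementation. `tail_error` is an assumed stand-in for whatever convergence measure the rootfinder computes for a degree-`n` Chebyshev approximation on `[a, b]`, and the degree is assumed to double from `n0` up to `nmax`. A NaN error never compares less than `epsilon`, so it counts as non-convergence. The extra `mid <= a || mid >= b` test covers `interval_limit` values too small for floating-point bisection to reach.

```rust
/// Error returned when an interval cannot be subdivided any further.
#[derive(Debug, PartialEq)]
enum SubdivError {
    IntervalLimit { a: f64, b: f64 },
}

/// Hypothetical adaptive scheme: on [a, b], try degrees n0, 2*n0, ... <= nmax;
/// accept the first degree whose error is below epsilon, otherwise bisect.
/// Returns the accepted (a, b, n) triples, or an error instead of recursing forever.
fn adaptive_intervals<F>(
    a: f64,
    b: f64,
    n0: usize,
    nmax: usize,
    epsilon: f64,
    interval_limit: f64,
    tail_error: &F,
) -> Result<Vec<(f64, f64, usize)>, SubdivError>
where
    F: Fn(f64, f64, usize) -> f64,
{
    // max(1) keeps the doubling loop from stalling at n = 0.
    let mut n = n0.max(1);
    while n <= nmax {
        // A NaN error fails this comparison, so it counts as "not converged".
        if tail_error(a, b, n) < epsilon {
            return Ok(vec![(a, b, n)]);
        }
        n *= 2;
    }
    let mid = 0.5 * (a + b);
    // Stop at the interval limit, or once bisection no longer shrinks the interval.
    if b - a <= interval_limit || mid <= a || mid >= b {
        return Err(SubdivError::IntervalLimit { a, b });
    }
    let mut out = adaptive_intervals(a, mid, n0, nmax, epsilon, interval_limit, tail_error)?;
    out.extend(adaptive_intervals(mid, b, n0, nmax, epsilon, interval_limit, tail_error)?);
    Ok(out)
}
```

With both guards in place, an error measure that never improves exhausts the degrees up to `nmax` on each interval and then stops at the interval limit. The worst case is therefore a long loop rather than an infinite one.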

@drobnyjt
Collaborator Author

Interestingly, I cannot reproduce this on a different machine. More investigation is needed.

@drobnyjt
Collaborator Author

drobnyjt commented Jan 17, 2024

On the original machine, the issue appears to be limited to high energies, low nmax, and very small interval limits.

@drobnyjt
Collaborator Author

drobnyjt commented Jan 30, 2025

This issue has appeared again when running test_morse.py on multiple machines with a very large number of computational ions.

I've been able to trace it to this nalgebra issue: dimforge/nalgebra#611

Essentially, the eigenvalue algorithm in nalgebra does not handle as many special cases as the LAPACK backend I was using previously. As a result, there is a class of matrices for which the algorithm cannot converge, and this class apparently includes some of the Chebyshev-Frobenius matrices used in the DOCA problem. There does not seem to be any fix on the horizon for nalgebra due to lack of developer time.

At the very least, the infinite loop itself is addressable, but that only turns the infinite loop into a panic.
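
The usual way to address the loop is to bound the iteration and report non-convergence instead of spinning. nalgebra appears to offer this directly through `Matrix::try_schur(eps, max_niter)`. According to its documentation, it returns `None` once `max_niter` iterations are exceeded, and `max_niter = 0` means iterate until convergence. The dependency-free sketch below shows the same pattern with a deliberately simple power iteration. This is a swapped-in toy, not the Schur/QR algorithm nalgebra uses. The 90° rotation matrix has complex eigenvalues, so the iterate never settles and the function returns `None` rather than looping forever.

```rust
/// Multiply a 2x2 matrix by a 2-vector.
fn mat_vec(m: [[f64; 2]; 2], v: [f64; 2]) -> [f64; 2] {
    [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
}

/// Scale a vector to unit length; None if its norm is zero or not finite.
fn normalize(v: [f64; 2]) -> Option<[f64; 2]> {
    let norm = (v[0] * v[0] + v[1] * v[1]).sqrt();
    if norm.is_finite() && norm > 0.0 {
        Some([v[0] / norm, v[1] / norm])
    } else {
        None
    }
}

/// Toy power iteration with an iteration cap (illustrative only).
/// Returns the dominant eigenvalue if the unit iterate stops moving within
/// `max_niter` steps, otherwise None. A negative dominant eigenvalue flips the
/// iterate's sign every step, so this toy also reports None in that case.
fn try_dominant_eigenvalue(m: [[f64; 2]; 2], eps: f64, max_niter: usize) -> Option<f64> {
    let mut v = normalize([1.0, 1.0])?;
    for _ in 0..max_niter {
        let w = normalize(mat_vec(m, v))?;
        let step = ((w[0] - v[0]).powi(2) + (w[1] - v[1]).powi(2)).sqrt();
        v = w;
        if step < eps {
            // Rayleigh quotient of the converged unit vector.
            let mv = mat_vec(m, v);
            return Some(v[0] * mv[0] + v[1] * mv[1]);
        }
    }
    None // did not converge: report it instead of looping forever
}
```

The caller then chooses how to handle `None`: it can panic, or it can fall back to another strategy.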

Further investigation of potential solutions is needed.

@drobnyjt
Collaborator Author

drobnyjt commented Jan 31, 2025

I've pushed an update to rcpr that appears to fix the issue.

When the Schur decomposition fails to converge, the find_roots function now splits the current interval in two and retries root finding on each sub-interval. As far as I can tell, a single split resolves the issue completely on test_morse.py, but the function can repeat that step as many times as necessary (until the stack overflows).
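
Here is a sketch of the interval-splitting fallback. The names are hypothetical and this is not the rcpr code. `solve_on` stands in for one attempt at the rootfinding step on `[a, b]` and returns `None` when the Schur decomposition fails to converge. The behaviour described above is limited only by stack depth. This sketch instead caps recursion with an explicit `max_depth`, so a pathological input produces an error rather than a stack overflow.

```rust
/// Error returned when repeated bisection still cannot produce a converged solve.
#[derive(Debug, PartialEq)]
enum RootError {
    MaxDepth { a: f64, b: f64 },
}

/// Hypothetical fallback: try to find roots on [a, b]; if the solve reports a
/// convergence failure (None), split [a, b] in half and retry on each half.
fn find_roots_bisecting<F>(
    a: f64,
    b: f64,
    depth: usize,
    max_depth: usize,
    solve_on: &F,
) -> Result<Vec<f64>, RootError>
where
    F: Fn(f64, f64) -> Option<Vec<f64>>,
{
    if let Some(roots) = solve_on(a, b) {
        return Ok(roots);
    }
    // Explicit depth cap instead of relying on stack exhaustion.
    if depth >= max_depth {
        return Err(RootError::MaxDepth { a, b });
    }
    let mid = 0.5 * (a + b);
    let mut roots = find_roots_bisecting(a, mid, depth + 1, max_depth, solve_on)?;
    roots.extend(find_roots_bisecting(mid, b, depth + 1, max_depth, solve_on)?);
    Ok(roots)
}
```

One design point to watch: a root lying exactly on the split point can be found on both halves, so a real implementation would need to de-duplicate at the shared endpoint.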

If you are a RustBCA user experiencing this issue, update the rcpr line of Cargo.toml to the following for now:

    rcpr = { git = "https://github.com/drobnyjt/rcpr", optional = true, branch="schur_decomp_fix"}

However, further testing is probably warranted going forward to make sure there aren't any physically reasonable interaction potentials that cause further problems with the rootfinder.

@drobnyjt
Collaborator Author

drobnyjt commented Feb 6, 2025

This fix has been stable on the dev branch in all my testing - I'm closing this for now, and can reopen if someone discovers an edge case that causes a similar issue.

@drobnyjt drobnyjt closed this as completed Feb 6, 2025