Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect stage value for ERKs with c_s=1 but not FSAL #311

Closed
Steven-Roberts opened this issue Jul 28, 2023 · 11 comments
Closed

Incorrect stage value for ERKs with c_s=1 but not FSAL #311

Steven-Roberts opened this issue Jul 28, 2023 · 11 comments

Comments

@Steven-Roberts
Copy link
Collaborator

At the end of an ERK step, erkStep_FullRHS is called in ARK_FULLRHS_END mode to compute $f(y_{n+1})$. If the method has the first-same-as-last property (FSAL), there's no need to reevaluate $f$. The FSAL check

recomputeRHS = SUNFALSE;
if (SUNRabs(step_mem->B->c[step_mem->stages - 1] - ONE) > TINY)
recomputeRHS = SUNTRUE;

is incorrect, however. It needs to check that the last row of $A$ is $b^T$. Several methods have $c_s=1$ but are not FSAL including ARKODE_HEUN_EULER_2_1_2 and all the ARKODE_ARK* methods. A test with Heun's method confirms it's calling $f$ half as many times as it should. I am happy to make a PR fix, or should this be included in #293?

@Steven-Roberts
Copy link
Collaborator Author

@drreynolds
Copy link
Collaborator

I'm not sure that I understand what occurred. Was this always broken (or at least for a long time), or was this introduced recently? The reversion commit that you reference seems to be in relation to PR #293, but that hasn't yet been merged.

In my opinion, if this has been around for a while, then I think that @Steven-Roberts can put a fix into #293 then we can make it a priority to review/merge this promptly. If this bug was just introduced, however, then a bugfix release would be more appropriate.

@balos1
Copy link
Member

balos1 commented Jul 30, 2023

@drreynolds The reversion commit Steven linked to is not linked to PR #293. It is linked to PR that was merged in BitBucket prior to our move to GitHub. The branch that was merged named feature/arkode-fullrhs-enum, a very similar name to PR #293.

@balos1
Copy link
Member

balos1 commented Jul 30, 2023

Here is a link to the PR that made the change to just check c[s-1].

@balos1
Copy link
Member

balos1 commented Jul 30, 2023

I do not understand why yet, but this seems to only be a problem in ERKStep, and not ARKStep. With ARKStep and ARKODE_HEUN_EULER_2_1_2 it uses the expected number of function calls.

Here is the check in ARKStep.

    /* determine if explicit/implicit RHS functions need to be recomputed */
    recomputeRHS = SUNFALSE;
    if ( step_mem->explicit && (SUNRabs(step_mem->Be->c[step_mem->stages-1]-ONE)>TINY) )
      recomputeRHS = SUNTRUE;
    if ( step_mem->implicit && (SUNRabs(step_mem->Bi->c[step_mem->stages-1]-ONE)>TINY) )
      recomputeRHS = SUNTRUE;

The check has always been this way in ARKStep (back to the initial ARKODE release, predating the ARKStep stepper module).

@balos1
Copy link
Member

balos1 commented Jul 30, 2023

OK, so the reason ARKStep still makes the expected number of function evaluations is because it always starts the internal stage loop at the first stage, while ERKStep starts at the second stage.

ARKStep:

  for (is=0; is<step_mem->stages; is++) {

ERKStep:

  /* Loop over internal stages to the step; since the method is explicit
     the first stage RHS is just the full RHS from the start of the step */
  for (is=1; is<step_mem->stages; is++) {

As noted in the code, the stage loop in ERKStep is written this way since it is assuming that a call to the RHS was made "at the start of the step", which should be the same as the first stage RHS. In reality, the call "at the start of the step" actually happens at the end of the previous step via a call to the fullRHS with the ARK_FULLRHS_END case. The erroneous FSAL check results in the RHS not being called in ARK_FULLRHS_END, so ERKStep assumption is then, erroneously, violated.

With ARKStep, since the loop starts with the first stage and the assumption that RHS is the same as "the start of the step", the erroneous FSAL check does not impact it. However, the correct FSAL check would actually result in extra calls to the RHS by ARKStep.

OK now for testing...

First, without any modifications to the steppers except some print statements at every point where the RHS is called.

If I run the ark_advection_diffusion_reaction.cpp example with ARKStep I get,

          t                     ||y||_rms      
 ----------------------------------------------
 0.000000000000000e+00    4.029895706298216e+00
call fe, ARK_FULLRHS_START
call fe, takeStep
call fe, takeStep
 2.000000000000000e-06    4.029887539447282e+00
 4.000000000000001e-06    4.029879372649061e+00
 6.000000000000001e-06    4.029871205903555e+00
 8.000000000000001e-06    4.029863039210761e+00
call fe, takeStep
call fe, takeStep
 1.000000000000000e-05    4.029854872570688e+00
 1.200000000000000e-05    4.029846714040837e+00
 1.400000000000000e-05    4.029838555563598e+00
 1.600000000000001e-05    4.029830397138965e+00
 1.800000000000001e-05    4.029822238766947e+00
 2.000000000000000e-05    4.029814080447542e+00
 ----------------------------------------------

Final integrator statistics:
  Steps              = 2
  Step attempts      = 2
  Error test fails   = 0
  Explicit RHS evals = 5
  Implicit RHS evals = 0

Now with ERKStep

          t                     ||y||_rms      
 ----------------------------------------------
 0.000000000000000e+00    4.029895706298216e+00
call f, ARK_FULLRHS_START
call f, takeStep
 2.000000000000000e-06    4.029887539447282e+00
 4.000000000000001e-06    4.029879372649060e+00
 6.000000000000001e-06    4.029871205903555e+00
 8.000000000000001e-06    4.029863039210761e+00
call f, takeStep
 1.000000000000000e-05    4.029854872570688e+00
 1.200000000000000e-05    4.029846714043648e+00
 1.400000000000000e-05    4.029838555569221e+00
 1.600000000000001e-05    4.029830397147401e+00
 1.800000000000001e-05    4.029822238778189e+00
 2.000000000000000e-05    4.029814080461592e+00
 ----------------------------------------------

Final integrator statistics:
  Steps            = 2
  Step attempts    = 2
  Error test fails = 0
  RHS evals        = 3

Now if we correct the FSAL check in ERKStep,

          t                     ||y||_rms      
 ----------------------------------------------
 0.000000000000000e+00    4.029895706298216e+00
call f, ARK_FULLRHS_START
call f, takeStep
call f, recomputeRHS
 2.000000000000000e-06    4.029887539447282e+00
 4.000000000000001e-06    4.029879372649060e+00
 6.000000000000001e-06    4.029871205903555e+00
 8.000000000000001e-06    4.029863039210761e+00
call f, takeStep
call f, recomputeRHS
 1.000000000000000e-05    4.029854872570688e+00
 1.200000000000000e-05    4.029846714040837e+00
 1.400000000000000e-05    4.029838555563598e+00
 1.600000000000001e-05    4.029830397138965e+00
 1.800000000000001e-05    4.029822238766947e+00
 2.000000000000000e-05    4.029814080447542e+00
 ----------------------------------------------

Final integrator statistics:
  Steps            = 2
  Step attempts    = 2
  Error test fails = 0
  RHS evals        = 5

And finally, if we correct the FSAL check in ARKStep

          t                     ||y||_rms      
 ----------------------------------------------
 0.000000000000000e+00    4.029895706298216e+00
call fe, ARK_FULLRHS_START
call fe, takeStep
call fe, takeStep
call fe, recomputeRHS
 2.000000000000000e-06    4.029887539447282e+00
 4.000000000000001e-06    4.029879372649061e+00
 6.000000000000001e-06    4.029871205903555e+00
 8.000000000000001e-06    4.029863039210761e+00
call fe, takeStep
call fe, takeStep
call fe, recomputeRHS
 1.000000000000000e-05    4.029854872570688e+00
 1.200000000000000e-05    4.029846714040837e+00
 1.400000000000000e-05    4.029838555563598e+00
 1.600000000000001e-05    4.029830397138965e+00
 1.800000000000001e-05    4.029822238766947e+00
 2.000000000000000e-05    4.029814080447542e+00
 ----------------------------------------------

Final integrator statistics:
  Steps              = 2
  Step attempts      = 2
  Error test fails   = 0
  Explicit RHS evals = 7

@balos1 balos1 added this to the SUNDIALS Next milestone Jul 30, 2023
balos1 added a commit that referenced this issue Jul 30, 2023
@Steven-Roberts
Copy link
Collaborator Author

Was this always broken (or at least for a long time), or was this introduced recently?

The diff I linked suggests this was introduced about 2 years ago.

if this has been around for a while, then I think that @Steven-Roberts can put a fix into #293 then we can make it a priority to review/merge this promptly

Ok, I'll add a commit to #293 then

I had not checked ARKStep yet, so that's some helpful results, Cody. Although the loop start fixes the issue (at the cost of an extra function evaluation), I see a potential problem. For a non-FSAL ARK method with $c^E = c^I = 1$, the Hermite interpolant will use $f(Y_s)$. For most methods, this is only an order 1 approximation to $f(y(t_{n+1}))$, while $f(y_{n+1})$ is order $q$. I guess this is ok for low order interpolants, but for high order it's problematic. The fix would be similar to the fix for ERKStep.

@balos1
Copy link
Member

balos1 commented Jul 30, 2023

@Steven-Roberts I am not suggesting that adjusting the stage loop start is a fix for the problem, rather that is how the code is right now in ARKStep. Fixing the FSAL check in ARKStep as you have suggested for ERKStep will result in the extra function call, so I am suggesting that some more thought needs to be put into how to handle this in ARKStep.

@Steven-Roberts
Copy link
Collaborator Author

Yes, I understand. The ARKStep FASL issue I mention is present in the current code. I agree we'll need to think carefully about removing the extra function eval while fixing these bugs.

@balos1
Copy link
Member

balos1 commented Jul 30, 2023

I started a bugfix branch. Right now, it just has the changes I made to product the test result I showed above. I think we should make this a separate PR from #293, at least, it should be a PR that aims to merge into #293. It is nice to have a single concern in a PR and this is a more notable bugfix.

@balos1 balos1 modified the milestones: SUNDIALS 6.6.1, SUNDIALS Next Sep 13, 2023
@balos1 balos1 removed this from the SUNDIALS 7.0.0 milestone Dec 6, 2023
@gardner48
Copy link
Member

Fixed in SUNDIALS 6.7.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants