bug: inflation creating out-of-bounds values qceff #749

hkershaw-brown · 2024-10-03T14:57:45Z

🐛 Your bug may already be reported!
Please search on the issue tracker before creating a new issue.

Describe the bug

List the steps someone needs to take to reproduce the bug.
Run CAM-FV reanalysis single run of filter example /glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run
inf_flavor 2
What was the expected outcome?
Run to completion
What actually happened?

filter errors out:

ERROR FROM:
  source : bnrh_distribution_mod.f90
  routine: bnrh_cdf
  message:  Smallest ensemble member less than lower bound -4.009123520117420E-164  0.000000000000000E+000

Error Message

Please provide any error messages.

Which model(s) are you working with?

CAM-FV

Screenshots

Here are the lines of code:

DART/assimilation_code/modules/assimilation/adaptive_inflate_mod.f90

Lines 559 to 560 in 464aa57

    
           sd_inflate = sqrt(inflate)  
        
           ens = ens * sd_inflate + mean * (1.0_r8 - sd_inflate)

Version of DART

Which version of DART are you using?
v11.8.1

Have you modified the DART code?

No

Here is a small reproducer

program test_maths

! Using selected_real_kind for double precision
integer, parameter :: dp = selected_real_kind(15, 307)
real(kind=dp) :: ens
real(kind=dp) :: mean
real(kind=dp) :: inflate  
real(kind=dp) :: sd_inflate

ens = 3.2342393548165711e-191_dp
inflate = 1.4082289753147326_dp
mean = 2.1474965713154784e-163_dp

print *, ens    

sd_inflate = sqrt(inflate) 
ens = ens * sd_inflate + mean * (1.0_dp - sd_inflate)

print *, ens    

end program test_maths

hkershaw@derecho3:/glade/derecho/scratch/hkershaw/DART/Bugs/qceff-maths$  ifort fortran_maths.f90 
hkershaw@derecho3:/glade/derecho/scratch/hkershaw/DART/Bugs/qceff-maths$ ./a.out 
  3.234239354816571E-191
 -4.009123520117420E-164

Or you can run CAM-FV filter
/glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run

The numbers for the test_maths.f90 program are from proc3 (out of 128) j==61138

Build information

Please describe:

Derecho
intel

also mac gfortran GNU Fortran (MacPorts gcc13 13.2.0_4+stdlib_flag) 13.2.0

[hkershaw:problem]() > gfortran fortran_maths.f90
[hkershaw:problem]() > ./a.out 
   3.2342393548165711E-191
  -4.0091235201174199E-164

The text was updated successfully, but these errors were encountered:

hkershaw-brown · 2024-10-03T15:00:20Z

there are some early returns in probit_transform_mod

DART/assimilation_code/modules/assimilation/probit_transform_mod.f90

Lines 350 to 353 in 464aa57

    
           ! Do not know what to do if sd of original ensemble is 0 (or small, work on this later) 
        
           if(get_bnrh_sd(p) <= 0.0_r8) then 
        
              ! Just return the original ensemble 
        
              probit_ens = state_ens

hkershaw-brown · 2024-10-03T15:02:45Z

I think this may be a reproducer for at least part of #681 and also why #709

hkershaw-brown · 2024-10-03T16:57:21Z

whole ensemble:

before line 559

"i","value"
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0
10,0
11,0
12,0
13,0
14,0
15,0
16,0
17,0
18,0
19,0
20,0
21,0
22,0
23,0
24,0
25,0
26,0
27,0
28,0
29,0
30,0
31,0
32,0
33,0
34,0
35,0
36,0
37,0
38,0
39,0
40,0
41,0
42,0
43,0
44,0
45,0
46,0
47,3.2342393548165711e-191
48,0
49,0
50,0
51,0
52,0
53,0
54,0
55,0
56,0
57,0
58,0
59,0
60,0
61,0
62,0
63,0
64,0
65,1.5319740054648927e-206
66,1.3760414660445314e-201
67,0
68,0
69,0
70,1.0275904858792872e-161
71,0
72,0
73,0
74,0
75,0
76,0
77,0
78,0
79,6.9040677117309552e-162
80,0

after inflation

"i","value"
1,-4.0091235201174199e-164
2,-4.0091235201174199e-164
3,-4.0091235201174199e-164
4,-4.0091235201174199e-164
5,-4.0091235201174199e-164
6,-4.0091235201174199e-164
7,-4.0091235201174199e-164
8,-4.0091235201174199e-164
9,-4.0091235201174199e-164
10,-4.0091235201174199e-164
11,-4.0091235201174199e-164
12,-4.0091235201174199e-164
13,-4.0091235201174199e-164
14,-4.0091235201174199e-164
15,-4.0091235201174199e-164
16,-4.0091235201174199e-164
17,-4.0091235201174199e-164
18,-4.0091235201174199e-164
19,-4.0091235201174199e-164
20,-4.0091235201174199e-164
21,-4.0091235201174199e-164
22,-4.0091235201174199e-164
23,-4.0091235201174199e-164
24,-4.0091235201174199e-164
25,-4.0091235201174199e-164
26,-4.0091235201174199e-164
27,-4.0091235201174199e-164
28,-4.0091235201174199e-164
29,-4.0091235201174199e-164
30,-4.0091235201174199e-164
31,-4.0091235201174199e-164
32,-4.0091235201174199e-164
33,-4.0091235201174199e-164
34,-4.0091235201174199e-164
35,-4.0091235201174199e-164
36,-4.0091235201174199e-164
37,-4.0091235201174199e-164
38,-4.0091235201174199e-164
39,-4.0091235201174199e-164
40,-4.0091235201174199e-164
41,-4.0091235201174199e-164
42,-4.0091235201174199e-164
43,-4.0091235201174199e-164
44,-4.0091235201174199e-164
45,-4.0091235201174199e-164
46,-4.0091235201174199e-164
47,-4.0091235201174199e-164
48,-4.0091235201174199e-164
49,-4.0091235201174199e-164
50,-4.0091235201174199e-164
51,-4.0091235201174199e-164
52,-4.0091235201174199e-164
53,-4.0091235201174199e-164
54,-4.0091235201174199e-164
55,-4.0091235201174199e-164
56,-4.0091235201174199e-164
57,-4.0091235201174199e-164
58,-4.0091235201174199e-164
59,-4.0091235201174199e-164
60,-4.0091235201174199e-164
61,-4.0091235201174199e-164
62,-4.0091235201174199e-164
63,-4.0091235201174199e-164
64,-4.0091235201174199e-164
65,-4.0091235201174199e-164
66,-4.0091235201174199e-164
67,-4.0091235201174199e-164
68,-4.0091235201174199e-164
69,-4.0091235201174199e-164
70,1.215420420032912e-161
71,-4.0091235201174199e-164
72,-4.0091235201174199e-164
73,-4.0091235201174199e-164
74,-4.0091235201174199e-164
75,-4.0091235201174199e-164
76,-4.0091235201174199e-164
77,-4.0091235201174199e-164
78,-4.0091235201174199e-164
79,8.1528847158862943e-162
80,-4.0091235201174199e-164

hkershaw-brown · 2024-10-03T18:25:25Z

from_probit_bounded_normal_rh state_ens gets set to probit_ens

not sure why the more_params(7) tail_sd_eft is negative? get_bnrh_sd(p)

DART/assimilation_code/modules/assimilation/bnrh_distribution_mod.f90

Lines 91 to 100 in 464aa57

    
           ! Get ensemble mean and sd 
        
           call normal_mean_sd(x, ens_size, mean, tail_sd_left) 
        
           tail_sd_right = tail_sd_left 
        
           ! Don't know what to do if sd is 0; tail_sds returned are illegal value to indicate this 
        
           if(tail_sd_left <= 0.0_r8) then 
        
              tail_sd_left = -99_r8 
        
              tail_sd_right = -99_r8  
        
              return 
        
           endif

hkershaw-brown · 2024-10-03T20:00:41Z

The sd == 0 so you never transform into (or back out of) probit space.

Inflation pushes the
I think in this case
ens = (ens - mean) * sqrt(inflate) + mean
if your ens(i) == mean then you can not inflate, but this is only going to work for multiplicative inflation. If you're adding noise or whatever various inflation methods do, I think you will have to enforce the bounds. Is this ok? clamping rather than qceff enforced bounds.

 555       ! Spread the ensemble out linearly for deterministic
 556       sd_inflate = sqrt(inflate)
 557       ! HK if ens == mean do nothing
 558       where ((ens - mean) > epsilon(mean) )
 559          ! Following line can lead to inflation of 1.0 changing ens on some compilers
 560          !!! ens = (ens - mean) * sqrt(inflate) + mean
 561          ! Following gives 1.0 inflation having no impact on known compilers
 562          ens = ens * sd_inflate + mean * (1.0_r8 - sd_inflate)
 563       endwhere

hkershaw-brown · 2024-11-08T18:37:46Z

note there is issue #748 which is a different inflation QCEFF bug which looks like it may create out-of-bounds values also.

jlaucar · 2024-12-04T23:18:19Z

This bug impacted the application of inflation, but it was potentially much more insidious. As noted by Helen, the problem occured when the bnrh transform was unable to complete because the ensemble standard deviation was not positive. This same problem can arise for all of the other places in assim_tools_mod.f90 where transforms are made. The most straightforward solution begins by having an error return from the probit transform. Only the bnrh can fail so all other types of distributions always complete and do not need to be considered. The original code design assumed that if the standard deviation was not positive, that all ensemble members must have the same value. However, as noted in Helen's examples, if the differences between ensemble members are very small, computational precision limits can result in a computed standard deviation that is not positive. The assumption for the inflation and the other parts of assim_tools was that any distribution with all members identical could not be changed so it didn't matter if the failure of the transform was just ignored.

Note that the numerical analysis is very complex for the case where the computed SD for the bnrh transform is very small but non-zero. A number of tests suggest that this does not have inappropriate results, but I would not be shocked if something bad could happen in unusual cases.

jlaucar · 2024-12-23T14:52:26Z

Moha and Jeff did a code review of the proposed bug fix which led to one additional clarification to the code. The Lorenz_06_tracer tests seem to be okay. This version has been pushed and is now ready for testing with the CAM reanalysis code by Kevin.

edit: change is 1f2b7e1

hkershaw-brown · 2024-12-23T15:08:34Z

note for performance later, to_probit_bounded_normal_rh returns the original ensemble if the transform fails:

DART/assimilation_code/modules/assimilation/probit_transform_mod.f90

Lines 382 to 389 in 1f2b7e1

    
           ! Fail if sd is not positive (or small, work on this later) 
        
           if(get_bnrh_sd(p) <= 0.0_r8) then 
        
              ierr = 1 
        
              ! Just return the original ensemble 
        
              probit_ens = state_ens  
        
              ! Free up the storage that would have been used for the transformed variables 
        
              call deallocate_distribution_params(p) 
        
              return

check on ierr, and set again here: not needed?

DART/assimilation_code/modules/assimilation/probit_transform_mod.f90

Lines 91 to 100 in 1f2b7e1

    
           do i = 1, num_vars 
        
              call transform_to_probit(ens_size, state_ens(1:ens_size, i), distribution_type(i), & 
        
                 p(i), temp_ens, use_input_p, bounded_below, bounded_above, lower_bound, upper_bound, ierr(i)) 
        
              if(ierr(i) == 0)  
        
                 probit_ens(1:ens_size, i) = temp_ens 
        
              else 
        
                 ! If transform failed, return the input state 
        
                 probit_ens(1:ens_size, i) = state_ens(1:ens_size, i) 
        
              endif 
        
           end do

hkershaw-brown added the QCEFF quantile conserving filters label Oct 3, 2024

hkershaw-brown mentioned this issue Oct 31, 2024

bug: fix bounds fix_bound_violations = .true. seems to be required for ifort #709

Open

hkershaw-brown added the Bug Something isn't working label Nov 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: inflation creating out-of-bounds values qceff #749

bug: inflation creating out-of-bounds values qceff #749

hkershaw-brown commented Oct 3, 2024 •

edited

Loading

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Nov 8, 2024

jlaucar commented Dec 4, 2024

jlaucar commented Dec 23, 2024 •

edited by hkershaw-brown

Loading

hkershaw-brown commented Dec 23, 2024

bug: inflation creating out-of-bounds values qceff #749

bug: inflation creating out-of-bounds values qceff #749

Comments

hkershaw-brown commented Oct 3, 2024 • edited Loading

Describe the bug

Error Message

Which model(s) are you working with?

Screenshots

Version of DART

Have you modified the DART code?

Build information

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Oct 3, 2024

hkershaw-brown commented Nov 8, 2024

jlaucar commented Dec 4, 2024

jlaucar commented Dec 23, 2024 • edited by hkershaw-brown Loading

hkershaw-brown commented Dec 23, 2024

hkershaw-brown commented Oct 3, 2024 •

edited

Loading

jlaucar commented Dec 23, 2024 •

edited by hkershaw-brown

Loading