Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: inflation creating out-of-bounds values qceff #749

Open
hkershaw-brown opened this issue Oct 3, 2024 · 9 comments
Open

bug: inflation creating out-of-bounds values qceff #749

hkershaw-brown opened this issue Oct 3, 2024 · 9 comments
Labels
Bug Something isn't working QCEFF quantile conserving filters

Comments

@hkershaw-brown
Copy link
Member

hkershaw-brown commented Oct 3, 2024

🐛 Your bug may already be reported!
Please search on the issue tracker before creating a new issue.

Describe the bug

  1. List the steps someone needs to take to reproduce the bug.
    Run CAM-FV reanalysis single run of filter example /glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run
    inf_flavor 2

  2. What was the expected outcome?
    Run to completion

  3. What actually happened?

filter errors out:

ERROR FROM:
  source : bnrh_distribution_mod.f90
  routine: bnrh_cdf
  message:  Smallest ensemble member less than lower bound -4.009123520117420E-164  0.000000000000000E+000

Error Message

Please provide any error messages.

Which model(s) are you working with?

CAM-FV

Screenshots

Here are the lines of code:

sd_inflate = sqrt(inflate)
ens = ens * sd_inflate + mean * (1.0_r8 - sd_inflate)

Version of DART

Which version of DART are you using?
v11.8.1

Have you modified the DART code?

No

Here is a small reproducer

program test_maths

! Using selected_real_kind for double precision
integer, parameter :: dp = selected_real_kind(15, 307)
real(kind=dp) :: ens
real(kind=dp) :: mean
real(kind=dp) :: inflate  
real(kind=dp) :: sd_inflate

ens = 3.2342393548165711e-191_dp
inflate = 1.4082289753147326_dp
mean = 2.1474965713154784e-163_dp

print *, ens    

sd_inflate = sqrt(inflate) 
ens = ens * sd_inflate + mean * (1.0_dp - sd_inflate)

print *, ens    

end program test_maths  
hkershaw@derecho3:/glade/derecho/scratch/hkershaw/DART/Bugs/qceff-maths$  ifort fortran_maths.f90 
hkershaw@derecho3:/glade/derecho/scratch/hkershaw/DART/Bugs/qceff-maths$ ./a.out 
  3.234239354816571E-191
 -4.009123520117420E-164

Or you can run CAM-FV filter
/glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run

The numbers for the test_maths.f90 program are from proc3 (out of 128) j==61138

Build information

Please describe:

  1. Derecho
  2. intel

also mac gfortran GNU Fortran (MacPorts gcc13 13.2.0_4+stdlib_flag) 13.2.0

[hkershaw:problem]() > gfortran fortran_maths.f90
[hkershaw:problem]() > ./a.out 
   3.2342393548165711E-191
  -4.0091235201174199E-164
@hkershaw-brown hkershaw-brown added the QCEFF quantile conserving filters label Oct 3, 2024
@hkershaw-brown
Copy link
Member Author

there are some early returns in probit_transform_mod

! Do not know what to do if sd of original ensemble is 0 (or small, work on this later)
if(get_bnrh_sd(p) <= 0.0_r8) then
! Just return the original ensemble
probit_ens = state_ens

@hkershaw-brown
Copy link
Member Author

I think this may be a reproducer for at least part of #681 and also why #709

@hkershaw-brown
Copy link
Member Author

whole ensemble:

before line 559

"i","value"
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0
10,0
11,0
12,0
13,0
14,0
15,0
16,0
17,0
18,0
19,0
20,0
21,0
22,0
23,0
24,0
25,0
26,0
27,0
28,0
29,0
30,0
31,0
32,0
33,0
34,0
35,0
36,0
37,0
38,0
39,0
40,0
41,0
42,0
43,0
44,0
45,0
46,0
47,3.2342393548165711e-191
48,0
49,0
50,0
51,0
52,0
53,0
54,0
55,0
56,0
57,0
58,0
59,0
60,0
61,0
62,0
63,0
64,0
65,1.5319740054648927e-206
66,1.3760414660445314e-201
67,0
68,0
69,0
70,1.0275904858792872e-161
71,0
72,0
73,0
74,0
75,0
76,0
77,0
78,0
79,6.9040677117309552e-162
80,0

after inflation

"i","value"
1,-4.0091235201174199e-164
2,-4.0091235201174199e-164
3,-4.0091235201174199e-164
4,-4.0091235201174199e-164
5,-4.0091235201174199e-164
6,-4.0091235201174199e-164
7,-4.0091235201174199e-164
8,-4.0091235201174199e-164
9,-4.0091235201174199e-164
10,-4.0091235201174199e-164
11,-4.0091235201174199e-164
12,-4.0091235201174199e-164
13,-4.0091235201174199e-164
14,-4.0091235201174199e-164
15,-4.0091235201174199e-164
16,-4.0091235201174199e-164
17,-4.0091235201174199e-164
18,-4.0091235201174199e-164
19,-4.0091235201174199e-164
20,-4.0091235201174199e-164
21,-4.0091235201174199e-164
22,-4.0091235201174199e-164
23,-4.0091235201174199e-164
24,-4.0091235201174199e-164
25,-4.0091235201174199e-164
26,-4.0091235201174199e-164
27,-4.0091235201174199e-164
28,-4.0091235201174199e-164
29,-4.0091235201174199e-164
30,-4.0091235201174199e-164
31,-4.0091235201174199e-164
32,-4.0091235201174199e-164
33,-4.0091235201174199e-164
34,-4.0091235201174199e-164
35,-4.0091235201174199e-164
36,-4.0091235201174199e-164
37,-4.0091235201174199e-164
38,-4.0091235201174199e-164
39,-4.0091235201174199e-164
40,-4.0091235201174199e-164
41,-4.0091235201174199e-164
42,-4.0091235201174199e-164
43,-4.0091235201174199e-164
44,-4.0091235201174199e-164
45,-4.0091235201174199e-164
46,-4.0091235201174199e-164
47,-4.0091235201174199e-164
48,-4.0091235201174199e-164
49,-4.0091235201174199e-164
50,-4.0091235201174199e-164
51,-4.0091235201174199e-164
52,-4.0091235201174199e-164
53,-4.0091235201174199e-164
54,-4.0091235201174199e-164
55,-4.0091235201174199e-164
56,-4.0091235201174199e-164
57,-4.0091235201174199e-164
58,-4.0091235201174199e-164
59,-4.0091235201174199e-164
60,-4.0091235201174199e-164
61,-4.0091235201174199e-164
62,-4.0091235201174199e-164
63,-4.0091235201174199e-164
64,-4.0091235201174199e-164
65,-4.0091235201174199e-164
66,-4.0091235201174199e-164
67,-4.0091235201174199e-164
68,-4.0091235201174199e-164
69,-4.0091235201174199e-164
70,1.215420420032912e-161
71,-4.0091235201174199e-164
72,-4.0091235201174199e-164
73,-4.0091235201174199e-164
74,-4.0091235201174199e-164
75,-4.0091235201174199e-164
76,-4.0091235201174199e-164
77,-4.0091235201174199e-164
78,-4.0091235201174199e-164
79,8.1528847158862943e-162
80,-4.0091235201174199e-164

@hkershaw-brown
Copy link
Member Author

from_probit_bounded_normal_rh state_ens gets set to probit_ens

Screenshot 2024-10-03 at 2 16 19 PM Screenshot 2024-10-03 at 2 16 33 PM

not sure why the more_params(7) tail_sd_eft is negative? get_bnrh_sd(p)

! Get ensemble mean and sd
call normal_mean_sd(x, ens_size, mean, tail_sd_left)
tail_sd_right = tail_sd_left
! Don't know what to do if sd is 0; tail_sds returned are illegal value to indicate this
if(tail_sd_left <= 0.0_r8) then
tail_sd_left = -99_r8
tail_sd_right = -99_r8
return
endif

@hkershaw-brown
Copy link
Member Author

The sd == 0 so you never transform into (or back out of) probit space.

Screenshot 2024-10-03 at 3 29 19 PM

Inflation pushes the
I think in this case
ens = (ens - mean) * sqrt(inflate) + mean
if your ens(i) == mean then you can not inflate, but this is only going to work for multiplicative inflation. If you're adding noise or whatever various inflation methods do, I think you will have to enforce the bounds. Is this ok? clamping rather than qceff enforced bounds.

 555       ! Spread the ensemble out linearly for deterministic
 556       sd_inflate = sqrt(inflate)
 557       ! HK if ens == mean do nothing
 558       where ((ens - mean) > epsilon(mean) )
 559          ! Following line can lead to inflation of 1.0 changing ens on some compilers
 560          !!! ens = (ens - mean) * sqrt(inflate) + mean
 561          ! Following gives 1.0 inflation having no impact on known compilers
 562          ens = ens * sd_inflate + mean * (1.0_r8 - sd_inflate)
 563       endwhere

@hkershaw-brown
Copy link
Member Author

note there is issue #748 which is a different inflation QCEFF bug which looks like it may create out-of-bounds values also.

@jlaucar
Copy link
Contributor

jlaucar commented Dec 4, 2024

This bug impacted the application of inflation, but it was potentially much more insidious. As noted by Helen, the problem occured when the bnrh transform was unable to complete because the ensemble standard deviation was not positive. This same problem can arise for all of the other places in assim_tools_mod.f90 where transforms are made. The most straightforward solution begins by having an error return from the probit transform. Only the bnrh can fail so all other types of distributions always complete and do not need to be considered. The original code design assumed that if the standard deviation was not positive, that all ensemble members must have the same value. However, as noted in Helen's examples, if the differences between ensemble members are very small, computational precision limits can result in a computed standard deviation that is not positive. The assumption for the inflation and the other parts of assim_tools was that any distribution with all members identical could not be changed so it didn't matter if the failure of the transform was just ignored.

Note that the numerical analysis is very complex for the case where the computed SD for the bnrh transform is very small but non-zero. A number of tests suggest that this does not have inappropriate results, but I would not be shocked if something bad could happen in unusual cases.

@jlaucar
Copy link
Contributor

jlaucar commented Dec 23, 2024

Moha and Jeff did a code review of the proposed bug fix which led to one additional clarification to the code. The Lorenz_06_tracer tests seem to be okay. This version has been pushed and is now ready for testing with the CAM reanalysis code by Kevin.

edit: change is 1f2b7e1

@hkershaw-brown
Copy link
Member Author

note for performance later, to_probit_bounded_normal_rh returns the original ensemble if the transform fails:

! Fail if sd is not positive (or small, work on this later)
if(get_bnrh_sd(p) <= 0.0_r8) then
ierr = 1
! Just return the original ensemble
probit_ens = state_ens
! Free up the storage that would have been used for the transformed variables
call deallocate_distribution_params(p)
return

check on ierr, and set again here: not needed?

do i = 1, num_vars
call transform_to_probit(ens_size, state_ens(1:ens_size, i), distribution_type(i), &
p(i), temp_ens, use_input_p, bounded_below, bounded_above, lower_bound, upper_bound, ierr(i))
if(ierr(i) == 0)
probit_ens(1:ens_size, i) = temp_ens
else
! If transform failed, return the input state
probit_ens(1:ens_size, i) = state_ens(1:ens_size, i)
endif
end do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working QCEFF quantile conserving filters
Projects
None yet
Development

No branches or pull requests

2 participants