Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more testing #535

Merged
merged 10 commits into from
Nov 30, 2023
Merged

more testing #535

merged 10 commits into from
Nov 30, 2023

Conversation

jbathegit
Copy link
Collaborator

Part of #445

Note that the additional debufr test required a change to the source code for the utility, to ensure that allocated memory was freed following the execution of the newly-tested code. Otherwise, the address sanitizer would complain about memory leakage and segfault during the CI tests.

@jbathegit
Copy link
Collaborator Author

Now all of a sudden the MacOS runner is failing with a missing python module:

Capture

Just in case this was some temporary glitch, I tried rerunning it several times and on different days, but to no avail.

@AlexanderRichert-NOAA @aerorahul @climbfuji @edwardhartnett any suggestions?

@jbathegit
Copy link
Collaborator Author

BTW, I realize I could just change the -DENABLE_PYTHON=ON flag to -DENABLE_PYTHON=OFF in the cmake step of the build-bufr job in the MacOS.yml file, but I'm assuming somebody at some point had a good reason for enabling python testing on this platform(?)

Currently, we have python testing enabled in the develop, Linux, and MacOS runners, but it's currently disabled in the Intel runner.

edwardhartnett
edwardhartnett previously approved these changes Nov 7, 2023
@jbathegit
Copy link
Collaborator Author

And now all of a sudden the MacOs build is working again!?

@jbathegit
Copy link
Collaborator Author

In this latest commit I added more bort testing for nemtbb() and nemtbd(). But I'm struggling to write test cases for several of the bort options in those routines using dummied-up DX tables containing obvious errors. For example:

  • The 901 fail in both routines is if an FXY number isn't between 000000 and 063255. But any number outside of that range would already get diagnosed as an error in rdusdx() when the DX table is first read in via a call to openbf() or otherwise. And such a table needs to be previously read in because otherwise there's no data in the internal tabb and tabd arrays for nemtbb() and nemtbd() to work with. So can such an error ever really get triggered in either of those routines (i.e. do we really need this bort check in those routines)?
  • On a similar note, in subroutine elemdx() (which is called from rdusdx()), a maximum of 3 digits is stored for a scale factor for any Table B value in the tabb array, and a maximum of 10 digits is stored for any reference value. So how exactly could we ever have a 902 or 903 fail in nemtbb() for a stored scale factor not in the range of -999 to 999, or for a stored reference value not in the range of -1E11 to 1E11? In other words, how could either of those bort cases ever realistically get triggered in nemtbb(), given that such outlier values aren't even storable in the internal tabb array?
  • Furthermore, the 904 and 906 fail cases in nemtbd() couldn't seemingly ever get triggered either, because any such cases would already get diagnosed as an error within seqsdx() (which is also called from rdusdx()) when the DX table is first read in to the internal tabd array. So how exactly would those cases ever realistically occur?

In summary, it seems to me we could just get rid of a number of these bort tests in the nemtbb() and nemtbd() routines, because unless I'm missing something they don't really seem to be serving any practical purpose. @jack-woollen please chime in if you have any thoughts on this - thanks!

jack-woollen
jack-woollen previously approved these changes Nov 8, 2023
@jbathegit
Copy link
Collaborator Author

Thanks @jack-woollen, but before I merge this PR (and which in turn will close it :-) do you have any thoughts on my above suggestions to eliminate some of those bort tests in nemtbb() and nemtbd(). Do you agree with my thought process, or can you think of something I'm missing in my analysis?

@jack-woollen
Copy link
Contributor

Jeff, a few thoughts. First, it's not impossible for those bort conditions to happen, its just impossible to test them. Because the only way the impossible conditions are possible is all the rules go out the window such as memory clobbering due to bad inputs or bad coding, or some other bad things. I think these types of tests are worthwhile and do not at all subscribe to the idea that they are useless. The only reason I think is important enough to consider maybe getting rid of some of them would be if they cost significant amounts of time. I was in the process of starting to check that out when dogwood switched to production. I'll pick it up again later today. It may be useful to classify the bort statements into those that check on function and those that check on dysfunction, to see the ratio, and calculate the actual costs of each type.

@edwardhartnett
Copy link
Contributor

If code cannot actually be reached, then it should be removed. Nor can we try to capture errors caused my random memory overwrites or other totally unpredictable conditions.

What often happens in a testing campaign is that you realize some code can never be reached. So remove that code!

Nothing should stay in the code just because "it does no harm." Every line of code costs money and time to write and more importantly maintain. So if the line of code is not doing anything useful, remove it, and cut code size and maintenance costs.

Consider this: right now @jbathegit has figured out that these lines can't be reached, and raised questions about them, and that's caused us all to take some time to look at the problem. That's not free. This code is already costing NOAA extra time and money. So let's resolve the problem - if the lines are not needed, remove them.

Then, in 5 years, when this issue is long forgotten, NOAA won't incur the same costs by some programmer wondering if those lines of code could ever be reached and raising the whole issue all over again. Which will once again cause expense for NOAA for lines of code that we don't need.

@jbathegit
Copy link
Collaborator Author

After giving this some more thought, I'm leaning more towards Ed's way of thinking about this. I hear what you're saying Jack, but I agree with Ed that we shouldn't retain code that we can't test, and that all bets are off anyway if something unpredictable happens.

So, in the near future, I'll push up another commit with those unreachable bort tests removed.

@jbathegit
Copy link
Collaborator Author

I've made the aforementioned nemtbb and nemtbd changes to remove those untestable bort cases, and I've also merged in the changes from #539. But now I'm once again seeing CI errors for MacOS (with a different Python build error) and Intel (with cmake now apparently trying to build the oneAPI environment).

@AlexanderRichert-NOAA not sure if this is related in any way to what you're working on in #536, but obviously none of that has been merged into this branch yet, so I'm a bit stumped.

Copy link
Contributor

@AlexanderRichert-NOAA AlexanderRichert-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest merging #542 first in order to test with Intel CI

@jbathegit jbathegit merged commit e145fe2 into develop Nov 30, 2023
6 checks passed
@jbathegit jbathegit deleted the jba_moretesting branch November 30, 2023 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants