Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the default 'combinewts' option of averager from 0 to 1 ? #33

Open
jypeter opened this issue Dec 12, 2019 · 12 comments
Open

Change the default 'combinewts' option of averager from 0 to 1 ? #33

jypeter opened this issue Dec 12, 2019 · 12 comments
Labels
kind/bug Categorizes issue related to bug.

Comments

@jypeter
Copy link
Member

jypeter commented Dec 12, 2019

@durack1 what do you think of my problem below?

I have very recently discovered that I may have misused genutil.averager incorrectly for years. I assumed that the order in which you did a spatial average did not matter, but this seems to be only true when the variable is not masked

The results are quite different! I think that the user expects what you get when specifying combinewts=1, except that the default is ZERO! That's why you may want to change the default value

genutil.averager is a very useful component of CDAT but there does not seem to be an example script/notebook using it on the web site. You may want to add one

Example with one time step of clt.nc

No problem here, because there are no masked points

>>> clt.shape
(46, 72)
>>> clt.count()
3312
>>> 46*72
3312
>>> cdutil.averager(clt, axis='xy', weights=['weighted','weighted'])
variable_182
masked_array(data=62.71149853,
             mask=False,
       fill_value=1e+20)

>>> cdutil.averager(clt, axis='yx', weights=['weighted','weighted'])
variable_190
masked_array(data=62.71149853,
             mask=False,
       fill_value=1e+20)

>>> cdutil.averager(clt, axis='yx', weights=['weighted','weighted'], combinewts=1)
variable_204
masked_array(data=62.71149853,
             mask=False,
       fill_value=1e+20)

Example with one time step of tas_cru_1979.nc

Some points are masked, and the values returned by cdutil.averager are quite different

>>> tas.shape
(36, 72)
>>> 36*72
2592
>>> tas.count()
1823
>>> genutil.averager(tas, axis='xy', weights=['weighted','weighted'])
variable_84
masked_array(data=12.62572236,
             mask=False,
       fill_value=1e+20)

>>> genutil.averager(tas, axis='yx', weights=['weighted','weighted'])
variable_92
masked_array(data=14.68750813,
             mask=False,
       fill_value=1e+20)

>>> genutil.averager(tas, axis='yx', weights=['weighted','weighted'], combinewts=1)
variable_110
masked_array(data=14.70993489,
             mask=False,
       fill_value=1e+20)

>>> genutil.averager(tas, axis='xy', weights=['weighted','weighted'], combinewts=1)
variable_130
masked_array(data=14.70993489,
             mask=False,
       fill_value=1e+20)
@doutriaux1
Copy link
Contributor

@jypeter I tocuh on this issue here: https://github.com/CDAT/Jupyter-notebooks/blob/master/scientific/Detrend_Data/Detrend_data.ipynb

@davis278 might have another/cleaner version

@davis278
Copy link

@jypeter and @doutriaux1, I do have a more detailed version of the Detrend Data notebook, but it is in review by a scientist who is presently at AGU. I will see if the review can be completed next week so we can publish the more detailed version of the Detrend Data notebook as soon as possible.

@davis278
Copy link

davis278 commented Dec 12, 2019

@jypeter and @durack1 the more detailed version of the Detrend Data notebook talks about how it matters what order you do the operations when there is missing or masked data, but it does not cover combinewts.

@jypeter
Copy link
Member Author

jypeter commented Dec 13, 2019

@doutriaux1 you do mention highlight a scientific point about the order of operations, but everything in the script use axis='xy', so we don't really know what would append if the user used axis='yx' instead, and which one is better. I think that the only way to get the correct result would be to explicitly use combinewts=1, or have this as a default

Besides, as I have mentioned in the 3rd bullet point of CDAT/cdat.github.io#197, when you specify `axis='xy', the user does not know (that is, the documentation does not say) if the average is performed on x then y, or y then x

I think it would be extremely useful to have a notebook covering correct (that is weighted) spatial and temporal averaging, and showing that the results you get can be quite different than what you get with arithmetic averaging. That's a point that we have to tell our new students over and over, and I'm not sure that other python packages than genutil make it possible.

This would also be the opportunity to demonstrate getWeight, how to multiply the cell weights by the Earth surface to get the cells' surface (a question we also get from time to time), how to use setTimeBoundsMonthly and the like, ...

Maybe spatial and temporal averaging could go to different notebooks, since temporal should also cover all the predefined climatology stuff

A notebook about correct regridding would be nice as well :)

@jypeter
Copy link
Member Author

jypeter commented Dec 13, 2019

These should probably go to CDUTIL and GENUTIL (should it just be called genutil now?) on Jupyter-notebooks Tutorials, and guess what, there seems to be a notebook almost ready for this! CDAT/Jupyter-notebooks#25

@durack1
Copy link
Member

durack1 commented Dec 15, 2019

@jypeter thanks for raising this issue. The CDAT package provides a user with a phenomenal head start as to data manipulation, but as you have highlighted nuances of the function defaults needs to be carefully considered. I think augmenting the examples, along with better function documentation would be a great step forward

@davis278
Copy link

@jypeter thanks for your comments. I've followed up on the review of the new version of the Detrend Data notebook and due to the holidays, it likely will not be ready until the new year. If you have a pressing need, let me know and I'll try to get you a copy of the notebook sooner rather than later. Admittedly, the revised Detrend Data notebook addresses only a few of the points raised here.

@jypeter
Copy link
Member Author

jypeter commented Dec 19, 2019

Thanks @davis278. There is no rush for the detrend notebook, especially if it does not mention combinewts.

  • maybe somebody should have a quick look at the existing notebooks using averager, and make sure they use combinewts=1 if this is required to have a correct result with masked data
  • I still think combinewts=1 should be the default, because I'm not sure what you get with combinewts=1 and axis='xy' or axis='yx' makes sense (ask users if they really want to get different results with these when their data is masked...)
  • if the cdutil notebook Jupyter-notebooks#25 notebook (mentioned above) is not on the web site, it should be put back on a fast track to get there

@github-actions
Copy link

Marking issue as stale, since there has been no activity in 30 days.

Unless the issue is updated or the 'stale' tag is removed, this issue will be closed in 7 days.

@github-actions github-actions bot added the stale label Aug 27, 2020
@durack1
Copy link
Member

durack1 commented Aug 27, 2020

@jasonb5 it would be VERY useful if the info in the jupyter notebook that @davis278 was working on found it's way into the doc strings

@github-actions github-actions bot removed the stale label Aug 27, 2020
@jypeter
Copy link
Member Author

jypeter commented Aug 28, 2020

I still think that we need a SAFE default value for combinewts ! Because we don't know if the users will read all the docstring and find the notebook

averager is a super useful asset of CDAT and would probably deserve its own notebook. Especially showing how you can easily get wrong results if you just use an arithmetic mean for spatial averaging (or the wrong parameters for the function)

@github-actions
Copy link

Marking issue as stale, since there has been no activity in 30 days.

Unless the issue is updated or the 'stale' tag is removed, this issue will be closed in 7 days.

@github-actions github-actions bot added the stale label Sep 27, 2020
@jasonb5 jasonb5 added kind/bug Categorizes issue related to bug. and removed stale labels Sep 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue related to bug.
Projects
None yet
Development

No branches or pull requests

5 participants