Add function to compute zonal mean #33
Existing Fortran code: https://gist.github.com/andersy005/44e063e56bed328e8651eb41633c8230
I've followed @klindsay28's recommendation and written a python wrapper around the existing Fortran program. @kristenkrumhardt you may want to use this python wrapper in your notebooks, and I'm happy to help you spin up with it.
cc @erogluorhan
We will be reading through this issue with @anissa111.
Thanks @erogluorhan! One thing for you and @anissa111 to keep in mind is that the existing Fortran version computes zonal means for a variety of regions, and it is very important to keep that functionality in the python version. I'm happy to chat with both of you and walk you through the Fortran version / the python wrapper from #33 (comment) if that would be helpful.
In addition to using a region mask, @klindsay28 has a version of za that can compute time-varying thickness-weighted zonal means, which is important for working with data on alternative vertical coordinates. Ultimately, I think we want to have an implementation that is not POP-specific. @klindsay28 can provide the latest version of the za code, which includes the thickness-weighting capability. We should discuss the plan of attack once you've decided whether to proceed.
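(For orientation, here is a minimal sketch of what a time-varying thickness-weighted mean could look like in xarray, ignoring for now the latitude-band binning discussed further down. The names `field`, `dz`, and `tarea` are hypothetical, not from this thread.)

```python
import xarray as xr

# Hypothetical inputs: field (time, z_t, nlat, nlon), time-varying layer
# thickness dz (time, z_t, nlat, nlon), and horizontal cell area
# tarea (nlat, nlon).  Zero the weight wherever the field is missing so
# land points never enter the denominator; weighted() requires NaN-free
# weights.
weights = (dz * tarea).where(field.notnull(), 0.0).fillna(0.0)

# Thickness- and area-weighted average over the horizontal dimensions.
mean = field.weighted(weights).mean(dim=["nlat", "nlon"])
```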
I thought that I had a version of za with thickness-weighting capability, but I am unable to locate it. The most recent development version of the za code that I have is on CISL machines. Note that its logic for determining valid variable values will need to change when converting to the xarray framework, where missing values are represented as NaN. I'm happy to work with software engineers that are pursuing converting the Fortran code to python.
A top-level question for this functionality is "What does the API look like?". Some possibilities include:

1. A function that mirrors the existing zon_avg program: dataset in, zonally averaged dataset out, matching the Fortran behavior.
2. A more pythonic, extensible interface built around xarray objects and composable pieces (masks, weights, reductions).
To me, 1. seems like just converting the existing Fortran program to python. However, I think we have an opportunity to have a more pythonic, and extensible, solution if we don't do that. That's why I'm throwing 2. into the mix. Please think of it as an initial straw man proposal. I'm sure there are other approaches that would be superior in the long run.
Per a meeting last week, it sounds like @anissa111 will be working on this (I can't assign the ticket directly to her; she probably needs to be added to some list or other in the GitHub settings for this repository).
@mnlevy1981, this is now fixed. You should be able to assign anyone from the Geocat team to issues/PRs.
An important detail in 'translate to python' is what netCDF API to use. The Fortran program calls the netCDF library directly. Since we're considering eventually using xarray, perhaps we should adopt it from the start. Comments? @matt-long, @dcherian, @mnlevy1981
Absolutely! We should use xarray.
I've placed notes for a walk-through of the existing zon_avg program into this google slides document. |
Calculating a zonal mean amounts to using a mask (or clipping region) to determine a set of weights that can then be passed to xarray's weighted-mean machinery; a rough sketch follows. I wonder if we could use xoak to determine those weights. cc @benbovy related: xarray-contrib/xoak#34
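(A minimal sketch of the mask-and-weights idea. All names here are hypothetical, and note that this version assigns each cell to a single latitude band by its center, which, as the next comment points out, is not what the Fortran program does.)

```python
import numpy as np

# Hypothetical inputs: a 2D field(nlat, nlon) carrying a 2D cell-center
# latitude coordinate TLAT, cell areas tarea, and a boolean region_mask.
weights = tarea.where(region_mask, 0.0)
bands = np.arange(-90.0, 91.0, 1.0)  # 1-degree latitude bands

# Area-weighted mean within each band; mask the denominator where the
# field itself is missing so land points don't dilute the average.
numer = (field * weights).groupby_bins("TLAT", bands).sum()
denom = weights.where(field.notnull()).groupby_bins("TLAT", bands).sum()
zonal_mean = numer / denom
```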
@dcherian, the existing Fortran program computes zonal averages in latitude bands. It does this by computing the area of the intersection between each model cell and each latitude band, and then performing an intersection-area-weighted average in each latitude band. Values from an individual model cell therefore contribute to the zonal average in every latitude band that the cell intersects. I don't see how to represent this one-to-many mapping with xarray's weighted framework. If you change the mathematical definition of what you mean by zonal mean, then perhaps you can implement it with xarray's weighted framework.
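(For orientation, a sketch of how such one-to-many weights might be built. It assumes, inexactly for a curvilinear POP grid, that each cell's area is spread uniformly between its southern and northern edge latitudes; `lat_s`, `lat_n`, `tarea`, and `band_edges` are hypothetical names, not from this thread.)

```python
import numpy as np

def build_intersection_weights(lat_s, lat_n, tarea, band_edges):
    """Approximate intersection area between each cell and each band.

    lat_s, lat_n : (nlat, nlon) southern/northern cell-edge latitudes
    tarea        : (nlat, nlon) cell areas
    band_edges   : (nbands + 1,) latitude-band edges
    Returns      : (nlat, nlon, nbands) intersection-area weights
    """
    lo = np.maximum(lat_s[..., None], band_edges[:-1])
    hi = np.minimum(lat_n[..., None], band_edges[1:])
    # Fraction of each cell's latitude extent that falls in each band;
    # a cell straddling an edge contributes to both adjacent bands.
    frac = np.clip(hi - lo, 0.0, None) / (lat_n - lat_s)[..., None]
    return frac * tarea[..., None]
```

A banded zonal mean would then contract the field against these weights (e.g. with `xr.dot`) and divide by the analogous contraction of the valid-data mask, as discussed below.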
Sorry I should've been clearer, I was visualizing a single latitude band when writing my comment. Is this pseudo-code right?

```python
zonal_mean_field[time, z, lat_bands] = (
    xr.dot(
        field[time, z, nlat, nlon],
        intersection_area_weights[nlat, nlon, lat_bands],
    )
    .mean(["nlat", "nlon"])
)
```

We need to efficiently generate `intersection_area_weights`.

EDIT: changed from `weighted` to `dot` because I'm not sure if that weighted call would work.
I'm not sure yet how xoak could help here. That said, it may be worth a look if you want an alternative to the Fortran program.
Perhaps I'm just being dense, but I'm not following what your pseudocode does. Are the `intersection_area_weights` sparse?

One concern I have with what you wrote is memory usage. If `intersection_area_weights` is stored as a dense array, it is mostly zeros and could be very large.

One thing that's important to get right is handling how the land-sea mask changes over depth when you compute the mean. If the weights are computed for the surface land-sea mask, then you need to be sure to omit terms in the denominator of the mean when the corresponding terms in the numerator are FillValue/NaN. I'm not sure if your pseudocode handles that (because I'm not certain what it does); one explicit way to handle it is sketched below.

A totally different approach is to use ESMF (perhaps xESMF) to regrid to a dst lat-lon grid using a conservative area-weighted average, and take zonal means there. I think that going the ESMF route would handle the sparsity of the weights. In order for this to work properly at land-sea boundaries, you need to get from ESMF the fraction of the cell covered on the dst grid, and incorporate that as a weight in the zonal mean computation on the dst grid. I'm not familiar enough with the ESMF regridding API to know if this dst grid coverage is available.
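(A minimal sketch of that denominator masking, reusing the hypothetical names from the pseudocode above:)

```python
import xarray as xr

# Count a cell's weight only where the field is valid at this level, so
# sub-surface land points drop out of both numerator and denominator.
valid = field.notnull()
numer = xr.dot(field.fillna(0.0), intersection_area_weights,
               dims=["nlat", "nlon"])
denom = xr.dot(valid.astype("float64"), intersection_area_weights,
               dims=["nlat", "nlon"])
zonal_mean_field = numer / denom.where(denom > 0)
```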
xarray supports sparse arrays via the sparse package:

```python
In [1]: import xarray as xr

In [2]: import sparse

In [3]: coords = [[0, 1, 2, 3, 4],
   ...:           [0, 1, 2, 3, 4]]

In [4]: data = [10, 20, 30, 40, 50]

In [5]: s = sparse.COO(coords, data, shape=(5, 5))

In [7]: arr = xr.DataArray(s, dims=['lat', 'lon'])

In [8]: arr
Out[8]:
<xarray.DataArray (lat: 5, lon: 5)>
<COO: shape=(5, 5), dtype=int64, nnz=5, fill_value=0>
Dimensions without coordinates: lat, lon

In [18]: arr.data.todense()
Out[18]:
array([[10,  0,  0,  0,  0],
       [ 0, 20,  0,  0,  0],
       [ 0,  0, 30,  0,  0],
       [ 0,  0,  0, 40,  0],
       [ 0,  0,  0,  0, 50]])
```

The sparse package lacks support for some operations that are common with numpy, though. For instance, as of today, the dot product doesn't work (see pydata/sparse#31):

```python
In [19]: xr.dot(arr, arr)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-ad3fc918ce5d> in <module>
----> 1 xr.dot(arr, arr)

~/opt/miniconda3/envs/playground/lib/python3.8/site-packages/xarray/core/computation.py in dot(dims, *arrays, **kwargs)
   1477     # to construct a partial function for apply_ufunc to work.
   1478     func = functools.partial(duck_array_ops.einsum, subscripts, **kwargs)
-> 1479     result = apply_ufunc(
   1480         func,
   1481         *arrays,

~/opt/miniconda3/envs/playground/lib/python3.8/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1126     # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1127     elif any(isinstance(a, DataArray) for a in args):
-> 1128         return apply_dataarray_vfunc(
   1129             variables_vfunc,
   1130             *args,

~/opt/miniconda3/envs/playground/lib/python3.8/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    269
    270     data_vars = [getattr(a, "variable", a) for a in args]
--> 271     result_var = func(*data_vars)
    272
    273     if signature.num_outputs > 1:

~/opt/miniconda3/envs/playground/lib/python3.8/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    722     )
    723
--> 724     result_data = func(*input_data)
    725
    726     if signature.num_outputs == 1:

~/opt/miniconda3/envs/playground/lib/python3.8/site-packages/xarray/core/duck_array_ops.py in f(*args, **kwargs)
     54         else:
     55             wrapped = getattr(eager_module, name)
---> 56         return wrapped(*args, **kwargs)
     57
     58     else:

<__array_function__ internals> in einsum(*args, **kwargs)

TypeError: no implementation found for 'numpy.einsum' on types that implement __array_function__: [<class 'sparse._coo.core.COO'>]
```
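(Until pydata/sparse grows einsum support, one memory-unfriendly workaround is to densify just before the contraction:)

```python
# Densify the underlying COO array.  Fine for this toy example, but it
# defeats the purpose of sparse storage for large weight arrays.
arr_dense = arr.copy(data=arr.data.todense())
xr.dot(arr_dense, arr_dense)
```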
Ah sorry (again!), the corrected version:

```python
zonal_mean_field[time, z, lat_bands] = xr.dot(
    field[time, z, nlat, nlon],
    intersection_area_weights[z, nlat, nlon, lat_bands],
    dims=["nlat", "nlon"],
)
```

(this is what I was thinking of.)

The conservative regridding approach did come to mind. xgcm has implemented it for vertical regridding with numba. I wonder if they're open to adding the 2D version of that functionality. I have only used xesmf for simple regridding problems, so I don't know if it handles everything that your code does.

I guess there are 4 possible approaches:

1. Wrap the existing Fortran zon_avg program.
2. Translate the Fortran algorithm to Python.
3. Generate sparse intersection-area weights and contract with xr.dot.
4. Conservatively regrid to a lat-lon grid (e.g. with xESMF) and take the mean there.
@klindsay28 thanks for taking the time to reply here and correct my mistakes.
Another issue to keep in mind is the extensibility of this tool to MOM6, whose output is on a different grid.
@mgrover1 and I are talking about this issue. It sounds like it might be feasible to go straight to an implementation based on xESMF (Max has experience with the xESMF API). One challenge is getting grid cell corners to xESMF. pop-tools has support for various POP grids, but it is not clear how to automatically determine your grid (e.g., "gx1v7") from a POP history file. How does one do this reliably in practice? pinging @matt-long and @dcherian
Does this mean conservative regridding and then averaging?
I have no idea! I do remember that there was a function or PR that decided a grid name based on the number of points (length of nlat, nlon), but that is not robust at all.
The pop-tools get_grid function can return a SCRIP-format dataset that has the grid corners, so the corners are available (sketch below). This doesn't solve the problem of how to determine the grid from a history file, but perhaps we could simply require the user to input this information.
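(For example, if the user supplies the grid name, something like this, as I understand the pop-tools API, gets the corners:)

```python
import pop_tools

# User-supplied grid name, since it can't be inferred from a history file.
grid = pop_tools.get_grid("POP_gx1v7", scrip=True)
# `grid` is a SCRIP-format dataset including grid corner coordinates,
# which is what conservative regridding needs.
```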
Yes. I briefly described this approach at the end of #33 (comment). |
How is the depth dimension handled? One approach is to use remapping weights generated at the surface, then remap a field of ones at each level and renormalize the remapped field by this sum. I do that here:
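(A minimal sketch of that renormalization with xESMF, assuming `regridder` is a conservative `xesmf.Regridder` built from the source and destination grids, and `field` has NaN over land at each level:)

```python
import xarray as xr

# Remap the field (missing values as zeros) and a matching valid-mask.
numer = regridder(field.fillna(0.0))
denom = regridder(xr.ones_like(field).where(field.notnull(), 0.0))

# Renormalize so destination cells that are partially land at this
# level aren't biased low.
remapped = numer / denom.where(denom > 0)
```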
In talking to @klindsay28, this sounds like a large undertaking. There's a Fortran program in /glade/u/home/klindsay/bin/zon_avg/, and his current recommendation is to write a python script that takes in an xarray dataset, writes it to netCDF, uses an os.system call to run zon_avg, reads in the resulting netCDF file as another xarray dataset, and returns that second dataset (a sketch of such a wrapper follows). Using the on hold label until we have the time to dedicate to a proper implementation.
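(A rough sketch of such a wrapper. The zon_avg command-line arguments and file names below are placeholders; the real program's invocation may differ.)

```python
import subprocess
import xarray as xr

# Assumed executable name under the directory given above.
ZON_AVG = "/glade/u/home/klindsay/bin/zon_avg/zon_avg"

def za(ds):
    """Round-trip an xarray dataset through netCDF and Fortran zon_avg."""
    ds.to_netcdf("za_in.nc")
    # subprocess.run is the modern stand-in for the os.system call
    # described above; the in/out file arguments are assumptions.
    subprocess.run([ZON_AVG, "za_in.nc", "za_out.nc"], check=True)
    return xr.open_dataset("za_out.nc")
```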