
Order of dimensions #504

Open
farheen2022 opened this issue Mar 27, 2023 · 6 comments

Comments

@farheen2022

Could you kindly tell me how to change the name and order of the dimensions of a NetCDF file? I am using the CHIRPS dataset and my latitude dimension is named `latitude`; I am unable to change it to `lat`, and that is raising an error. I have tried using `ncpdq` in the Conda prompt to correct the order of dimensions, but that raises an error related to the size of the internal memory.

@bradleyswilson

You can rename a netCDF dimension with xarray's `rename()` method, e.g. `dataset.rename({'latitude': 'lat'})`.

If you need to change the ordering of dimensions, you can use `transpose()`, e.g. `data["prcp"].transpose("lat", "lon", "time")`.
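
Putting those two calls together, a minimal sketch (the file path and the `prcp` variable name here are placeholders for whatever your CHIRPS file actually uses):

```python
import xarray as xr

# open the input file; "chirps.nc" is a placeholder path
ds = xr.open_dataset("chirps.nc")

# rename() returns a new Dataset, so the result must be assigned back
ds = ds.rename({"latitude": "lat", "longitude": "lon"})

# reorder the dimensions of the precipitation variable to (lat, lon, time)
ds["prcp"] = ds["prcp"].transpose("lat", "lon", "time")
```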

@monocongo
Owner

As @bradleyswilson alludes to above, you can leverage xarray for this and then write the resulting `xarray.Dataset` object to file. Then use that new NetCDF file as input to this package's main processing script.
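
For example, continuing the sketch above (the output file name is arbitrary):

```python
# write the renamed/reordered Dataset to a new NetCDF file, then point the
# package's main processing script at this file as its input
ds.to_netcdf("chirps_lat_lon_time.nc")
ds.close()
```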

@farheen2022
Author

I am now able to change the order of the dimensions of the input file and save it. The problem was arising because the file was too big, almost 7 GB (I was using the CHIRPS rainfall dataset). I checked with the CRU rainfall dataset and I am able to change my input file. Thank you @bradleyswilson @monocongo
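
For files in that size range, one possible workaround (a sketch, assuming dask is installed; the chunk size, path, and `precip` variable name are illustrative rather than taken from the CHIRPS file) is to let xarray operate lazily on chunks so the full array never has to fit in memory:

```python
import xarray as xr

# open lazily with dask chunks; rename and transpose are lazy as well,
# so peak memory stays bounded by the chunk size rather than the file size
ds = xr.open_dataset("chirps.nc", chunks={"time": 120})
ds = ds.rename({"latitude": "lat", "longitude": "lon"})
ds["precip"] = ds["precip"].transpose("lat", "lon", "time")

# to_netcdf() streams the dask-backed data to disk chunk by chunk
ds.to_netcdf("chirps_lat_lon_time.nc")
```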

@maxxpower007

To do an automatic conversion, I usually add these lines after every update:

In `_compute_write_index` (`main.py`), after this line:

```python
dataset = xr.open_mfdataset(list(set(files)), chunks=chunks)
```

add this:

```python
if 'latitude' in dataset.coords:
    # rename() returns a new Dataset, so assign the result back
    dataset = dataset.rename({'latitude': 'lat', 'longitude': 'lon'})
if 'bnds' in dataset.dims:
    dataset = dataset.drop_vars('time_bnds')
keys = list(dataset.keys())
for key in keys:
    # put each variable's dimensions into (lat, lon[, time]) order
    if 'time' in dataset.coords:
        dataset[key] = dataset[key].transpose("lat", "lon", "time")
    else:
        dataset[key] = dataset[key].transpose("lat", "lon")
# reorder the dataset so the coordinates come first
if 'time' in dataset.coords:
    dataset = dataset[['lat', 'lon', 'time', *keys]]
else:
    dataset = dataset[['lat', 'lon', *keys]]
```

And in `_prepare_file` (`main.py`), after this line:

```python
ds = xr.open_dataset(netcdf_file)
```

add this:

```python
if 'latitude' in ds.coords:
    # rename() returns a new Dataset, so assign the result back
    ds = ds.rename({'latitude': 'lat', 'longitude': 'lon'})
if 'bnds' in ds.dims:
    ds = ds.drop_vars('time_bnds')
keys = list(ds.keys())
for key in keys:
    # put each variable's dimensions into (lat, lon[, time]) order
    if 'time' in ds.coords:
        ds[key] = ds[key].transpose("lat", "lon", "time")
    else:
        ds[key] = ds[key].transpose("lat", "lon")
# reorder the dataset so the coordinates come first
if 'time' in ds.coords:
    ds = ds[['lat', 'lon', 'time', *keys]]
else:
    ds = ds[['lat', 'lon', *keys]]
```

@maxxpower007

One more that is unrelated:

I usually have to change `pet` in `indices.py`.

From:

```python
if (latitude_degrees is not None) and not np.isnan(latitude_degrees) and (-90.0 < latitude_degrees < 90.0):
```

To:

```python
if (latitude_degrees is not None) and not np.isnan(latitude_degrees) and (-90.0 <= latitude_degrees <= 90.0):
```

@monocongo
Owner

Thanks for helping, @maxxpower007! The common fixes you outlined above might be useful for all users -- maybe we should roll these into the main processing script? One limitation, for now, is that there are no proper tests for the main processing script, so it is harder to be sure we've not broken something if we add code willy-nilly.
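
One possible shape for that, sketched here as a standalone helper (the function name and the assumption that only the latitude/longitude naming, `time_bnds`, and dimension order need normalizing are mine, not part of the package):

```python
import xarray as xr


def _normalize_dims(ds: xr.Dataset) -> xr.Dataset:
    """Rename latitude/longitude, drop bounds variables, and order dimensions
    as (lat, lon[, time]) so downstream code sees a consistent layout."""
    if "latitude" in ds.coords:
        ds = ds.rename({"latitude": "lat", "longitude": "lon"})
    if "bnds" in ds.dims:
        ds = ds.drop_vars("time_bnds", errors="ignore")
    dims = ("lat", "lon", "time") if "time" in ds.coords else ("lat", "lon")
    for name in ds.data_vars:
        ds[name] = ds[name].transpose(*dims, missing_dims="ignore")
    return ds
```

Both `_compute_write_index` and `_prepare_file` could then call the helper right after opening their dataset, which would keep the two code paths from drifting apart.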
