Converting rasm file to netCDF3 using xarray #1114

Closed
fmaussion opened this issue Nov 13, 2016 · 6 comments · Fixed by pydata/xarray-data#7

@fmaussion
Member

Converting the rasm tutorial file to netCDF3 would help new users like #1113 and simplify the Read the Docs (RTD) build process (#1106).

The problem is that it is not as trivial as expected. On the latest master:

```python
import xarray as xr
ds = xr.tutorial.load_dataset('rasm')
ds.to_netcdf('rasm.nc', format='NETCDF3_CLASSIC', engine='scipy')
```

Throws an error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/mowglie/Documents/git/xarray/xarray/backends/api.py in to_netcdf(dataset, path, mode, format, group, engine, writer, encoding)
    516     try:
--> 517         dataset.dump_to_store(store, sync=sync, encoding=encoding)
    518         if isinstance(path, BytesIO):

/home/mowglie/Documents/git/xarray/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding)
    754         if sync:
--> 755             store.sync()
    756 

/home/mowglie/Documents/git/xarray/xarray/backends/scipy_.py in sync(self)
    149         super(ScipyDataStore, self).sync()
--> 150         self.ds.flush()
    151 

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in flush(self)
    388         if hasattr(self, 'mode') and self.mode in 'wa':
--> 389             self._write()
    390     sync = flush

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in _write(self)
    400         self._write_gatt_array()
--> 401         self._write_var_array()
    402 

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in _write_var_array(self)
    448             for name in variables:
--> 449                 self._write_var_metadata(name)
    450             # Now that we have the metadata, we know the vsize of

/home/mowglie/.pyvirtualenvs/py3/lib/python3.4/site-packages/scipy/io/netcdf.py in _write_var_metadata(self, name)
    466         for dimname in var.dimensions:
--> 467             dimid = self._dims.index(dimname)
    468             self._pack_int(dimid)

ValueError: '2' is not in list
```

@fmaussion
Member Author

Should we try to get to the source of this, or should I simply use ncks to do the conversion?

@shoyer
Member

shoyer commented Nov 13, 2016

Can you write it as netcdf3 using engine="netcdf4"? This might be a scipy bug.

@fmaussion
Member Author

Yes, it works. But opening it with scipy throws a new error (in xarray's defense, the file I created with ncks also can't be opened with scipy, but it can be opened with ncview).

I think all this gets way too complicated for a Sunday evening and a simple demo file ;-)

@jhamman
Member

jhamman commented Dec 27, 2016

I also think this is a scipy bug. After converting the file to the netCDF3 classic format, I still get an error in the scipy backend...

```
$ ncks -3 rasm.nc rasm.nc

$ ncdump -k rasm.nc
classic
$ ncdump -h rasm.nc
netcdf rasm {
dimensions:
	time = 36 ;
	y = 205 ;
	x = 275 ;
variables:
	double Tair(time, y, x) ;
		Tair:_FillValue = 9.96920996838687e+36 ;
		Tair:units = "C" ;
		Tair:long_name = "Surface air temperature" ;
		Tair:dimensions = "2" ;
		Tair:type_preferred = "double" ;
		Tair:time_rep = "instantaneous" ;
		Tair:coordinates = "yc xc" ;
	double time(time) ;
		time:dimensions = "1" ;
		time:long_name = "time" ;
		time:type_preferred = "int" ;
		time:units = "days since 0001-01-01" ;
		time:calendar = "noleap" ;
	double xc(y, x) ;
		xc:long_name = "longitude of grid cell center" ;
		xc:units = "degrees_east" ;
		xc:bounds = "xv" ;
	double yc(y, x) ;
		yc:long_name = "latitude of grid cell center" ;
		yc:units = "degrees_north" ;
		yc:bounds = "yv" ;

// global attributes:
		:title = "/workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc" ;
		:institution = "U.W." ;
		:source = "RACM R1002RBRxaaa01a" ;
		:output_frequency = "daily" ;
		:output_mode = "averaged" ;
		:convention = "CF-1.4" ;
		:references = "Based on the initial model of Liang et al., 1994, JGR, 99, 14,415- 14,429." ;
		:comment = "Output from the Variable Infiltration Capacity (VIC) model." ;
		:nco_openmp_thread_number = 1 ;
		:NCO = "\"4.6.0\"" ;
		:history = "Tue Dec 27 13:38:40 2016: ncks -3 rasm.nc rasm.nc\n",
			"history deleted for brevity" ;
}
```
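
`ncdump -k` reports the on-disk format (`classic` here). The same check can be roughly approximated without the netCDF tools by reading the file's leading bytes. This is a minimal sketch assuming the standard signatures: `CDF` followed by a version byte for the netCDF3 variants, and the 8-byte HDF5 signature at offset 0 for netCDF-4 files (files with an HDF5 user block are not handled). `netcdf_kind` is a hypothetical helper, and it cannot tell netCDF-4 from netCDF-4 classic model files, since both are stored as HDF5.

```python
import os
import tempfile


def netcdf_kind(path):
    """Guess a netCDF file's on-disk format from its leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    # netCDF3 files start with b"CDF" plus a version byte.
    if len(head) >= 4 and head[:3] == b"CDF":
        return {1: "classic", 2: "64-bit offset", 5: "cdf5"}.get(head[3], "unknown")
    # netCDF-4 files are HDF5 files; assume the superblock sits at offset 0.
    if head == b"\x89HDF\r\n\x1a\n":
        return "netCDF-4 (HDF5)"
    return "unknown"


# Demo: a throwaway file carrying only the classic-format signature.
with tempfile.TemporaryDirectory() as tmpdir:
    demo = os.path.join(tmpdir, "demo.nc")
    with open(demo, "wb") as f:
        f.write(b"CDF\x01" + bytes(4))
    print(netcdf_kind(demo))  # classic
```

Only the two netCDF3 variants (`classic` and `64-bit offset`) are candidates for scipy's reader; anything reported as HDF5 needs the netCDF4 engine.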

```Python

In [1]: from scipy.io import netcdf
   ...: f = netcdf.netcdf_file('rasm.nc', 'r')
   ...: for k, v in f.variables.items():
   ...:     print(k, v.dimensions)
   ...:     
yc ('y', 'x')
xc ('y', 'x')
time b'1'
Tair b'2'

In [2]: import xarray as xr
   ...: xr.open_dataset('rasm.nc', engine='netcdf4')
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) datetime64[ns] 1980-09-16T12:00:00 1980-10-17 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
```

I don't know where scipy is getting the b'1' and b'2' dimensions. I can push the converted dataset to xarray-data, but that doesn't really solve the problem of using scipy.

@shoyer
Member

shoyer commented Dec 27, 2016

It looks like scipy gets confused between the dimensions = '1' netCDF attribute and the variable's actual dimensions, so this is another scipy bug. When it sets the dimensions netCDF attribute, it overwrites the Python attribute of the same name: https://github.com/scipy/scipy/blob/c48dfa43eae3474f06353ed3664caed945e9aee1/scipy/io/netcdf.py#L837-L849
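
The clash described above can be reproduced with a toy model. This is a hypothetical, heavily simplified stand-in, not scipy's actual class: following the behaviour described in the comment, every attribute set on the object is recorded as a netCDF attribute and also stored as a Python attribute, so setting `dimensions = "2"` replaces the tuple of dimension names. A writer loop like the one in the traceback (`for dimname in var.dimensions: self._dims.index(dimname)`) then iterates over the characters of the string `"2"`.

```python
class ToyNetcdfVariable:
    """Hypothetical, simplified model of a netCDF3 variable object."""

    def __init__(self, dimensions):
        # Bypass __setattr__ so these structural fields are not treated as netCDF attributes.
        self.__dict__["_attributes"] = {}
        self.__dict__["dimensions"] = tuple(dimensions)

    def __setattr__(self, name, value):
        self._attributes[name] = value  # recorded as a netCDF attribute for writing
        self.__dict__[name] = value     # also exposed as obj.<name>, clobbering fields


def dimension_ids(file_dims, var):
    """Mimic the writer loop: look up the index of each of the variable's dimensions."""
    return [file_dims.index(dimname) for dimname in var.dimensions]


file_dims = ["time", "y", "x"]
tair = ToyNetcdfVariable(("time", "y", "x"))
print(dimension_ids(file_dims, tair))  # [0, 1, 2]

tair.dimensions = "2"  # the rasm file's Tair:dimensions = "2" attribute
try:
    dimension_ids(file_dims, tair)
except ValueError as err:
    print(err)  # '2' is not in list
```

The final message matches the one in the original traceback, which is consistent with the attribute name collision being the cause.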

The simple workaround is to remove the dimensions attribute from each of these variables.
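
The workaround amounts to a loop over the variables. This is a minimal sketch: `drop_variable_attr` is a hypothetical helper that works on any mapping of name to an object with an `attrs` dict, and the demo uses `SimpleNamespace` stand-ins that mirror the rasm attributes shown in the `ncdump` output rather than a real dataset.

```python
from types import SimpleNamespace


def drop_variable_attr(variables, attr="dimensions"):
    """Remove `attr` from every variable's `.attrs` dict, in place.

    `variables` is any mapping of name -> object with an `attrs` dict.
    Returns the names of the variables that actually carried the attribute.
    """
    dropped = []
    for name, var in variables.items():
        if attr in var.attrs:
            del var.attrs[attr]
            dropped.append(name)
    return dropped


# Stand-ins mirroring the rasm metadata (hypothetical demo data).
variables = {
    "Tair": SimpleNamespace(attrs={"units": "C", "dimensions": "2"}),
    "time": SimpleNamespace(attrs={"dimensions": "1", "long_name": "time"}),
    "xc": SimpleNamespace(attrs={"units": "degrees_east"}),
}
print(drop_variable_attr(variables))  # ['Tair', 'time']
```

With xarray, the same helper should apply to `ds.variables`, whose values are `Variable` objects exposing an `attrs` dict, before calling `ds.to_netcdf('rasm.nc', format='NETCDF3_CLASSIC', engine='scipy')`; that combination is an untested assumption here.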

@jhamman
Member

jhamman commented Dec 27, 2016

I see. I should have looked at the attributes. pydata/xarray-data#7 fixes these issues, and the dataset can now be read with scipy.
