-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Selection of specific input files in xarray #36
Comments
@milenaveneziani, good call. xarray also allows you to supply a simple list of files, so we could probably use glob.glob, which supports regular expressions of the type you want, to get a list of file paths to pass on to xarray. @pwolfram, what do you think? Could this be added to mpas_xarray so we don't have to alter every call to |
@xylar, yes this can be solved but we would have to use glob outside mpas_xarray / xarray because xarray expects a string on input, e.g., from their code: if isinstance(paths, basestring):
paths = sorted(glob(paths))
if not paths:
raise IOError('no files to open') Therefore, we could make a function like def paths(*args):
"""
Returns glob'd paths in list for arbitrary number of function arguments.
Note, each expanded set of paths is sorted.
Phillip J. Wolfram
10/25/2016
"""
paths = []
for aargs in args:
paths += sorted(glob.glob(aargs))
return paths and call xarray via ds = xr.open_mfdataset(paths('*year3[0-9]*nc', '*year40*nc'), ...) I would say the syntax here is simple and fairly clean and will give you a list of paths that xarray should open. I've submitted a PR: #37 |
cc @pwolfram, @milenaveneziani, @vanroekel (I think this comment got missed because I didn't explicitly put any names on it). While #37 and #38 partly address this issue, there is not currently a way to specify (e.g. in the config file) the start and end times of the analysis. How would we want to specify start and end times? Presumably with date strings? A related question is how we want to apply that range. It would seem most useful to do this via An option would be for thoughts? |
@xylar, if we have the general wild-carded paths we can use xarray to do the time slicing. I think this is the simplest approach because xarray does lazy evaluation and doesn't have to load all the data. However, we should keep this in mind in case we find that opening a ton of files creates either a crash or performance issues (which it can at certain scales, e.g., 1000+ files). If so, this is probably a brand new issue / discussion on how to deal with that problem. |
I think we have addressed this issue with #37 and should close this issue unless there is something else we are missing-- would you agree @xylar and @milenaveneziani? If so, please feel free to close this issue. |
@pwolfram, no, I don't think this has been addressed. See my comment above: #36 (comment) |
@pwolfram, I now think this issue will be fully addressed by #37. Even so, I don't think the issue should be closed until #37 gets merged, since the issue would remain unaddressed if we decided to close #37 for some reason. Does that make sense to you? Or is it standard GitHub practice to close issues as soon as there is a PR that addresses them? |
not closing this until we sort out the start/end time issue encountered in #38. |
Since we now have #45, I think we can perhaps close this one? |
Agreed. |
At the moment, when loading multi-files with xarray, the input string cannot be a list or include wild-card characters of the kind, say [0-3].
This is not useful when wanting to load a reduced dataset when the goal is to compute climatologies over a certain number of years.
Say for example that we want to compute the following seasonal averages: DJF, MAM, JJA, SON, over year 31-40. We would want to load the following input files:
year3[0-9], year40
but instead xarray only allows something like year3?
The text was updated successfully, but these errors were encountered: