Implement the plan we've discussed to abstract out model I/O, so that:
- the model runs by interacting with a dictionary of data frames in memory (presently the model reads from and writes to HDF5 during execution), and
- reading/writing between that dictionary and storage is done with a separate set of functions.
This will enable us to read/write Parquet (#23), Zarr, and other high-performance file formats, in addition to HDF5.
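A minimal sketch of what that separation could look like. The function names (`read_tables`, `write_tables`), the one-file-per-table Parquet layout, and the format strings are all hypothetical, not the repo's actual API; the point is that the model only ever sees the in-memory dict, and storage formats are pluggable behind two functions.

```python
import pandas as pd

# Hypothetical helper names; illustrative only, not the repo's real API.

def read_tables(path: str, keys: list, fmt: str = "hdf5") -> dict:
    """Load the named tables from storage into an in-memory dict of DataFrames."""
    if fmt == "hdf5":
        with pd.HDFStore(path, mode="r") as store:
            return {k: store[k] for k in keys}
    if fmt == "parquet":
        # Assumed layout: one Parquet file per table under a directory.
        return {k: pd.read_parquet(f"{path}/{k}.parquet") for k in keys}
    raise ValueError(f"unsupported format: {fmt}")

def write_tables(tables: dict, path: str, fmt: str = "hdf5") -> None:
    """Persist the dict of DataFrames; the model itself never touches storage."""
    if fmt == "hdf5":
        with pd.HDFStore(path, mode="w") as store:
            for k, df in tables.items():
                store[k] = df
    elif fmt == "parquet":
        for k, df in tables.items():
            df.to_parquet(f"{path}/{k}.parquet")
    else:
        raise ValueError(f"unsupported format: {fmt}")
```

Adding a new backend (Zarr, etc.) then means adding one branch (or registering one reader/writer pair) without touching model code.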
For performance, consider using Dask DataFrames, which are very similar to and well integrated with Pandas DataFrames, but add built-in parallelization and chunking for performant manipulation of datasets larger than RAM. We might want to use Dask DataFrames instead of Pandas DataFrames throughout the repo, or at least make it easy to switch between the two.
Dask Tutorial: http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/dask.html