
Refactor I/O to rely on DataFrames & provide storage options #27

Open
aufdenkampe opened this issue Mar 5, 2021 · 9 comments
@aufdenkampe
Member

aufdenkampe commented Mar 5, 2021

Implement the plan we've discussed to abstract out model I/O, so that:

  • the model runs by interacting with a dictionary of DataFrames in memory
    • presently, the model reads/writes to HDF5 during model execution
  • reading/writing between the dictionary of DataFrames and storage is done with a separate set of functions.

This will enable us to read/write to Parquet (#23), Zarr, and other high performance file formats, in addition to HDF5.
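A minimal sketch of that separation might look like the following. All of the names here (`read_hdf5`, `write_parquet`, `run_model`, the `/TS39` key) are illustrative, not the eventual HSP2 API; `run_model` is a placeholder transform standing in for the real model logic.

```python
from typing import Dict, List

import pandas as pd


def read_hdf5(path: str, keys: List[str]) -> Dict[str, pd.DataFrame]:
    """Load the requested tables from an HDF5 file into one in-memory dict."""
    with pd.HDFStore(path, mode="r") as store:
        return {key: store[key] for key in keys}


def write_parquet(frames: Dict[str, pd.DataFrame], directory: str) -> None:
    """Persist each in-memory table as a separate Parquet file."""
    for key, df in frames.items():
        df.to_parquet(f"{directory}/{key.strip('/').replace('/', '_')}.parquet")


def run_model(frames: Dict[str, pd.DataFrame]) -> Dict[str, pd.DataFrame]:
    """Model logic touches only the in-memory dict, never the storage layer.

    Placeholder transform: the real model would update these tables.
    """
    return {key: df.copy() for key, df in frames.items()}
```

Because the model sees only the dict, swapping Parquet or Zarr for HDF5 means adding another reader/writer pair, with no changes to the model code itself.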

For performance, consider using Dask DataFrames, which are very similar to, and integrated with, pandas DataFrames but have built-in parallelization and chunking that allow performant manipulation of very large datasets that are bigger than RAM. We might want to use Dask DataFrames instead of pandas DataFrames throughout the repo (or at least make it easy to switch between the two).

Dask Tutorial: http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/dask.html

@htaolimno

Forked a branch, develop_WDMclass, to develop a WDM class and separate the HDF5 handling from the reader.

@htaolimno

htaolimno commented Mar 9, 2021

@ptomasula I encountered an issue in the code from the branch develop_readWDM that I forked from. When I commented out the numba decorators for debugging purposes, the code ran into an infinite loop. It took me a while to figure out that the default NumPy integer is only 4 bytes. This short length caused lost bits in your datetime encoding function. The code seems to run fine when the numba decorators are restored; I guess the compiled code may allocate a longer integer size and therefore avoid the issue. Any comments?

@ptomasula
Member

ptomasula commented Mar 9, 2021

@htaolimno, I also encountered that behavior when I was developing the datetime functions. The datetime integer uses 26 bits for the month-through-second components, and any remaining bits store the year information. As you noted, any 4-byte integer will cause problems because we will lose bits related to the year component.

I thought I had solved that with the conversion to a Python int object on these lines. Without numba, I would expect whatever integer was input to be converted to the Python int object, which is a long of arbitrary length. It should expand the number of bits used to accommodate, even if the initial type of the integer was only 32-bit, so I'm surprised you encountered this issue when you commented out the numba decorators.

However, I did some testing, and there is also some problematic behavior in how the int conversion works when numba is applied. When using numba, if I explicitly define a numpy.int32 object and try to convert it to a Python int, I can still get a numpy.int32 object back (i.e. type(int(numpy.int32)) == numpy.int32), so numba is taking some liberties in interpreting the type for that conversion. Anyway, I think the solution to both of these problems is the same: let's explicitly define the integers used in these datetime functions as 64-bit using the numpy.int64 object. You can pull in commit 1a62f38, which should solve the issue you encountered.
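The failure mode is easy to demonstrate. The snippet below is a hypothetical stand-in for the encoding, not the actual HSP2 function: it only packs the year above the 26 date/time bits, which is enough to show why a 32-bit integer destroys the year.

```python
import numpy as np

YEAR_SHIFT = 26  # bits 0-25 hold the month-through-second fields


def encode_year(year):
    # Shift the year above the 26 packed date/time bits; for any realistic
    # year the result needs more than 32 bits (1970 << 26 is ~1.3e11).
    return int(year) << YEAR_SHIFT


packed = encode_year(1970)
wide = np.int64(packed)          # 64 bits: the year bits survive
narrow = wide.astype(np.int32)   # 32 bits: the high bits silently wrap

assert int(wide) >> YEAR_SHIFT == 1970
assert int(narrow) >> YEAR_SHIFT != 1970  # year information is destroyed
```

Note that the `astype` cast wraps silently rather than raising, which is why the bug surfaced as garbage dates (and the infinite loop) instead of an error.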

@htaolimno

@ptomasula I hesitate to pull in your most recent changes, since I am further along in my development. Instead, I've adopted a simple solution for the time being: I kept the numpy.int32 but used .item() to pass the underlying Python integer to the datetime encoding subroutine.
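For reference, `.item()` unwraps a NumPy scalar to the equivalent built-in Python object, and Python ints have arbitrary precision, so the subsequent bit shifts cannot overflow (the value 1970 here is just an example):

```python
import numpy as np

stored = np.int32(1970)      # WDM fields arrive as 4-byte integers
as_python = stored.item()    # unwrap to an arbitrary-precision Python int

assert type(as_python) is int
assert (as_python << 26).bit_length() > 32  # no silent overflow possible
```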

aufdenkampe referenced this issue Mar 26, 2021
The datetime functions added to support numba in commit e5d64a1 require that integers input into these functions be 64 bits, or year information will be lost during bit manipulations. The previous implementation left the integer type up to numba and in some instances could produce an int32 object. This commit makes the integer conversion explicitly int64 so that year information is not lost.
aufdenkampe referenced this issue Mar 26, 2021
Even with datetime conversions removed from the group processing loop, the conversion time using datetime.datetime() remains slow. After trying several datetime conversion approaches with pandas, I was still unable to achieve a significant performance boost.

Numba does not support the creation of datetime objects, but it does support datetime arithmetic. This commit adds a numba-compatible datetime conversion function which calculates a date's offset from the epoch and adds the appropriate timedelta64 objects to return a datetime64 object.
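The shape of that approach can be sketched as below. This is a simplified illustration, not the committed function: it assumes the day offset from the epoch has already been computed elsewhere from the packed year/month/day fields, and it is shown without the numba decorator.

```python
import numpy as np

EPOCH = np.datetime64("1970-01-01T00:00")


def to_datetime64(days_since_epoch, hours, minutes):
    # Build the timestamp purely with timedelta64 arithmetic, which numba
    # supports, instead of constructing datetime objects, which it does not.
    return (EPOCH
            + np.timedelta64(days_since_epoch, "D")
            + np.timedelta64(hours, "h")
            + np.timedelta64(minutes, "m"))
```

Because every operation is datetime64/timedelta64 arithmetic, the same body can sit inside an @njit-compiled loop.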
aufdenkampe referenced this issue Mar 26, 2021
I missed committing 3 line deletions which remove the old pandas.apply based datetime conversion approach.
@aufdenkampe
Member Author

We tried to merge in Paul's fixes with 22b457a, but stripped most of the fix while resolving the merge conflict.

@ptomasula, can you fix?

@ptomasula
Member

ptomasula commented Mar 29, 2021

@aufdenkampe @htaolimno I reset the branch prior to the merge, and then performed a new merge where I manually resolved the merge conflicts. See commit d64c25a.

This also gave me a chance to review the code, so I can provide some feedback on what I think the next steps are. I'll open a pull request shortly so I can type up the notes in the code-review format.

@aufdenkampe
Member Author

@ptomasula, awesome! That's a lot cleaner and much easier to understand. Thanks.

@ptomasula
Member

ptomasula commented Mar 29, 2021

@htaolimno
I did a code review of the WDMReader class. Thank you so much for the solid work on this first phase. I think the class is pretty close, and with some restructuring it should be good to merge in. Hua, when you have a chance, please look over the code review and see if you can start addressing some of the comments.

Meanwhile this week, I'll start putting together my thoughts on the IO abstraction class and maybe even write an outline of a class that we can build off of.

@aufdenkampe
Member Author

aufdenkampe commented Mar 29, 2021

@ptomasula, thanks for setting up PR #34 and doing an initial code review with suggestions.

As you ponder a new IO abstraction class, it might be useful to know that I did some further exploration of HDF5 this weekend, and I've changed my tune regarding PyTables. The short story is that we will continue to rely on PyTables for our HDF5 IO, which gives us nice integration with Pandas and Dask DataFrames.

So let's get the abstraction to a level where we're managing the tables in Pandas (and interchangeably with Dask).
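For anyone following along, pandas delegates its HDF5 I/O to PyTables under the hood, so "managing the tables in Pandas" can be as simple as a store roundtrip. The file name and table key below are hypothetical:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"flow": [1.2, 3.4]},
    index=pd.date_range("2021-03-01", periods=2, freq="D"),
)

# pd.HDFStore uses PyTables for the actual reads/writes.
path = os.path.join(tempfile.mkdtemp(), "results.h5")  # hypothetical file
with pd.HDFStore(path, mode="w") as store:
    store.put("/RESULTS/TS39", df, format="table")  # 'table' format is queryable

with pd.HDFStore(path, mode="r") as store:
    restored = store["/RESULTS/TS39"]
```

The same DataFrame could instead be handed to a Parquet or Zarr writer, which is the point of keeping the abstraction at the DataFrame level.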
