Both the plants and animals models require users to provide cohort data. For plants, this means providing tuples of data:
(cell id, plant functional type, number of individuals, individual size)
There can be multiple entries per cell id and different numbers of cohorts per cell. The easiest and sanest format for this data is a simple data frame of those tuples, and the natural format for creating and maintaining that data is a CSV or XLSX file. Forcing users to convert this into NetCDF for input is not sensible.
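As a minimal sketch of what such a file might look like, here is a small cohort table read with pandas. The column names (`cell_id`, `plant_pft`, `plant_n`, `plant_size`) and values are hypothetical and only illustrate the shape of the data: each tuple is one row, and cells carry different numbers of cohorts.

```python
import io

import pandas as pd

# Hypothetical cohort table: one row per (cell id, plant functional type,
# number of individuals, individual size) tuple. Names are illustrative only.
CSV_TEXT = """cell_id,plant_pft,plant_n,plant_size
0,broadleaf,50,0.10
0,conifer,20,0.25
1,broadleaf,5,0.40
2,broadleaf,12,0.15
2,conifer,8,0.30
2,shrub,100,0.02
"""

cohorts = pd.read_csv(io.StringIO(CSV_TEXT))

# Count cohorts per cell: cells 0, 1 and 2 hold 2, 1 and 3 cohorts respectively.
cohorts_per_cell = cohorts.groupby("cell_id").size()
print(cohorts_per_cell)
```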
So, we need to:

- Add a CSV/XLSX loader.
  - This should use `pandas`, as that is already a requirement of `xarray` and is designed explicitly to handle data frames, rather than using the standard library `csv` module or any of the `numpy` structures.
  - I think we will need to explicitly add `openpyxl` (the engine `pandas` uses to read XLSX files) to `[tool.poetry.dependencies]` to support reading XLSX format.
- Test that it works!
It should go in `virtual_ecosystem.core.readers` and I think the signature will look like:

```python
@register_file_format_loader(file_types=(".csv", ".xlsx"))
def load_from_dataframe(file: Path, var_name: str) -> DataArray:
    """Loads a DataArray from a data frame format."""
```
The format registry should then automatically switch to using this loader for CSV and XLSX files.
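A minimal sketch of how the loader and the registry could fit together. The `register_file_format_loader` defined here is a hypothetical stand-in built on a plain dictionary, not the real implementation in `virtual_ecosystem.core.readers`; the `cohorts` dimension name, the `ValueError` on a missing column and the `load_variable` dispatch helper are also assumptions.

```python
from pathlib import Path
from typing import Callable

import pandas as pd
from xarray import DataArray

# Type of a loader: takes a file path and a variable name, returns a DataArray.
Loader = Callable[[Path, str], DataArray]

# Hypothetical registry mapping a lower-case file suffix to its loader.
FILE_FORMAT_REGISTRY: dict[str, Loader] = {}


def register_file_format_loader(file_types: tuple[str, ...]) -> Callable[[Loader], Loader]:
    """Register the decorated loader for each of the given file suffixes."""

    def decorator(func: Loader) -> Loader:
        for file_type in file_types:
            FILE_FORMAT_REGISTRY[file_type.lower()] = func
        return func

    return decorator


@register_file_format_loader(file_types=(".csv", ".xlsx"))
def load_from_dataframe(file: Path, var_name: str) -> DataArray:
    """Loads a DataArray from a data frame format."""
    if file.suffix.lower() == ".csv":
        data = pd.read_csv(file)
    else:
        # Reading XLSX in pandas requires the openpyxl package to be installed.
        data = pd.read_excel(file, engine="openpyxl")

    if var_name not in data.columns:
        raise ValueError(f"Variable {var_name} not found in {file}")

    # Assumed layout: one row per cohort, so rows map onto a "cohorts" dimension.
    return DataArray(data[var_name].to_numpy(), dims=("cohorts",), name=var_name)


def load_variable(file: Path, var_name: str) -> DataArray:
    """Dispatch to whichever loader is registered for this file's suffix."""
    try:
        loader = FILE_FORMAT_REGISTRY[file.suffix.lower()]
    except KeyError:
        raise ValueError(f"No loader registered for {file.suffix} files") from None
    return loader(file, var_name)
```

Because the decorator registers the same function for both suffixes, the only format-specific step is the choice between `pd.read_csv` and `pd.read_excel`; everything after that works on the resulting data frame.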
There is some ugliness here: because we don't have persistent file handles, the file will be opened again for every variable loaded from it, although the same is currently true for NetCDF. A better approach in future would be to open each file in the data configuration once and read the tuple of variables claimed to live in that file, rather than independently opening the file specified for each variable.
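The open-once approach could look roughly like this sketch. Both functions are hypothetical: `group_variables_by_file` assumes the data configuration can be reduced to `(variable name, file)` pairs, and `load_variables_from_dataframe` returns an `xarray.Dataset` holding every requested variable from a single read of the file, again assuming a `cohorts` dimension.

```python
from collections import defaultdict
from pathlib import Path

import pandas as pd
from xarray import Dataset


def group_variables_by_file(
    entries: list[tuple[str, Path]],
) -> dict[Path, tuple[str, ...]]:
    """Collect the variable names claimed to live in each file (hypothetical)."""
    grouped: dict[Path, list[str]] = defaultdict(list)
    for var_name, file in entries:
        grouped[file].append(var_name)
    return {file: tuple(names) for file, names in grouped.items()}


def load_variables_from_dataframe(file: Path, var_names: tuple[str, ...]) -> Dataset:
    """Read a CSV or XLSX file once and return all requested variables (hypothetical)."""
    if file.suffix.lower() == ".csv":
        data = pd.read_csv(file)
    else:
        # Assumes an .xlsx file; pandas needs openpyxl installed to read it.
        data = pd.read_excel(file, engine="openpyxl")

    missing = [name for name in var_names if name not in data.columns]
    if missing:
        raise ValueError(f"Variables {missing} not found in {file}")

    # Each requested column becomes one variable along a shared "cohorts" dimension.
    return Dataset({name: ("cohorts", data[name].to_numpy()) for name in var_names})
```

Looping over `group_variables_by_file(entries).items()` and calling `load_variables_from_dataframe` once per item would then read each file exactly once, however many variables it supplies.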