Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orange Table-specific HDF5Reader #6791

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

stuart-cls
Copy link

@stuart-cls stuart-cls commented Apr 25, 2024

Issue

Enable saving/loading the Orange Table data structure from the binary HDF5 container.
Based on the implementation used in the dask branch, but with the dask parts removed.

Related: #6356

Description of changes
  • Add HDF5Reader with both read and write_file methods.
  • Add round-trip test for HDF5Reader
  • Add h5py to requirements
  • Use existing .metadata sidecar file to store table.attributes if present.
Includes
  • Code changes
  • Tests
  • Documentation

@stuart-cls stuart-cls changed the title Orange hdf5 Orange Table-specifc HDF5Reader Apr 25, 2024
@stuart-cls stuart-cls changed the title Orange Table-specifc HDF5Reader Orange Table-specific HDF5Reader Apr 25, 2024
Orange/data/io.py Outdated Show resolved Hide resolved
@stuart-cls
Copy link
Author

I couldn't find a satisfactory solution to the .attributes problem (for the use case I care about) so I re-used the .metadata sidecar files for now, which is better than nothing.

Copy link

codecov bot commented May 20, 2024

Codecov Report

Attention: Patch coverage is 93.67089% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 88.21%. Comparing base (5ada6c4) to head (63c71b3).
Report is 21 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6791   +/-   ##
=======================================
  Coverage   88.20%   88.21%           
=======================================
  Files         327      327           
  Lines       71223    71301   +78     
=======================================
+ Hits        62825    62900   +75     
- Misses       8398     8401    +3     

@markotoplak
Copy link
Member

Comments from @stuart-cls (from his email, just so that they do not get lost):

  • Table.attributes -> I could try again to properly store this in the HDF5, instead of the .metadata sidecar. I got stuck trying to get the round-trip on a visible image to work :)
  • Compatibility with dask branch: both opening files saved with that branch, and updating the branch to be compatible.

for subdomain in ['attributes', 'class_vars', 'metas']:
parsed = [parse(feature) for feature in getattr(data.domain, subdomain)]
domain = np.array([[name, header] for name, header, _ in parsed], 'S')
domain_args = np.array([json.dumps(args) for *_, args in parsed], 'S')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For domain_args we should certainly use the variable length h5py string data type. Probably for domain too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@stuart-cls
Copy link
Author

Regarding the previous comments:

Table.attributes -> I could try again to properly store this in the HDF5, instead of the .metadata sidecar. I got stuck trying to get the round-trip on a visible image to work :)

In general this format isn't doing anything clever with nested dictionaries (see domain_args for example). It would be a lot of work to map this to HDF5, and this is the same problem with Table.attributes.

Compatibility with dask branch: both opening files saved with that branch, and updating the branch to be compatible.

I've tested both ways, it works fine. The new reader checks for "Orange" in the "creator" attribute, but falls back to checking that the "domain" group is there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants