
Should OpenPIV use pandas.DataFrame or xarray.DataArray/Dataset internally? #274

Open
alexlib opened this issue Nov 24, 2022 · 3 comments
Labels
good first issue: Good issue to start when you want to learn how to contribute to this open source project

Comments

@alexlib (Member) commented on Nov 24, 2022

We currently use rather simple numpy arrays for x, y, u, v, s2n, flags throughout the processing, and at the end we write them as columns in ASCII files with some optional headers.
During multi-pass processing we also run several iterations, producing temporary arrays, refined arrays, filtered and flagged arrays, interpolated arrays, and so on.

I think it makes more sense to use a more modern structure, e.g. a dataclass, pandas.DataFrame, xarray.DataArray, or similar, which would open several options for us (a rough sketch follows the list below):

  • easier data maintenance, e.g. temporal storage
  • adding new columns with headers; keeping units and experimental metadata as attributes on those arrays
  • saving to a large set of file formats while keeping the reading/writing code intact, managed by pandas or xarray or the libraries they use
  • better memory management by adding or replacing columns, renaming only the units, etc.
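
For example, here is a minimal sketch of how one frame of PIV results could be held in an xarray.Dataset; the variable names, window spacing, and attributes are illustrative assumptions, not an existing OpenPIV API:

  # A minimal sketch (assumed names, not an existing OpenPIV API) of one PIV
  # frame stored as an xarray.Dataset with coordinates and metadata attributes.
  import numpy as np
  import xarray as xr

  ny, nx = 32, 48            # number of interrogation windows (assumed)
  x = np.arange(nx) * 16.0   # window centres in pixels (assumed spacing)
  y = np.arange(ny) * 16.0

  ds = xr.Dataset(
      data_vars={
          "u": (("y", "x"), np.zeros((ny, nx))),
          "v": (("y", "x"), np.zeros((ny, nx))),
          "s2n": (("y", "x"), np.ones((ny, nx))),
          "flags": (("y", "x"), np.zeros((ny, nx), dtype=np.int8)),
      },
      coords={"x": x, "y": y},
      attrs={"units": "pix/dt", "dt": 0.001, "camera": "hypothetical metadata"},
  )

  # The structure and metadata survive a round trip through NetCDF
  # (requires a NetCDF backend such as netCDF4 or scipy):
  ds.to_netcdf("piv_frame_0001.nc")
  restored = xr.open_dataset("piv_frame_0001.nc")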

This can become a part of the OpenPIV v1.0 milestone

@ErichZimmer (Contributor) commented:
I agree with using pandas or xarray as a modern data structure. They come with the nice side effect of being able to store the structures in HDF5 and NetCDF.
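
For instance, a flattened set of PIV columns in a pandas.DataFrame can be written to HDF5 in a single call; the file name and key below are hypothetical, and HDF5 writing needs the optional PyTables dependency:

  # A minimal sketch of pandas-based HDF5 storage; column, key, and file names
  # are assumptions for illustration only.
  import numpy as np
  import pandas as pd

  df = pd.DataFrame({
      "x": np.arange(5, dtype=float),
      "y": np.zeros(5),
      "u": np.random.rand(5),
      "v": np.random.rand(5),
      "s2n": np.ones(5),
      "flags": np.zeros(5, dtype=np.int8),
  })

  df.to_hdf("piv_frame_0001.h5", key="frame_0001", mode="w")  # HDF5 via PyTables
  back = pd.read_hdf("piv_frame_0001.h5", key="frame_0001")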

@alexlib added the good first issue label on Feb 24, 2023
@ErichZimmer (Contributor) commented:

@alexlib I plan on refactoring OpenPIV to use xarray. The structure would be similar to that of the C++ version, with a folder tree along these lines:

/openpiv
  __init__.py
  /core
    __init__.py
    preprocess.py
    process.py
    tools.py
    windef.py
    ...
  /piv_engine   <--- contains classes wrapped around xarray
    __init__.py
    piv.py
    vector.py
  /legacy       <--- legacy interface for compatibility reasons

The classes in the PIV engine submodule would hold the raw PIV results, along with a class that performs the usual recursive PIV algorithms. I am also thinking of making the PIV parameters a class, since we could implement functions to read and write them as YAML files. This should allow a user to drive the C++, Python, or GPU version of OpenPIV with the same settings once the feature is completed.

Note that this is a proof of concept for now, as I am doing paper sketches until I get access to a computer. I would implement a similar solution for the C++ version of OpenPIV once I complete the port to the Meson build system. I have no plans for the MATLAB version of OpenPIV.
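
A hypothetical sketch of what such a YAML-backed parameter class could look like follows; none of these names exist in OpenPIV yet, and the fields and defaults are only illustrative (uses the PyYAML package):

  # A hypothetical PIVParameters class (not existing OpenPIV code) that can
  # round-trip its settings through a YAML file, so the same file could in
  # principle drive the Python, C++, or GPU implementations.
  from dataclasses import dataclass, asdict
  import yaml  # PyYAML

  @dataclass
  class PIVParameters:
      window_size: int = 32     # interrogation window size in pixels (assumed default)
      overlap: int = 16         # window overlap in pixels (assumed default)
      dt: float = 1.0           # time between frames
      num_iterations: int = 3   # number of refinement passes

      def to_yaml(self, path: str) -> None:
          with open(path, "w") as f:
              yaml.safe_dump(asdict(self), f)

      @classmethod
      def from_yaml(cls, path: str) -> "PIVParameters":
          with open(path) as f:
              return cls(**yaml.safe_load(f))

  # Example round trip:
  params = PIVParameters(window_size=64, overlap=32)
  params.to_yaml("piv_settings.yaml")
  same_params = PIVParameters.from_yaml("piv_settings.yaml")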

@alexlib (Member, Author) commented on Jan 20, 2025

Great plan. Please also take a look at ffpiv.
