Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup the 3rd party support code #548

Open
eugenwintersberger opened this issue Dec 15, 2021 · 21 comments
Open

Cleanup the 3rd party support code #548

eugenwintersberger opened this issue Dec 15, 2021 · 21 comments
Assignees

Comments

@eugenwintersberger
Copy link
Collaborator

The current situation

  • all support for 3rd party libraries (including h5py or even the C++ standard library) lives in the source directories of the main library
  • some of the code is even included in header files required by the core library
  • the header files with the support data structures are included in the default hdf5.hpp file and thus always active without providing the user a chance to remove them

Suggested fixes

  • create a new hierarchy of directories below src/h5cpp/support
  • each supported library gets s subdirectory there. For instance: src/h5cpp/support/stl or src/h5cpp/support/h5py
  • the same structure should be provided for the tests below test/support
    Users who want to have support for STL data types can than easily add
#include <h5cpp/support/stl/vector.hpp>
#include <h5cpp/support/stl/string.hpp>

This would give the user a way to provide its own std::vector integration if he needs this.

How to get there

  • I would first start with moving the existing code and test the new structure
  • create a simple manual for developers how to integrate 3rd party support code

Once this is done (I hope during Christmas) it would be nice if others join in providing support for other 3rd party libraries.

@planetmarshall
Copy link
Contributor

Up to you as the maintainer but I might suggest contrib rather than support. contrib is a fairly common folder name for this kind of thing.

@eugenwintersberger
Copy link
Collaborator Author

@planetmarshall contrib is definitely a much better choice.

eugenwintersberger added a commit that referenced this issue Dec 24, 2021
eugenwintersberger added a commit that referenced this issue Dec 24, 2021
eugenwintersberger added a commit that referenced this issue Dec 24, 2021
eugenwintersberger added a commit that referenced this issue Dec 24, 2021
eugenwintersberger added a commit that referenced this issue Dec 24, 2021
eugenwintersberger added a commit that referenced this issue Dec 24, 2021
eugenwintersberger added a commit that referenced this issue Dec 24, 2021
@eugenwintersberger
Copy link
Collaborator Author

Ok. For STL my approach seems to work. However, there is some tight coupling between STL strings, vectors and HDF5 attributes. This is something I have to think about but this should not stop us from moving on.
@jkotan A question for Jan: is the EBool something which is required for h5py? Somehow I forgot why this was introduced.

@jkotan
Copy link
Collaborator

jkotan commented Dec 25, 2021

@jkotan A question for Jan: is the EBool something which is required for h5py? Somehow I forgot why this was introduced.

Hi Eugen, the story is about NX_BOALEAN in NeXus.

At the beginning in the old libpniio versions to stores NX_BOOLEAN type variables DESY was used a predefined hdf5 integer type (like in C language). The problem is that when you read such a bool variable you cannot find if it is really boolean or integer one (unless you defined additional (optional) attribute type which discribes the type).

In NeXus format mapping to hdf5 format is not strictly defined i.e.

NX_BOOLEAN  true/false value ( true | 1 | false | 0 )

so it can be anything which can be mapped to true and false values.
Therefore, we started to search for another option to store NX_BOOLEAN types. We didn't want to reinvent the wheel so we decided to adopt h5py a convention which stored python bool type variables in hdf5 Enum with two values FALSE and TRUE. In our case it is EBool. Moreover, a lot of people use h5py to create nexus files even they don't know they use enums for store bool type variables.
Thus, our newer versions of libpniio as well as libpninexus library so that our newer detector servers and python-pninexus maps NeXus NX_BOOLEAN and bool c++/python types to the EBool enum.
So for DESY it is standard way of storing bool type variables.
But if you want to read back bool variables written by h5py you also need EBool. Otherwise you will read a variable to general enum type which you need to interpret by youself .

@eugenwintersberger
Copy link
Collaborator Author

@jkotan would it be ok for you if I move this to a contrib/nexus directory?

@jkotan
Copy link
Collaborator

jkotan commented Dec 26, 2021

@jkotan would it be ok for you if I move this to a contrib/nexus directory?

@eugenwintersberger , if you change only the header file location it should be OK

eugenwintersberger added a commit that referenced this issue Dec 27, 2021
eugenwintersberger added a commit that referenced this issue Dec 27, 2021
eugenwintersberger added a commit that referenced this issue Dec 27, 2021
@eugenwintersberger
Copy link
Collaborator Author

So far so good. I am not sure if I understand why some of the builds fail. Maybe @jkotan has a good idea ;)

@jkotan
Copy link
Collaborator

jkotan commented Dec 28, 2021

Hi @eugenwintersberger, It looks like you need to add

#include <h5cpp/contrib/stl/string.hpp>

to dataset.hpp or somewhere on into its includes because it have different read/write methods for strings. Also it is good to add

--- a/src/h5cpp/contrib/stl/array.hpp
+++ b/src/h5cpp/contrib/stl/array.hpp
@@ -1,6 +1,8 @@
 #pragma once
 
 #include <h5cpp/datatype/type_trait.hpp>
+#include <h5cpp/dataspace/type_trait.hpp>
+#include <h5cpp/dataspace/simple.hpp>
 #include <array>
 
 namespace hdf5 { 

and

--- a/src/h5cpp/contrib/stl/string.hpp
+++ b/src/h5cpp/contrib/stl/string.hpp
@@ -1,6 +1,7 @@
 #pragma once
 
 #include <h5cpp/datatype/type_trait.hpp>
+#include <h5cpp/datatype/string.hpp>
 #include <string>
 
 namespace hdf5 { 

and some other include statements to your test h5cpp/contrib is not in h5cpp/hdf5.hpp.
After that the code works on deb10 in the Release mode (for the debug mode for some compiler versions additional symbols are left in .so so you don't need to add includes statements )

For macos it is another independent issue, i.e. some conan packages were update and they started to require the newer conan version (I'm not a fan of package managers not based on distribution model). So we can either upgrade the conan version or downgrade the zlib package.

@jkotan
Copy link
Collaborator

jkotan commented Dec 30, 2021

Additionally after downgrading the zlib conan package to 1.2.8 for macos all the tests are green
https://jenkins.esss.dk/dm/blue/organizations/jenkins/ess-dmsc%2Fh5cpp/detail/issue_548_tests/5/pipeline

@eugenwintersberger
Copy link
Collaborator Author

@jkotan thanks for your comments. It seems that I really should do more C++ to stay fit. What I do not understand is why this problems show up only on some of the test systems. I would expect missing header files to show up in any case. Or do I miss here something?

@jkotan
Copy link
Collaborator

jkotan commented Dec 31, 2021

What I do not understand is why this problems show up only on some of the test systems. I would expect missing header files to show up in any case. Or do I miss here something?

Good question 👍 It looks like a similar problem to https://stackoverflow.com/questions/42356633/why-would-a-release-build-have-linker-errors-if-the-debug-build-didnt i.e. I'm not sure but it looks like some code instantiates the the create and get template methods but since it is inline one it is optimized their symbols out.

@jkotan
Copy link
Collaborator

jkotan commented Jan 1, 2022

Playing with the code I have found that the instantiation of create and get template methods with string is done in dataset.cpp in the line

void Dataset::write(const char *data,const property::DatasetTransferList &dtpl)
{
  write(std::string(data),dtpl);
}

that if you comment the content of the above method, i.e. write(std::string(data),dtpl);, the linker errors disappear.
Thus, to be more precisely it would be good to move the include statement from dataset.hpp to dataset.cpp. Also I've noticed that there is missing an const implementation of the similar write method. Thus we could apply the following patch

--- a/src/h5cpp/node/dataset.cpp
+++ b/src/h5cpp/node/dataset.cpp
@@ -33,6 +33,7 @@
 #include <h5cpp/node/functions.hpp>
 #include <h5cpp/filter/external_filter.hpp>
 #include <h5cpp/error/error.hpp>
+#include <h5cpp/contrib/stl/string.hpp>
 
 namespace hdf5 {
 namespace node {
@@ -211,6 +212,11 @@ void Dataset::write(const char *data,const property::DatasetTransferList &dtpl)
   write(std::string(data),dtpl);
 }
 
+void Dataset::write(const char *data,const property::DatasetTransferList &dtpl) const
+{
+  write(std::string(data),dtpl);
+}
+
 filter::ExternalFilters Dataset::filters() const
 {
   filter::ExternalFilters efilters = filter::ExternalFilters();

Another option would be to move the implementation of the write method to the header but since it is not template method it is better not to do it.

@jkotan
Copy link
Collaborator

jkotan commented Jan 1, 2022

It looks like it is even more tricky. I've just run tests where I have moved the include statement from dataset.hpp to dataset.cpp and in that time the macos compiler has complained https://jenkins.esss.dk/dm/blue/organizations/jenkins/ess-dmsc%2Fh5cpp/detail/issue_548_tests/6/pipeline so it is better to keep #include <h5cpp/contrib/stl/string.hpp> in dataset.hpp.

@eugenwintersberger
Copy link
Collaborator Author

I had a look on Jans changes. Looks all good to me. The installation of the header files seems to work. @jkotan do you think it makes sense to create a PR for this issues?

@jkotan
Copy link
Collaborator

jkotan commented Jan 4, 2022

Hi @eugenwintersberger, I think you can merge them into the your PR. All the changes are in the issue_548_tests branch.

@jkotan
Copy link
Collaborator

jkotan commented Jan 5, 2022

I had a look on Jans changes. Looks all good to me. The installation of the header files seems to work. @jkotan do you think it makes sense to create a PR for this issues?

Sure, make the PR. As I understand after merging the PR the user need to add #include statements with contrib headers or his/her own trait headers which he/she wants to use for specific c++ type/class.
It looks like the user only cannot replace his/her own std::string trait class version because #include <h5cpp/contrib/stl/string.hpp> is included in dataset.hpp

jkotan added a commit that referenced this issue Jan 19, 2022
* moved STL datatype traits to contrib/stl

Update #548

* Removed unused header files

Update #548

* attribute tests are back

Update #548

* node tests are back

update #548

* file and utility tests are back

Update #548

* datatype test is back

Update #548

* dataspace test ist back

Update #548

* moved dataspaces for STL containers to contrib

Update #548

* removed ebool code from the mainline

Update #548

* Added test build for the nexus contrib

Update #548

* Fixing ebool tests

Update #548

* add include statments

* downgrade conan version

* revert conanfile_ess.txt

* fix typo

* add CONAN_FILE for macos

* move include statement to dataset.cpp

* add back stl/string.hpp to dataset.hpp

* add newline at the end of files

* comment bzip2 from macos conanfile

* since @amues remove

* comment windows test

Co-authored-by: Jan Kotanski <[email protected]>
@eugenwintersberger
Copy link
Collaborator Author

Hi Jan. Long time no see. I see your problem. This is a bit tricky to resolve since there is this tight coupling between the dataset IO code and the string type. If I recall correctly this is due to the fact that, for some reason I forgot, we handle variable length string and fixed length sting IO different from general variable/fixed length data IO.
I was afraid that this technical debt we (most probably mainly I) introduced quite early in the project.

eugenwintersberger added a commit that referenced this issue Aug 8, 2022
Not running yet

Update #548
eugenwintersberger added a commit that referenced this issue Aug 8, 2022
eugenwintersberger added a commit that referenced this issue Aug 8, 2022
@eugenwintersberger
Copy link
Collaborator Author

Ok. It seems that the string trait was not the problem. I removed now all the contrib header files from the core library. I think it is if people use your master header file h5cpp.hpp everything should work normal. The only problem that remains is, that the code does not build on OSX (see branch issue_548.

@jkotan
Copy link
Collaborator

jkotan commented Aug 9, 2022

Hi Eugen,

h5cpp.hpp contrary to hdf5.hpp contains stl, i.e.

#include <h5cpp/hdf5.hpp>
#include <h5cpp/contrib/stl/stl.hpp>

@eugenwintersberger
Copy link
Collaborator Author

I guess this is a reasonable approach to provide people with a reasonable way to start so they don't have to think about adding another header file. I will try to fix the clang problem today (without any success promises ;))

eugenwintersberger added a commit that referenced this issue Aug 9, 2022
eugenwintersberger added a commit that referenced this issue Aug 9, 2022
Update Cleanup the 3rd party support code #548
eugenwintersberger added a commit that referenced this issue Aug 9, 2022
eugenwintersberger added a commit that referenced this issue Aug 9, 2022
eugenwintersberger added a commit that referenced this issue Aug 9, 2022
eugenwintersberger added a commit that referenced this issue Aug 9, 2022
@eugenwintersberger
Copy link
Collaborator Author

Tests are passing except for Windows. I have no Windows installation here so I will deal with this next week when I am back at home.

eugenwintersberger added a commit that referenced this issue Sep 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants