Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for h5read #637

Open
mtfishman opened this issue Apr 28, 2021 · 5 comments
Open

Support for h5read #637

mtfishman opened this issue Apr 28, 2021 · 5 comments

Comments

@mtfishman
Copy link
Member

@emstoudenmire probably you have investigated this but I was interested in the current state of h5read when looking over the docs you moved over here: #634.

Unfortunately, the more compact notation for reading using h5read doesn't work for some (maybe all?) of our types:

julia> using ITensors, ITensors.HDF5

julia> i = Index(2)
(dim=2|id=826)

julia> h5read("myfile.h5", "i" => Index)
ERROR: MethodError: no method matching h5read(::String, ::Pair{String, UnionAll})
Closest candidates are:
  h5read(::Any, ::AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:487
  h5read(::Any, ::Pair{var"#s53", DataType} where var"#s53"<:AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:500
  h5read(::Any, ::AbstractString, ::Tuple{Vararg{Union{Colon, Int64, AbstractRange{Int64}}, N} where N}; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:513
Stacktrace:
 [1] top-level scope
   @ REPL[18]:1

julia> h5read("myfile.h5", "i" => Index{Int})
ERROR: MethodError: no method matching read(::HDF5.Group, ::Type{Index{Int64}})
Closest candidates are:
  read(::Union{HDF5.Attribute, HDF5.Dataset}, ::Type{T}, ::Any...) where T at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1193
  read(::Union{HDF5.File, HDF5.Group}, ::Pair{var"#s53", DataType} where var"#s53"<:AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1165
  read(::Union{HDF5.File, HDF5.Group}, ::AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1158
  ...
Stacktrace:
 [1] h5read(filename::String, name_type_pair::Pair{String, DataType}; pv::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{0 Indices
, Tuple{}}})
   @ HDF5 ~/.julia/packages/HDF5/iH4LA/src/HDF5.jl:505
 [2] h5read(filename::String, name_type_pair::Pair{String, DataType})
   @ HDF5 ~/.julia/packages/HDF5/iH4LA/src/HDF5.jl:501
 [3] top-level scope
   @ REPL[19]:1

There are two issues going on here. In the first issue, I believe it is more of a limitation on the HDF5.jl side, where Index is technically an abstract type Index{T} where {T} with typeof(Index) === UnionAll, while h5read only accepts a concrete type like Index{Int} with typeof(Index{Int}) === DataType. In the second issue, the problem is more on our side, since we don't allow specifying a concrete type in this case, so we could expand the definition here to accept a more general Type{<: Index}. However, this fix is still inconvenient, since we don't want people having to specify the particular concrete Index{Int} type.

The version without the explicit type runs:

julia> h5read("myfile.h5", "i")
Dict{String, Any} with 5 entries:
  "dim"  => 2
  "id"   => 0x1432306146b0bb12
  "plev" => 0
  "tags" => Dict{String, Any}("tags"=>"")
  "dir"  => 0

but of course doesn't output an Index, just some default data structure. It is difficult to find information about how to make this work for custom types, but my understanding from this comment is that you may need to define a custom file format. This makes a certain amount of sense, since then the file format can be determined and dispatched on, and without that there doesn't seem to be a way to tell HDF5.jl to do some special parsing of the files to search for ITensor-specific type information. @emstoudenmire, have you investigated this? That seems like the strategy the packages like JLD.jl take for saving and loading custom types.

@emstoudenmire
Copy link
Collaborator

Good catch that this doesn't work. I had in fact been wanting to move to a system where you don't have to specify the type at all, but instead when we write types to HDF5 it includes a string saying what the type is. This string would be eval'd when we read the data back in to generate an actual Julia type, at which point we call the type-specific version of h5read internally.

But you've definitely hit upon an issue that I was coming across, namely that we might want to write data of various closely related types such as Index{T} for various values of T. But then in a lot of cases it's not a priori clear how specific of a type we should put, and whether it should be a common abstract type like Index, or the exact type etc. Probably it should be the "exact enough" type, meaning enough to represent the data stored. So it's something we should discuss.

@mtfishman
Copy link
Member Author

Yeah, in that case it seems like it could be a minimal amount of type information to reconstruct the original object. In general it isn't necessary to store Index{Int} since that can be deduced from what is stored in the file, just like the constructor Index(2) deduces it should be Index{Int}. A principle to use would be whatever minimal constructor call you would need in order to create the same object (i.e. Dense([1.0,2.0,3.0]) works so there is no need to store Dense{Float64} or something more complicated).

The issues I was more focusing on were:

  1. Can we directly use i = h5read("myfile.h5", "i" => Index) or some variation instead of the longer notation f = h5open("myfile.h5", "r"); i = read(f,"i", Index); close(f)? It seems like the answer was no, not the way HDF5.jl is currently written. In principle we could overload HDF5.h5read to make i = h5read("myfile.h5", "i", Index) work but I don't see an equivalent version of that in HDF5.jl so it might be a bit sketchy to do that.
  2. If we want to specifically overload the HDF5.jl function HDF5.h5read("myfile.h5", "i"), I don't see how we can get that to deduce the type for us from a string inside the file (using an eval as you say) since it doesn't seem like there is a way to overload that behavior the way HDF5.jl is written currently. So it seems to either require making a new function (say ITensors.h5read or whatever unique name) or inventing a new ITensor-specific file format like h5it and use h5read("myfile.h5it", "i").

Also see the discussion here, a safer alternative to using eval would be to use a dictionary like Dict("ITensor" => ITensor, "Index" => Index, "Dense" => Dense, [...]). Presumably there are a finite number of types we will support for reading and writing which could get "registered" in this dictionary.

@emstoudenmire
Copy link
Collaborator

Glad you agree, and a related thought about which type to store is that the more abstract or general information that we put into the file, the more "forward compatible" or "future proof" it will be. So like e.g. if we decide to change some detail about whether the QN's are stored in a tuple or a vector (just making this up) and if that reflects in the type, not to have that break all previously written HDF5 files. But even if we do break compatibility, I had been thinking of and already implementing a versioning system where we put a version number and keep older reading-in code around if we detect older versions. It won't work perfectly but may allow some smooth upgrading across many versions.

To your point #2, you're right we couldn't really change the behavior of h5read("myfile.h5","i") without replacing or wrapping around the implementation in HDF5.jl, which would be a bad idea anyway. So probably the way to go there is a different function name that is introduced by ITensor, but also fairly general in its name. Something like h5get but ideally a better name than that.

@hz-xiaxz
Copy link

Don't know if this issue is still a concern now. Just to share some ideas. Have you checked JLD2.jl? It's quite stable now and able to store Julia struct, while compatible for HDF5 format. (Also a basic visualization support is currently provided by the vscode H5web extension)

If there's too large a work to migrate or maintain, just forget what I've said. :)

@emstoudenmire
Copy link
Collaborator

We have considered JLD2, and in our own usage have found it works well. One nice thing about it, you may know, is that it doesn't require us to add explicit support for many, maybe most, of our types since it is able to figure out automatically how to write them to disk. So users are welcome and encouraged to try JLD2 for their data including data from ITensor.

Meanwhile, we do still plan to also support HDF5 since it's usable across multiple languages, more standard, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants