As implemented in JuliaLang/julia#22441. At the moment, this is impossible, because Julia does not correctly construct these objects. But once it does, there is the question of how these should be written in the file. Obvious options are:
Write union-typed fields the same way we do now, by saving the field content in its own dataset. This might be easier for non-JLD2 implementations to consume, since it's a closer match with the HDF5 data model, but writing a dataset has substantial overhead in both space and time compared with how Julia handles this in memory. It would be nice if the cost of saving/loading data with JLD2 were closely related to the cost of working with that data in Julia.
Create datatypes for isbits unions. HDF5 doesn't actually support unions, but we could save a datatype with a ty tag field plus one field per union member (for example, an Int64 field and a Float64 field),
where only one of the fields will be initialized, and the ty field will say which one. In principle this wastes some space in the file since only the Int64 or Float64 field will contain data, but the storage overhead is smaller than the storage overhead from creating a new dataset, and in the likely common case of Union{T,Null}, we would just be storing an extra byte to signal whether the value was null or not.
This would be a breaking change in that older versions of JLD2 wouldn't be able to read files created with newer versions, although newer versions of JLD2 should still be able to handle files created with older versions.
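The tagged layout from the second option can be sketched in Julia roughly as follows. This is only an illustration of the idea, not JLD2's implementation: the struct names, the field names other than ty, the tag values, and the helper functions are all hypothetical, and Missing from current Julia stands in for Null.

```julia
# Hypothetical mirror of the proposed in-file datatype for a field of type
# Union{Int64,Float64}. Names and tag values are illustrative assumptions.
struct TaggedIntOrFloat
    ty::UInt8       # 0x00 => `int` holds the value, 0x01 => `float` does
    int::Int64
    float::Float64
end

# Only the slot selected by `ty` carries data; the other is zero-filled,
# which is the wasted space mentioned above.
encode(x::Int64)   = TaggedIntOrFloat(0x00, x, 0.0)
encode(x::Float64) = TaggedIntOrFloat(0x01, 0, x)

decode(t::TaggedIntOrFloat) = t.ty == 0x00 ? t.int : t.float

# For a union with a zero-size member, such as Union{Int64,Missing},
# the tag is the only extra storage.
struct TaggedMaybeInt
    ty::UInt8       # 0x00 => missing, 0x01 => `value` is valid
    value::Int64
end

encode_maybe(::Missing) = TaggedMaybeInt(0x00, 0)
encode_maybe(x::Int64)  = TaggedMaybeInt(0x01, x)

decode_maybe(t::TaggedMaybeInt) = t.ty == 0x00 ? missing : t.value
```

A reader that knows the tag convention recovers the original value with decode(encode(x)). A reader that does not know it still sees an ordinary fixed-size record.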
Is it the same thing that causes Vector{Union{Float64, Missing}} not to be compressed (Julia 0.6.1, JLD2 0.0.4)?
I guess the check that disables compression is elseif f.compress && isleaftype(T) && isbits(T) in write_dataset()
In my case, the gzipped serialized output is ~8 MB, while the JLD2 file written with compress=true is ~75 MB.
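To see why the check quoted above would skip such a vector, here is a small sketch for current Julia (1.x), where isconcretetype and isbitstype replace the isleaftype(T) and isbits(T) calls from 0.6. The two predicate functions are hypothetical and are not JLD2's code. Base.isbitsunion is an unexported Base helper, so relying on it is an assumption.

```julia
# Hypothetical restatement of the type part of the quoted check.
compressible_now(T) = isconcretetype(T) && isbitstype(T)

# One possible relaxed predicate that also admits isbits unions.
# Base.isbitsunion is unexported, so treat its use here as an assumption.
compressible_with_unions(T) = compressible_now(T) || Base.isbitsunion(T)

T = Union{Float64, Missing}

isconcretetype(T)            # false: a Union is not a concrete type
isbitstype(T)                # false: a Union is not a DataType
Base.isbitsunion(T)          # true: every member is an isbits type

compressible_now(T)          # false, so the compression branch is skipped
compressible_with_unions(T)  # true
```

Passing a type-level test like this would be only part of the story, since the element data would still need a file representation such as one of the options discussed in the issue.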