Serialization to an already open IOStream #304

Open
fredrikekre opened this issue Apr 28, 2021 · 9 comments

Comments

@fredrikekre
Contributor

Would it be possible to write to a user-provided IO? I skimmed through the signatures of jldopen, but I think all of them expect a string. This would make things like the following (contrived) example possible:

open("f.jld2", "w") do io
    jldopen(io) do jld
        write(jld, ...)
    end
end
@JonasIsensee
Collaborator

JonasIsensee commented Apr 28, 2021

Hi @fredrikekre,

In principle that might be possible.
You can even do a hacky version:

julia> using JLD2
[ Info: Precompiling JLD2 [033835bb-8acc-5ee8-8aae-3f567f8a3819]

julia> f = jldopen("dummy.jld2", "w"; iotype=IOStream)
JLDFile /home/jonas/dummy.jld2 (read/write)
  (no datasets)

julia> io = open("test.jld2", "w")
IOStream(<file test.jld2>)

julia> f.io = io
IOStream(<file test.jld2>)

julia> write(f, "test", rand(1000))

julia> close(f)

julia> load("test.jld2")
Dict{String, Any} with 1 entry:
  "test" => [0.809162, 0.731567, 0.969277, 0.701266, 0.950093, 0.626805, 0.228573, 0.447258, 0.679257, 0.515682  …  0…

This works because JLD2 does not do anything to the io when it just opens a new file.

More relevant: What do you want to achieve?

JLD2 and the underlying HDF5 standard are designed around the assumption that they control their files. In particular, JLD2 will call
seek(io, 0) to write the header information and seek(io, 512) to write some more metadata.

The HDF5 standard allows files to start at 0, 512, 1024, 2048, ..., but only 512 is currently implemented in JLD2. (This could in principle be expanded.)

@JonasIsensee
Collaborator

JonasIsensee commented Apr 28, 2021

Just referencing related material:
there were #233 and #57,
and I once experimented with JLSO support in invenia/JLSO.jl#68,
but I'm currently not convinced that it would be useful.

@fredrikekre
Contributor Author

Thanks. I am not very familiar with the HDF5 spec so maybe this is not possible in general.

I tried a similar hack:

function JLD2.jldopen(io::IO, mode::String)
    @assert mode == "w"
    f = JLD2.JLDFile(io, "dummy", true, false, false, false)
    f.root_group = JLD2.Group{typeof(f)}(f)
    f.types_group = JLD2.Group{typeof(f)}(f)
    return f
end

which works but only(?) for io::IOStream.

I wanted to use it for something like

using JLD2, CodecZlib

open(GzipCompressorStream, "f.jld2.gz", "w") do io
    jldopen(io) do jld
        write(jld, ...)
    end
end

but such a stream is not seekable, so it will error on seek.

I guess it could be made to work if JLD2 allocated an in-memory seekable buffer and only wrote to the user-specified IO when done (if that would be allowed by the spec).

@JonasIsensee
Collaborator

JonasIsensee commented Apr 28, 2021

Ah, ok.
Just to make sure: Are you aware that JLD2 can already compress array fields?
(The new release from this week also added support for customizable compression.)

Anything is allowed as long as you don't get caught...
What I'm trying to say: The file has to be correct in the end, that is what matters.
If you are going to fully write the file in memory anyway, then
you could probably write the file to your machine's tmp directory?
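
A minimal sketch of that temporary-file route, assuming CodecZlib is installed alongside JLD2. The helper name `save_gzipped` and the file names are hypothetical and only serve as illustration; they are not part of JLD2.

```julia
using JLD2, CodecZlib

# Hypothetical helper (not part of JLD2): write a regular JLD2 file into a
# temporary directory, then gzip the finished bytes to `path`.
function save_gzipped(path::AbstractString, data::AbstractDict{String})
    mktempdir() do dir
        tmp = joinpath(dir, "tmp.jld2")
        # JLD2 owns this file, so it can seek freely while writing.
        jldopen(tmp, "w") do f
            for (name, value) in data
                f[name] = value
            end
        end
        # The compressor stream only ever sees sequential writes.
        open(GzipCompressorStream, path, "w") do io
            write(io, read(tmp))
        end
    end
    return path
end

save_gzipped("f.jld2.gz", Dict("test" => rand(1000)))
```

The temporary directory is removed when the `mktempdir` block exits, so only the compressed file remains.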

@JonasIsensee
Collaborator

JonasIsensee commented Apr 28, 2021

> which works but only(?) for io::IOStream

JLD2 internally dispatches a lot on MmapIO and IOStream.
I think, one could probably relax the restriction of IOStream to IO though.

@fredrikekre
Contributor Author

fredrikekre commented Apr 28, 2021

> Just to make sure: Are you aware that JLD2 can already compress array fields?

Yea, I just had a case where I store many non-isbits things (Vector{Vector{Float64}} in this case) and gained a lot by compressing the full file instead. (Edit: This works, but it looks like compression only kicks in after a length threshold that I missed.)

> What I'm trying to say: The file has to be correct in the end, that is what matters.

Right. Does it call seek(io, 0) for every field write?

> If you are going to fully write the file in memory anyway, then you could probably write the file to your machine's tmp directory?

Yea, will probably do that.

> > which works but only(?) for io::IOStream
>
> JLD2 internally dispatches a lot on MmapIO and IOStream.
> I think, one could probably relax the restriction of IOStream to IO though.

Yea, I think my hack will work as long as the io supports seek.

Anyway, feel free to close this as a duplicate of the issues you linked above.

@oxinabox

oxinabox commented Feb 8, 2022

I believe this is closed since #57 is closed

@JonasIsensee
Collaborator

> I believe this is closed since #57 is closed

I apologize for the confusion. I ended up closing #57 as a duplicate of this since this issue has more discussion...
The problem is not resolved.

@JonasIsensee
Collaborator

#535, and therefore v0.5.5, makes significant progress towards this.
You can now write to an IOBuffer:

julia> iobuf = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> jldopen(iobuf, "w") do f
           f["a"] = 42
       end
42

julia> seek(iobuf, 0)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=785, maxsize=Inf, ptr=1, mark=-1)

julia> jldopen(iobuf) do f
           f["a"]
       end
42

Accepting an IOStream would also be possible. The only thing missing is a bit of API.

  • Figure out which open mode is appropriate for the file handle.
  • Use the existing wrapper RWBuffer to keep track of potential global file offsets relative to the buffer.
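
With IOBuffer support, the compressed-stream use case from earlier in the thread could be approached roughly as follows. This is a sketch under assumptions rather than an official recipe: it assumes JLD2 v0.5.5 or later and CodecZlib, the file name is illustrative, and it relies on the buffer staying open after the jldopen block, as in the REPL session above.

```julia
using JLD2, CodecZlib

# Build the complete JLD2 file in a seekable in-memory buffer first.
buf = IOBuffer()
jldopen(buf, "w") do f
    f["a"] = 42
end

# Then push the finished bytes through the non-seekable gzip stream.
open(GzipCompressorStream, "f.jld2.gz", "w") do io
    write(io, take!(buf))
end

# Reading back: decompress into a fresh buffer, rewind it, and open it.
inbuf = IOBuffer()
open(GzipDecompressorStream, "f.jld2.gz") do io
    write(inbuf, read(io))
end
seek(inbuf, 0)
a = jldopen(inbuf) do f
    f["a"]
end
```

JLD2 only ever seeks inside the IOBuffer here, so the compressor and decompressor streams see purely sequential writes and reads.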


3 participants