Streaming get with prefetch and Julia IO Stream API #30

andrebsguedes · 2024-02-07T22:23:14Z

No description provided.


src/RustyObjectStore.jl

jfunstonRAI · 2024-02-26T18:41:16Z

src/RustyObjectStore.jl

+
+Opaque IO stream of object data.
+
+It is necessary to `finish!` the stream if it is not run to completion.


So it is not necessary to call `finish!` if the stream is run to completion? I don't see `readbytes!` calling `finish!` or destroying the stream, is that correct? Also, would it be more idiomatic to say that `close` should be called?

andrebsguedes · 2024-02-27

Good catch! During a refactor, `_unsafe_read` lost its `destroy_read_stream` calls. All other methods rely on it to properly destroy the stream once EOF is reached. Will change the docs to refer to `Base.close`.

andrebsguedes · 2024-02-27

And to clarify, the docs statement is correct: we should only need to close the stream if we don't run it to completion.

andrebsguedes · 2024-02-27

This reminds me that we discussed adding a finalizer to ensure proper reclamation; let's see if we have time for that.
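For reference, a finalizer-backed `close` could look roughly like the sketch below. `SketchStream` and `RELEASED` are hypothetical stand-ins, not names from this PR, and the real stream would release its native handle (for example via `destroy_read_stream`) where the counter is bumped. One caveat under this assumption: Julia finalizers must not block or switch tasks, so a real version could not wait on an `AsyncCondition` inside the finalizer.

```julia
# Hypothetical counter standing in for "native resource released".
const RELEASED = Ref(0)

# Hypothetical stand-in for the PR's read stream.
mutable struct SketchStream
    open::Bool
    function SketchStream()
        s = new(true)
        # Reclaim the resource if the caller never calls `close`.
        finalizer(close, s)
        return s
    end
end

# Idempotent close: safe to call explicitly and again from the finalizer.
function Base.close(s::SketchStream)
    s.open || return nothing
    s.open = false
    RELEASED[] += 1   # the real code would destroy the native stream here
    return nothing
end

s = SketchStream()
close(s)      # explicit close releases once
finalize(s)   # finalizer runs `close` again, which is a no-op

t = SketchStream()
finalize(t)   # never closed explicitly, so the finalizer releases it
```

Making `close` idempotent is what lets the explicit call and the finalizer coexist without a double free.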


Drvi

Sorry for the late review! Left a couple of comments / suggestions which will hopefully be useful. Thanks for working on this!

Drvi · 2024-03-01T10:40:18Z

src/RustyObjectStore.jl

+            bytes_read + bytes_to_read > length(dest) && resize!(dest, bytes_read + bytes_to_read)
+            bytes_read += GC.@preserve dest _unsafe_read(io, pointer(dest, bytes_read+1), bytes_to_read)
+        end
+        resize!(dest, bytes_read)


We should guard against `resize!(dest, bytes_read)` shrinking the `dest` array, since `readbytes!` is documented never to shrink it:

readbytes!(stream::IO, b::AbstractVector{UInt8}, nb=length(b))

Read at most nb bytes from stream into b, returning the number of bytes read. The size of b will be increased if needed (i.e. if nb is greater than length(b) and enough bytes could be read), but it will never be decreased.
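To make the documented contract concrete, here is a small sketch using `IOBuffer` from Base rather than the stream type in this PR. The helper `trim_after_read!` is hypothetical; it shows one way to trim a chunk-wise over-allocation without going below the length the caller passed in.

```julia
# Fewer bytes available than `dest` can hold: `dest` keeps its length.
src = IOBuffer(UInt8[1, 2, 3])
dest = zeros(UInt8, 8)
n = readbytes!(src, dest, 8)

# More bytes requested than `dest` can hold: `dest` grows to fit them.
grown = UInt8[]
m = readbytes!(IOBuffer(UInt8[1, 2, 3, 4]), grown, 4)

# Hypothetical helper: after growing `dest` in chunks, drop the unused tail
# but never shrink below the caller's original length.
function trim_after_read!(dest::Vector{UInt8}, original_len::Int, bytes_read::Int)
    resize!(dest, max(original_len, bytes_read))
    return dest
end
```

With such a guard, a caller who passes a 4-byte buffer and reads 2 bytes still gets a 4-byte buffer back, while a read of 7 bytes leaves exactly 7.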

Drvi · 2024-03-01T10:51:25Z

src/RustyObjectStore.jl

+        while !eof(io)
+            bytes_to_read = 128 * 1024
+            bytes_read + bytes_to_read > length(dest) && resize!(dest, bytes_read + bytes_to_read)
+            bytes_read += GC.@preserve dest _unsafe_read(io, pointer(dest, bytes_read+1), bytes_to_read)
+        end


Could we use `io.object_size` instead of probing in 128 KiB chunks?

Drvi · 2024-03-01T10:52:17Z

src/RustyObjectStore.jl

+        resize!(dest, bytes_read)
+        return bytes_read
+    else
+        bytes_to_read = n == typemax(Int) ? 64 * 1024 : Int(n)


In this branch `n != typemax(Int)`, so the `n == typemax(Int)` check looks redundant. Also, I think this special handling of `typemax(Int)` should be documented.

Drvi · 2024-03-01T10:55:02Z

src/RustyObjectStore.jl

+    end
+
+    response_ref = Ref(ReadResponseFFI())
+    cond = Base.AsyncCondition()


Should we store the `cond` in the stream object so we don't have to allocate it on every `_unsafe_read`, `eof`, etc.? Similarly for `WriteStream`.

Drvi · 2024-03-01T11:47:31Z

src/RustyObjectStore.jl

+    buf = zeros(UInt8, 1)
+    n = _unsafe_read(io, pointer(buf), 1)
+    n < 1 && throw(EOFError())
+    @inbounds b = buf[1]
+    return b


I think we can avoid the unfortunate allocation of the array:

    eof(io) && throw(EOFError())
    ref = Ref{UInt8}()
    n = GC.@preserve ref _unsafe_read(io, Base.unsafe_convert(Ptr{UInt8}, ref), 1)
    n < 1 && throw(EOFError())
    return ref[]
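The same idea can be tried out against an `IOBuffer`. This sketch swaps the PR's `_unsafe_read` for `unsafe_read` from Base, which returns nothing and throws `EOFError` on a short read, so the byte-count check from the suggestion is not needed here. `read_byte` is a hypothetical name used only for illustration.

```julia
# Read one byte through a `Ref` instead of allocating a 1-element array.
function read_byte(io::IO)
    eof(io) && throw(EOFError())
    ref = Ref{UInt8}()
    # Keep `ref` rooted while its raw pointer is in use.
    GC.@preserve ref unsafe_read(io, Base.unsafe_convert(Ptr{UInt8}, ref), 1)
    return ref[]
end

io = IOBuffer(UInt8[0x2a, 0x07])
first_byte = read_byte(io)
second_byte = read_byte(io)
```

A third call on this buffer throws `EOFError`, matching what `read(io, UInt8)` does at end of stream.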

Drvi · 2024-03-01T11:49:43Z

src/RustyObjectStore.jl

+        bytes_read = readbytes!(from, buf, 64 * 1024)
+        bytes_written = 0
+        while bytes_written < bytes_read
+            bytes_written += write(to, buf[bytes_written+1:bytes_read])


`buf[bytes_written+1:bytes_read]` creates a copy of the slice; consider using a `@view`, or `unsafe_write` and advancing the pointer manually.
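A minimal sketch of the `@view` variant, using `IOBuffer` as the destination instead of the PR's write stream; `write_all` is a hypothetical helper name. The view shares memory with `buf`, so each `write` call passes along the remaining bytes without allocating a new array.

```julia
# Write the first `nbytes` of `buf` to `to`, retrying on partial writes.
function write_all(to::IO, buf::Vector{UInt8}, nbytes::Int)
    written = 0
    while written < nbytes
        # `@view` wraps the remaining range without copying it.
        written += write(to, @view(buf[written+1:nbytes]))
    end
    return written
end

buf = collect(0x01:0x0a)
out = IOBuffer()
total = write_all(out, buf, 6)
result = take!(out)

# The view's parent is the original array, confirming no copy was made.
shares_memory = parent(@view(buf[1:3])) === buf
```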

Drvi · 2024-03-01T12:07:10Z

src/RustyObjectStore.jl

+# Throws
+- `GetException`: If the request fails for any reason.
+"""
+function get_object_stream(path::String, conf::AbstractConfig; size_hint::Int=0, decompress::String="")


Could we call this ReadStream instead of get_object_stream?

Drvi · 2024-03-01T12:07:45Z

src/RustyObjectStore.jl

+# Throws
+- `PutException`: If the request fails for any reason.
+"""
+function put_object_stream(path::String, conf::AbstractConfig; compress::String="")


Can we call this WriteStream instead of put_object_stream?

andrebsguedes added 11 commits February 5, 2024 18:46

Draft streaming get

d101657

Uses read_get_stream interface instead of exposing the buffers to Julia

c0bbbf5

Adds exports

326e3ff

Adds put_object_stream and removes dependency on accurate object size…

784532e

… for reading get_object_stream

Allows for anonymous reads on S3

e772d13

Improves error information

204f651

Adds ParseURLError reason

3815c5f

Enables anonymous reads for Azure (plus error reason fix)

ab64504

Adds missing type

7f607f3

Accounts for DNS related connection errors

2b4e70f

Remove parameter that did not go through the object_store merge

6b7f502

jfunstonRAI reviewed Feb 26, 2024

andrebsguedes added 2 commits February 26, 2024 19:42

Handle early (before wait) response errors

2cb3121

Always wait on AsyncCondition and check for response errors, fixes mi…

8f3b1a8

…ssing destroy_read_stream

jfunstonRAI approved these changes Feb 27, 2024

Bump object_store_ffi_jll to 0.5.0

53255c8

andrebsguedes merged commit 5375436 into main Feb 28, 2024

andrebsguedes mentioned this pull request Feb 28, 2024

Unable to perform anonymous reads when using no credentials with AWS S3 #22

Closed

Drvi reviewed Mar 1, 2024

