Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#132 allow users to optionally provide an output buffer when calling transcode #136

Merged
merged 26 commits into from
Apr 9, 2023
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 12 additions & 7 deletions src/buffer.jl
Original file line number Diff line number Diff line change
Expand Up @@ -26,17 +26,17 @@ mutable struct Buffer
# the total number of transcoded bytes
transcoded::Int64

function Buffer(size::Integer)
return new(Vector{UInt8}(undef, size), 0, 1, 1, 0)
function Buffer(data::Vector{UInt8}, keepbytes::Integer=length(data))
baumgold marked this conversation as resolved.
Show resolved Hide resolved
return new(data, 0, 1, keepbytes+1, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that we now have a single inner constructor.

Could we do a sanity check here to see if keepbytes is not greater than the length of data and also nonnegative?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

end
end

function Buffer(data::Vector{UInt8})
return new(data, 0, 1, length(data)+1, 0)
end
function Buffer(size::Integer = 0)
return Buffer(Vector{UInt8}(undef, size), 0)
end

function Buffer(data::Base.CodeUnits{UInt8})
return Buffer(Vector{UInt8}(data))
function Buffer(data::Base.CodeUnits{UInt8}, keepbytes::Integer=length(data))
return Buffer(Vector{UInt8}(data), keepbytes)
end

function Base.length(buf::Buffer)
Expand Down Expand Up @@ -199,6 +199,11 @@ function copydata!(buf::Buffer, data::Ptr{UInt8}, nbytes::Integer)
return buf
end

# Copy data from `data` to `buf`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this a docstring?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want this? All these things are internal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, it's an internal thing. If you're a user, you don't care about internals. If you care about internals you might as well read the code comments.

That said, if you feel strongly about this I can make them docstrings :-)

function copydata!(buf::Buffer, data::Buffer, nbytes::Integer = length(data))
return copydata!(buf, bufferptr(data), nbytes)
end

# Copy data from `buf` to `data`.
function copydata!(data::Ptr{UInt8}, buf::Buffer, nbytes::Integer)
# NOTE: It's caller's responsibility to ensure that the buffer has at least
Expand Down
7 changes: 3 additions & 4 deletions src/noop.jl
Original file line number Diff line number Diff line change
Expand Up @@ -120,10 +120,9 @@ function Base.transcode(::Type{Noop}, data::ByteData)
return transcode(Noop(), data)
end

function Base.transcode(::Noop, data::ByteData)
# Copy data because the caller may expect the return object is not the same
# as from the input.
return Vector{UInt8}(data)
function Base.transcode(codec::Noop, input::Buffer, output::Buffer = Buffer())
copydata!(output, input)
return output.data
end


Expand Down
42 changes: 34 additions & 8 deletions src/transcode.jl
baumgold marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,10 @@
# =========

"""
transcode(::Type{C}, data::Vector{UInt8})::Vector{UInt8} where C<:Codec
transcode(
::Type{C},
data::Union{Vector{UInt8},Base.CodeUnits{UInt8}},
)::Union{Vector{UInt8},Base.CodeUnits{UInt8}} where {C<:Codec}

Transcode `data` by applying a codec `C()`.

Expand All @@ -27,21 +30,34 @@ julia> String(decompressed)

```
"""
function Base.transcode(::Type{C}, data::ByteData) where C<:Codec
function Base.transcode(::Type{C}, args...) where {C<:Codec}
codec = C()
initialize(codec)
try
return transcode(codec, data)
return transcode(codec, args...)
finally
finalize(codec)
end
end

_default_output_buffer(codec, input) = Buffer(
initial_output_size(
codec,
buffermem(input)
)
)

"""
transcode(codec::Codec, data::Vector{UInt8})::Vector{UInt8}
transcode(
codec::Codec,
data::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer},
[output::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer}],
)::Union{Vector{UInt8},Base.CodeUnits{UInt8}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature line implies we can mix and match Vector{UInt8} and Base.CodeUnits{UInt8}. Is that true? Do we want that?

Can we document what the output type will be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason to forbid mix and match types. It's more flexibility for users at no performance cost. I'm not against enforcing the same type either, you tell me.

Regarding the return value, I believe it's always Vector{UInt8}, right? Because all these things are wrapped in Buffer and then returned as buffer.data here [1]. I've documented this.

[1]

return output.data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this done, let me know.


Transcode `data` by applying `codec`.

If `output` is unspecified, then this method will allocate it.

Note that this method does not initialize or finalize `codec`. This is
efficient when you transcode a number of pieces of data, but you need to call
[`TranscodingStreams.initialize`](@ref) and
Expand All @@ -59,7 +75,9 @@ julia> codec = ZlibCompressor();

julia> TranscodingStreams.initialize(codec)

julia> compressed = transcode(codec, data);
julia> compressed = Vector{UInt8}()

julia> transcode(codec, data, compressed);

julia> TranscodingStreams.finalize(codec)

Expand All @@ -76,9 +94,11 @@ julia> String(decompressed)

```
"""
baumgold marked this conversation as resolved.
Show resolved Hide resolved
function Base.transcode(codec::Codec, data::ByteData)
input = Buffer(data)
output = Buffer(initial_output_size(codec, buffermem(input)))
function Base.transcode(
baumgold marked this conversation as resolved.
Show resolved Hide resolved
codec::Codec,
input::Buffer,
output::Buffer = _default_output_buffer(codec, input),
)
baumgold marked this conversation as resolved.
Show resolved Hide resolved
error = Error()
code = startproc(codec, :write, error)
if code === :error
Expand Down Expand Up @@ -121,6 +141,12 @@ function Base.transcode(codec::Codec, data::ByteData)
throw(error[])
end

Base.transcode(codec::Codec, data::Buffer, output::ByteData) =
transcode(codec, data, Buffer(output))

Base.transcode(codec::Codec, data::ByteData, args...) =
transcode(codec, Buffer(data), args...)

# Return the initial output buffer size.
function initial_output_size(codec::Codec, input::Memory)
return max(
Expand Down
17 changes: 17 additions & 0 deletions test/codecnoop.jl
Original file line number Diff line number Diff line change
Expand Up @@ -192,9 +192,26 @@
data = b""
@test transcode(Noop(), data) == data
@test transcode(Noop(), data) !== data
@test transcode(Noop(), data, Vector{UInt8}()) == data
@test transcode(Noop(), data, TranscodingStreams.Buffer(Vector{UInt8}())) == data
@test transcode(Noop(), data, Vector{UInt8}()) !== data
@test transcode(Noop(), data, TranscodingStreams.Buffer(Vector{UInt8}())) !== data
output = Vector{UInt8}()
@test transcode(Noop(), data, output) === output
output = TranscodingStreams.Buffer(Vector{UInt8}())
@test transcode(Noop(), data, output) === output.data

data = b"foo"
@test transcode(Noop(), data) == data
@test transcode(Noop(), data) !== data
@test transcode(Noop(), data, Vector{UInt8}()) == data
@test transcode(Noop(), data, TranscodingStreams.Buffer(Vector{UInt8}())) == data
@test transcode(Noop(), data, Vector{UInt8}()) !== data
@test transcode(Noop(), data, TranscodingStreams.Buffer(Vector{UInt8}())) !== data
output = Vector{UInt8}()
@test transcode(Noop(), data, output) === output
output = TranscodingStreams.Buffer(Vector{UInt8}())
@test transcode(Noop(), data, output) === output.data

TranscodingStreams.test_roundtrip_transcode(Noop, Noop)
TranscodingStreams.test_roundtrip_read(NoopStream, NoopStream)
Expand Down