Fix stop_on_end = true closing underlying stream #178

Merged · 3 commits · Mar 16, 2024

Conversation

nhz2 (Member) commented Mar 8, 2024

Fixes #85, #95, #117, #128
This PR ensures the underlying stream is not closed if stop_on_end = true is used.

This PR removes the done mode added in #177. With this PR, when a stream created with stop_on_end = true reaches the end of a data chunk while writing, it restarts the codec and concatenates the next data chunk instead of going into stop or done mode, just as it would with stop_on_end = false. When writing, stop_on_end = true now only prevents the underlying stream from being closed.

These changes might be slightly breaking, but they only affect streams created with stop_on_end = true in write mode.
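For illustration, here is a minimal sketch of the write-mode behavior described above (the Noop codec and the exact calls are just an example, not code from this PR):

using TranscodingStreams

sink = IOBuffer();
stream = TranscodingStream(Noop(), sink; stop_on_end = true);
write(stream, "hello")
close(stream)        # finishes and closes the transcoding stream...
@assert isopen(sink) # ...but with this PR the underlying sink stays open
String(take!(sink))  # == "hello" (Noop passes data through unchanged)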

nhz2 marked this pull request as ready for review March 8, 2024 17:05
nhz2 requested review from mkitti and jakobnissen March 8, 2024 17:06
mkitti (Member) commented Mar 8, 2024

Is "done" exposed at all via a public API or was that always internal state? We may want to check dependencies as well and possibly consider a major version change.

nhz2 (Member, Author) commented Mar 8, 2024

The main change is the effect of stop_on_end = true when writing to a stream. Looking at https://juliahub.com/ui/Search?q=stop_on_end&type=code, the only package that uses stop_on_end = true is @reallyasi9's ZipStreams, and it uses it in a read-only context.
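For context, the read-side pattern that stop_on_end = true is designed for looks roughly like this (a hedged sketch; GzipDecompressorStream from CodecZlib is just one example of a codec stream that forwards this keyword, and concatenated.gz is a hypothetical file with multiple gzip members):

using CodecZlib  # GzipDecompressorStream forwards keyword arguments to TranscodingStream

io = open("concatenated.gz")
stream = GzipDecompressorStream(io; stop_on_end = true)
first_member = read(stream)  # stops at the end of the first gzip member instead of reading on
close(stream)                # with this PR, io is not closed here and can keep being used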

EDIT: I reverted the below change to avoid affecting current uses of `TOKEN_END`.

The other change is what happens when two TranscodingStreams.TOKEN_END tokens are written in a row. For example:

using TranscodingStreams  # provides TOKEN_END; DoubleFrameEncoder is the test codec from TranscodingStreams' test suite

sink = IOBuffer();
stream = TranscodingStream(DoubleFrameEncoder(), sink);
write(stream, TranscodingStreams.TOKEN_END)  # ends the current (empty) data chunk
write(stream, TranscodingStreams.TOKEN_END)  # ends another empty data chunk
flush(stream)
String(take!(sink))

With this PR, this returns two concatenated empty data chunks, "[ ][ ]", because each write of TOKEN_END ends a data chunk even if that chunk is empty.

In the current released version, the same code returns a single empty data chunk, "[ ]".

Looking at packages that use TOKEN_END (https://juliahub.com/ui/Search?q=TOKEN_END&type=code), most of them write TOKEN_END just before leaking or finalizing the stream.

From what I can tell, the only place that could be affected by this change is https://github.com/BioJulia/FASTX.jl/blob/v2.1.4/src/fastq/writer.jl#L53, and FASTX.jl is already part of the downstream tests.

mkitti previously approved these changes Mar 9, 2024
nhz2 (Member, Author) commented Mar 10, 2024

I reverted the changes to the behavior of TOKEN_END to make this change less breaking for FASTX.jl.

This makes it harder to create multiple empty data chunks in a row, but there aren't many uses for that functionality.

Successfully merging this pull request may close these issues.

Method for closing the transcoding stream without closing the underlying stream