Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Add zstd support #9

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Feat: Add zstd support #9

wants to merge 6 commits into from

Conversation

kockan
Copy link

@kockan kockan commented Aug 16, 2023

An attempt at adding zstd support in a similar fashion to how gzip is handled.

Once (if) approved, would be used for: fulcrumgenomics/fqgrep#20 in a subsequent PR to the relevant repo.

Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with some minor nits on the unit tests. Thank-you!

src/io/mod.rs Outdated Show resolved Hide resolved
src/io/mod.rs Show resolved Hide resolved
src/io/mod.rs Show resolved Hide resolved
@kockan kockan requested a review from nh13 August 17, 2023 02:52
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to bring in @tfenne for final review as I’m novice at rust.

@nh13 nh13 requested a review from tfenne August 17, 2023 15:09
src/io/mod.rs Outdated Show resolved Hide resolved
src/io/mod.rs Outdated
pub fn new_writer<P>(&self, p: &P) -> Result<BufWriter<Box<dyn Write + Send>>>
where
P: AsRef<Path>,
{
let file = File::create(p).map_err(FgError::IoError)?;
let write: Box<dyn Write + Send> = if Io::is_gzip_path(p) {
Box::new(GzEncoder::new(file, self.compression))
} else if Io::is_zstd_path(p) {
Box::new(ZstdEncoder::new(file, 0).unwrap().auto_finish())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about the unwrap().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about zstd compression, but I'm not sure just specifying 0 here (which appears to then use the default, level 3, compression) is the right thing to do, as it gives the user zero control.

On the other hand, re-using the compression passed in above is problematic as gzip and zstd have very different compression level ranges (0-9 vs. -17 to 22), and introducing a new parameter to Io::new() for zstd_level would be a breaking change.

Thoughts @nh13?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like:

Screenshot 2023-08-18 at 12 26 25 PM

Not sure if it's the proper way to go about it but it's one possibility if it's preferred that the Io struct absolutely doesn't change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see having a member per level, but then what about bzip2 and future compression levels? And perhaps we have compression that requires not just a level? And certainly we don't want to add translation layer that goes from 0-10 levels for each compression type.

What about instead of storing compression as a member of Io, we have a Map/Vec that stores a compression-specific struct (can impl a trait). You could even make is_gzip_path specific to the compression as a trait-level method (is_path_for), so a new compression need only impl the base trait to be added. Then we avoid the if/else if/else chain there and it can just be a loop (with a default if it exits the loop)? Does that make sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. The Io struct probably will have to change at some point if further compression schemes are to be supported as far as I can see. #10 might also be a good solution to this. Haven't checked it out thoroughly yet, but it does feel like good idea to actually check the file format instead of just the extension, although might still want to have a way to compare the found format vs. extension to detect any inconsistencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants