-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add zstd support #9
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with some minor nits on the unit tests. Thank-you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to bring in @tfenne for final review as I’m novice at rust.
src/io/mod.rs
Outdated
pub fn new_writer<P>(&self, p: &P) -> Result<BufWriter<Box<dyn Write + Send>>> | ||
where | ||
P: AsRef<Path>, | ||
{ | ||
let file = File::create(p).map_err(FgError::IoError)?; | ||
let write: Box<dyn Write + Send> = if Io::is_gzip_path(p) { | ||
Box::new(GzEncoder::new(file, self.compression)) | ||
} else if Io::is_zstd_path(p) { | ||
Box::new(ZstdEncoder::new(file, 0).unwrap().auto_finish()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about the unwrap()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about zstd compression, but I'm not sure just specifying 0
here (which appears to then use the default, level 3, compression) is the right thing to do, as it gives the user zero control.
On the other hand, re-using the compression
passed in above is problematic as gzip and zstd have very different compression level ranges (0-9 vs. -17 to 22), and introducing a new parameter to Io::new()
for zstd_level would be a breaking change.
Thoughts @nh13?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see having a member per level, but then what about bzip2 and future compression levels? And perhaps we have compression that requires not just a level? And certainly we don't want to add translation layer that goes from 0-10 levels for each compression type.
What about instead of storing compression
as a member of Io
, we have a Map/Vec that stores a compression-specific struct (can impl a trait). You could even make is_gzip_path
specific to the compression as a trait-level method (is_path_for
), so a new compression need only impl the base trait to be added. Then we avoid the if/else if/else chain there and it can just be a loop (with a default if it exits the loop)? Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. The Io
struct probably will have to change at some point if further compression schemes are to be supported as far as I can see. #10 might also be a good solution to this. Haven't checked it out thoroughly yet, but it does feel like good idea to actually check the file format instead of just the extension, although might still want to have a way to compare the found format vs. extension to detect any inconsistencies.
An attempt at adding zstd support in a similar fashion to how gzip is handled.
Once (if) approved, would be used for: fulcrumgenomics/fqgrep#20 in a subsequent PR to the relevant repo.