DRAFT: SortFastq #38

kockan · 2023-08-21T21:07:04Z

Initial attempt at #3
Tested on an empty FASTQ as well as a ~10GB one. It seems to be working.

| max-records | fgbio user | fgbio sys | fgbio CPU | fgbio wall | fqtk user | fqtk sys | fqtk CPU | fqtk wall |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 500000 | 186.07s | 19.88s | 130% | 2:37.46 | 221.10s | 27.20s | 82% | 4:59.44 |
| 1000000 | 194.21s | 21.99s | 114% | 3:08.19 | 226.42s | 26.46s | 84% | 4:57.96 |
| 2000000 | 196.37s | 13.59s | 170% | 2:02.82 | 228.77s | 24.76s | 89% | 4:44.37 |
| 4000000 | 212.98s | 13.57s | 202% | 1:52.13 | 230.88s | 20.35s | 119% | 3:29.84 |

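Looks like fgbio is consistently faster: roughly 1.6-2.3x in wall-clock time, with higher CPU utilization (114-202% vs. 82-119%). Should look into why: the library, the sorting algorithm, or my usage?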
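To help separate "algorithm" from "library/usage", here is a std-only sketch of the external merge sort that `max-records` parameterizes: cut the input into runs of at most `max-records`, sort each run, then k-way merge the runs with a min-heap. This is not the PR's code (which appears to use the `ext-sort` crate's `ExternalSorterBuilder`); the `Record` type and `external_sort` function are hypothetical, runs are kept in memory instead of being spilled to temp files, and sorting by read name is an assumption.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical FASTQ record. The derived ordering compares `name` first,
/// so records sort by read name (an assumption about the desired key).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Record {
    pub name: String,
    pub seq: String,
    pub qual: String,
}

/// External merge sort in two phases:
/// 1. cut the input into runs of at most `max_records` and sort each run;
/// 2. k-way merge the sorted runs with a min-heap.
/// Runs stay in memory here; a real sorter writes each run to a temp file.
pub fn external_sort(records: Vec<Record>, max_records: usize) -> Vec<Record> {
    assert!(max_records > 0, "max_records must be positive");

    // Phase 1: build sorted runs.
    let mut runs: Vec<std::vec::IntoIter<Record>> = Vec::new();
    let mut input = records.into_iter();
    loop {
        let mut run: Vec<Record> = input.by_ref().take(max_records).collect();
        if run.is_empty() {
            break;
        }
        run.sort_unstable();
        runs.push(run.into_iter());
    }

    // Phase 2: the heap holds the smallest unread record of each run,
    // tagged with its run index so that run can be refilled after a pop.
    let mut heap = BinaryHeap::new();
    for (i, run) in runs.iter_mut().enumerate() {
        if let Some(rec) = run.next() {
            heap.push(Reverse((rec, i)));
        }
    }
    let mut merged = Vec::new();
    while let Some(Reverse((rec, i))) = heap.pop() {
        if let Some(next) = runs[i].next() {
            heap.push(Reverse((next, i)));
        }
        merged.push(rec);
    }
    merged
}
```

Larger `max-records` values mean fewer, longer runs, so fewer temporary files and a smaller merge heap, which may help explain why the timings generally improve as `max-records` grows.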

At minimum, unit tests still need to be added. Any feedback is welcome if there is developer interest in merging this.

nh13 · 2025-02-13T06:49:47Z

src/bin/commands/sort_fastq.rs

+impl Command for SortFastq {
+    /// Executes the sort_fastq command
+    fn execute(&self) -> Result<()> {
+        let mut fq_reader = FastqReader::from_path(&self.input)?;


You probably want to use `FastqReader::with_capacity(1024 * 1024)`, or some similarly large buffer, like we do in the demux tool.
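For comparison, the standard-library analogue of a large read buffer is `BufReader::with_capacity`. The sketch below reads plain 4-line FASTQ through a 1 MiB buffer; `read_fastq` and `BUFFER_SIZE` are hypothetical names, it assumes well-formed, non-wrapped records, and it does not reproduce the crate behind `FastqReader`.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// 1 MiB buffer, the size suggested in review (instead of a default-sized buffer).
const BUFFER_SIZE: usize = 1024 * 1024;

/// Returns the next line, or an `UnexpectedEof` error if a record is cut short.
fn next_line<I: Iterator<Item = io::Result<String>>>(lines: &mut I) -> io::Result<String> {
    lines
        .next()
        .unwrap_or_else(|| Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated FASTQ record")))
}

/// Reads 4-line FASTQ records as (name, seq, qual) through a 1 MiB `BufReader`.
pub fn read_fastq<R: Read>(inner: R) -> io::Result<Vec<(String, String, String)>> {
    let reader = BufReader::with_capacity(BUFFER_SIZE, inner);
    let mut lines = reader.lines();
    let mut records = Vec::new();
    while let Some(header) = lines.next() {
        let header = header?;
        let name = header
            .strip_prefix('@')
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "header must start with '@'"))?
            .to_string();
        let seq = next_line(&mut lines)?;
        let _plus = next_line(&mut lines)?; // separator line, ignored
        let qual = next_line(&mut lines)?;
        records.push((name, seq, qual));
    }
    Ok(records)
}
```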

nh13 · 2025-02-13T06:53:02Z

src/bin/commands/sort_fastq.rs

+            LimitedBufferBuilder,
+        > = ExternalSorterBuilder::new()
+            .with_tmp_dir(Path::new("./"))
+            .with_buffer(LimitedBufferBuilder::new(self.max_records, true))


Any reason not to expose the temp directory on the command line, defaulting to `None` (i.e. use the system default)?
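A minimal sketch of that default, assuming a hypothetical `tmp_dir: Option<PathBuf>` field backing a hypothetical `--tmp-dir` flag; here `None` falls back to `std::env::temp_dir()` (e.g. `$TMPDIR` on Unix). With the sorter builder, one option is to call `with_tmp_dir` only when the field is `Some` and otherwise leave the library's own default in place.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical subset of the SortFastq options.
pub struct SortOptions {
    /// Would back a `--tmp-dir` flag; `None` means "use the system default".
    pub tmp_dir: Option<PathBuf>,
}

impl SortOptions {
    /// Directory where sorted runs are spilled.
    pub fn effective_tmp_dir(&self) -> PathBuf {
        self.tmp_dir.clone().unwrap_or_else(env::temp_dir)
    }
}
```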

I would expect a big win for this tool from `with_threads_number`. Shall we use multiple threads?
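The timings in the description are consistent with a mostly single-threaded sort: fqtk's CPU utilization stays between 82% and 119%. Which phases `with_threads_number` actually parallelizes is worth confirming in the `ext-sort` docs; the std-only sketch below only shows why sorting the runs parallelizes well (each run is independent). `sort_runs_parallel` is hypothetical, and the final merge would remain sequential.

```rust
use std::thread;

/// Splits `items` into runs of at most `max_records`, then sorts the runs on up to
/// `threads` scoped threads, each handling a contiguous group of runs.
pub fn sort_runs_parallel<T: Ord + Send>(items: Vec<T>, max_records: usize, threads: usize) -> Vec<Vec<T>> {
    assert!(max_records > 0 && threads > 0, "max_records and threads must be positive");

    let mut runs: Vec<Vec<T>> = Vec::new();
    let mut input = items.into_iter();
    loop {
        let run: Vec<T> = input.by_ref().take(max_records).collect();
        if run.is_empty() {
            break;
        }
        runs.push(run);
    }

    // Ceiling division so every run lands in some group; `max(1)` avoids
    // `chunks_mut(0)`, which panics, when there are no runs.
    let per_thread = ((runs.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for group in runs.chunks_mut(per_thread) {
            // Each thread gets exclusive access to its own slice of runs.
            s.spawn(move || group.iter_mut().for_each(|run| run.sort_unstable()));
        }
    }); // all scoped threads are joined here
    runs
}
```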

I would also try `with_rw_buf_size(1024 * 1024)`.
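Presumably `with_rw_buf_size` sets the buffer size used when the sorter writes and re-reads its temporary run files. The std-only sketch below shows the same idea with 1 MiB `BufWriter`/`BufReader` buffers; `spill_run`, `read_run`, and the one-item-per-line format are hypothetical.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// 1 MiB I/O buffer, matching the size suggested for `with_rw_buf_size`.
const RW_BUF_SIZE: usize = 1024 * 1024;

/// Writes one sorted run to `path` through a 1 MiB buffered writer (one item per line).
pub fn spill_run(path: &Path, run: &[String]) -> io::Result<()> {
    let mut writer = BufWriter::with_capacity(RW_BUF_SIZE, File::create(path)?);
    for item in run {
        writeln!(writer, "{item}")?;
    }
    writer.flush() // surface write errors instead of losing them on drop
}

/// Reads a spilled run back through a 1 MiB buffered reader.
pub fn read_run(path: &Path) -> io::Result<Vec<String>> {
    let reader = BufReader::with_capacity(RW_BUF_SIZE, File::open(path)?);
    reader.lines().collect()
}
```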

Initial code for sort-fastq (713cd16)

nh13 requested changes Feb 13, 2025
