-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of a SAMRecord comparator that matches samtools' queryname sort order. #1600
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1600 +/- ##
===============================================
+ Coverage 69.838% 69.852% +0.014%
- Complexity 9653 9664 +11
===============================================
Files 703 704 +1
Lines 37647 37681 +34
Branches 6114 6121 +7
===============================================
+ Hits 26292 26321 +29
- Misses 8904 8906 +2
- Partials 2451 2454 +3
|
Always best to be I second place: #1601 |
Do you need to fall back on the first and second of pair flags as samtools does? https://github.com/samtools/samtools/blob/bdc5bb81c11f2ab25ea97351213cb87b33857c4d/bam_sort.c#L1798 |
@nh13 I guess it depends on whether you want to match samtools 100% or implement the "natural" sort order which doesn't prescribe how ties are broken. I don't think I have a strong opinion so long as e.g. first of pair sorts before second of pair, and secondary/supplementary records sort after primary |
I think the ideal way to make this work, but which would be a PITA to retrofit would be something akin to fgbio's SamOrder, which defines more than the |
Ah, the eternal problem #766. I was assuming that awkwardly working around this was the impetus for the last pr? I think we have a few options none of which are trivial. An alternative would be to try to make some sort of meta queryname sorter which tries to figure out the subsort when it's reading a file. That shouldn't be hard I would think, just try both sort orders and if one passes it's valid. I can't remember if there's any issue with stateful comparators but I think it wouldn't be a problem to have one that figures out which it is and then remembers it. We should write the SS field in either case... |
@lbergelson I don't think samtools writes the I think the easiest first step would be to have HTSJDK accept either as being queryname sorted - HTSJDK already has that Then the question becomes ... if it can accept either as valid, is there any reason not to switch to the samtools style as the default output for |
The main reason not to switch the default immediately is that a lot of people used mixed pipeline where some of the tools are on older versions. Switching would break them until they update all parts of the pipeline. Does samtools have trouble reading htsjdk queryname bams? I'm not sure the problem is symmetrical. |
I don't believe there are issues with samtools reading picard/htsjdk queryname sorted files, but it's not something I do regularly. I think samtools tends more towards trusting/assuming the user provided acceptable input and not validating. |
@nh13, @lbergelson I put this together mostly for fun - it's a comparator that will sort read names the same way samtools does. I'm not really sure what the best way to integrate this is in HTSJDK though. Thoughts?