-
-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: raise exception if CollectDuplexSeqMetrics run on consensus BAM #1003
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1003 +/- ##
=======================================
Coverage 95.62% 95.62%
=======================================
Files 126 126
Lines 7382 7388 +6
Branches 497 510 +13
=======================================
+ Hits 7059 7065 +6
Misses 323 323
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
src/main/scala/com/fulcrumgenomics/umi/CollectDuplexSeqMetrics.scala
Outdated
Show resolved
Hide resolved
.tapEach { | ||
r => | ||
if (Umis.isConsensusRead(r)) throw new IllegalArgumentException("Found consensus record. Expected UMI-grouped BAM") | ||
else r | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of thoughts here:
- Is it necessary to check each and every read? Or could you add
.bufferBetter
to the construction of_filteredIterator
and then just check the first element? - I think this is a case where a much more comprehensive error message would be useful, something like:
Input BAM file to CollectDuplexSeqMetrics ($input) appears to contain consensus reads.
CollectDuplexSeqMetrics cannot run on consensus BAMs, and instead requires the UMI-grouped BAM prior to consensus calling. The UMI-grouped BAM is the output of running
GroupReadsByUmi.
First record in $input has consensus tags present:
${_filteredIterator.peek()}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first point, I think it depends on what we're trying to guard against. If we're only guarding against a well formed consensus BAM being input then I think checking the first record should work.
If we're guarding against all the chaos a user may concoct, then this will not be sufficient. But, I think guarding against all those possibilities is probably unnecessary and unrealistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! I do like the buffered and peek approach since a SAM/BAM is likely to have all the records with the tags, or none of them (in the non-corrupt example), so checking the first should be sufficient to catch 99%+ of improper inputs.
src/main/scala/com/fulcrumgenomics/umi/CollectDuplexSeqMetrics.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/umi/CollectDuplexSeqMetrics.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/umi/CollectDuplexSeqMetrics.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/umi/CollectDuplexSeqMetrics.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/fulcrumgenomics/umi/CollectDuplexSeqMetrics.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works well for me! Thanks Zach! @tfenne could you give a final review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions, but lgtm
Closes #992.
This PR adds a new
isConsensusRead
method toUmis.scala
which is then used inCollectDuplexSeqMetrics
to raise an exception if the first record is a consensus record.In addition to the unit tests, I ran this on a consensus BAM locally and generated the expected error message. Running on the preceding grouped BAM generated the expected outputs as well.