Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can CollectWgsMetrics fall back to slow algorithm gracefully if the fast one fails with the default read length? #1970

Open
gokalpcelik opened this issue Jul 8, 2024 · 8 comments

Comments

@gokalpcelik
Copy link
Contributor

This issue seems to be present in the latest version as well. I have a collection of WGS bam files (#>100) and fast algorithm works gracefully for most of them whereas only a few of them fail with the same error due to read length is not being optimal

java.lang.ArrayIndexOutOfBoundsException: The requested index 1247601 is out of counter bounds. Possible cause of exception can be wrong READ_LENGTH parameter (much smaller than actual read length)
	at picard.analysis.CounterManager$Counter.checkIndex(CounterManager.java:189)
	at picard.analysis.CounterManager$Counter.get(CounterManager.java:183)
	at picard.analysis.FastWgsMetricsCollector.processRecord(FastWgsMetricsCollector.java:137)
	at picard.analysis.FastWgsMetricsCollector.addInfo(FastWgsMetricsCollector.java:105)
	at picard.analysis.WgsMetricsProcessorImpl.processFile(WgsMetricsProcessorImpl.java:93)
	at picard.analysis.CollectWgsMetrics.doWork(CollectWgsMetrics.java:242)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:280)
	at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
	at org.broadinstitute.hellbender.Main.main(Main.java:306)

I can understand that this parameter is important for estimates however instead of failing can the tool fall back to the slow algorithm gracefully?

@gokalpcelik gokalpcelik changed the title Can CollectWgsMetrics fall back to slow algorithm gracefully if the fast one fails with the default read length. Can CollectWgsMetrics fall back to slow algorithm gracefully if the fast one fails with the default read length? Jul 8, 2024
@lbergelson
Copy link
Member

@gokalpcelik What are you envisioning exactly? If it encounters an error it will restart at the beginning of the bam with the slow algorithm? Or it will process a specific read with the other algorithm and then combines them?

@gokalpcelik
Copy link
Contributor Author

This usually happens at the very beginning so restarting the traversal should not be a huge loss.

@lbergelson
Copy link
Member

That seems reasonable then.

@lbergelson
Copy link
Member

I wonder if it's possible to make the Fast collector work on these files?

If it's just retrying with the other collector that seems pretty simple, but kind of gross. Do you want to implement that? I might change the arguments so instead of USE_FAST_ALGORITHM we have a set of inputs SLOW, FAST, TRY_FAST_AND_FALL_BACK or something like that so you can control the behavior.

@gokalpcelik
Copy link
Contributor Author

gokalpcelik commented Jul 9, 2024 via email

@lbergelson
Copy link
Member

We will have to identify what the actual issue that's causing this is, because we want to only fall back in the cases where it would help to do so, and not just any random ArrayIndexOutOfBoundsException . We'll probably want to change it to throw something like ReadTooLongForFastCollectorException (I don't know enough to say if that's really the issue, but whatever it is.)

@kockan
Copy link
Contributor

kockan commented Aug 1, 2024

@lbergelson It is the issue. The READ_LENGTH parameter is used by FastWgsMetricCollector to allocate arrays (wrapped by CounterManager) and if some of your reads are longer than the value of READ_LENGTH this might cause ArrayIndexOutOfBoundsException in certain cases. Unfortunately, there is no way to set this value in a smart way without performing a first pass over your alignment file and get the maximum read length, and dynamically reallocating arrays at runtime should be a no-go since it affects the processing of previous alignments (don't quote me on this, maybe this can be done somehow, I am not familiar with the intricacies of FastWgsMetricCollector).

cc: @gokalpcelik

@multimeric
Copy link

The READ_LENGTH parameter is used by FastWgsMetricCollector to allocate arrays (wrapped by CounterManager) and if some of your reads are longer than the value of READ_LENGTH this might cause ArrayIndexOutOfBoundsException in certain cases.

If this is true, then perhaps the argument is mislabelled? Currently it is described as the average read length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants