Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread 'main' panicked, "index out of bounds: the len is 10 but the index is 10" #244

Open
lucy924 opened this issue Aug 7, 2024 · 9 comments
Labels
build-available custom build produced for fix. troubleshooting workflow and data preparation questions

Comments

@lucy924
Copy link

lucy924 commented Aug 7, 2024

Hi,
I've had this issue cropping up in some of my dmr's but not consistently.
I'm looping through each of my paired samples and each chromosome reference (human T2T), with different pileups (one using the -traditional preset and another using -cpg.
The samples are of differently treated cancer cell lines that have been whole genome sequenced.

experiment names:
B3, C3, B4, B5, C5, B6, C6

Problem pileup, pair, chromosome so far:

trad,C6,chr13
trad,C5,chrY
trad,C6,chrY
trad,B3,chrM
trad,C3,chrM
trad,C5,chrM
trad,B6,chrM
trad,C6,chrM
mCG,C6,chr6
mCG,B3,chr10
mCG,B4,chr10
mCG,B6,chr10
mCG,B5,chr11
mCG,B6,chr12
mCG,B5,chr14

Do you think rerunning these specific pileups will help?

Pileup code

# traditional
modkit pileup ${path2reads_bam} ${pileup_bed} \
        --ref $reference \
        --preset traditional \
        --threads 10 \
        --log-filepath $log_file

# CG pileup
modkit pileup ${path2reads_bam} ${pileup_bed} \
        --ref $reference \
        --cpg \
        --threads 10 \
        --log-filepath $log_file

DMR code

modkit dmr pair \
        -a "${control_bed}" \
        --index-a "${control_bed}.tbi" \
        -b "${test_bed}" \
        --index-b "${test_bed}.tbi" \
        -o "${dmr_result}" \
        --header \
        --ref "${reference}" \
        --base C \
        --threads 9 \
        --log-filepath "${dmr_log}" \
        -f \
        --min-valid-coverage 20 \
        --delta 0.1 

Maybe related issue
I'm not sure if this next issue related but I'm also getting a different error with other pairs as well:

> sampled 46 a records and 0 b records, calculating max coverages for 95th percentile
> Error! not enough data points (got 0) to calculate percentile

This is the list of experiments with not enough datapoints

trad,B3,chrY
trad,B4,chrY
trad,B5,chrY
trad,B6,chrY
mCG,B6,chr11
mCG,B6,chr14
mCG,B3,chrY
mCG,B4,chrY
mCG,B5,chrY
mCG,C5,chrY
mCG,B6,chrY
mCG,C6,chrY
mCG,B4,chr14

I initially didn't think this was a problem, (especially as the B experiments are a female derived cell line) but then I started getting the same error on autosomal chromosomes too, in which the alignment depth might not be great but I didn't think it was this bad.

Appreciate any insights!

@ArtRand
Copy link
Contributor

ArtRand commented Aug 7, 2024

Hello @lucy924,

Could you send me the whole panic message?

@lucy924
Copy link
Author

lucy924 commented Aug 7, 2024

Hi @ArtRand
Here it is:

- performing DMR on methylated CpG pileups, single site
> reading reference FASTA at "/home/dejlu879/refs/hg38_chromsonly_ebv_lambda_bcg/chr15.fa"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 12 a records and 1502 b records, calculating max coverages for 95th percentile
thread 'main' panicked at src/thresholds.rs:39:22:
index out of bounds: the len is 12 but the index is 12
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@ArtRand
Copy link
Contributor

ArtRand commented Aug 9, 2024

Hello @lucy924, thanks for sending the traceback.

I'll have a build to you with the fix asap.

@ArtRand
Copy link
Contributor

ArtRand commented Aug 10, 2024

Hello @lucy924,

What you're doing looks reasonable to me. With respect to this error:

> sampled 46 a records and 0 b records, calculating max coverages for 95th percentile
> Error! not enough data points (got 0) to calculate percentile

If for some reason bedMethyl records from the 'b' sample are systematically failing you cold see zero records being sampled and these errors will be in the log. Looks like you have --log-filepath "${dmr_log}" set, could you tell me if there are error messages in there? They would look like "batch failed during sampling.." or "failed to get coverages...". You can work-around this by passing --max-coverages option, I would pick a number that's roughly what coverage you expect. If you have less coverage than the max, that's fine. Otherwise I'm a little curious how you're getting zero records for one of the samples, is it possible that your tabix index has the contig listed, but there are in fact zero records for that contig? Maybe checking tabix -l "${test_bed}" and see if it makes sense.

@lucy924
Copy link
Author

lucy924 commented Aug 12, 2024

Hi @ArtRand

I went through a bunch of the logs and none of them give any actual error messages, they just stop with the last line sampled 46 a records and 0 b records, calculating max coverages for 95th percentile.
I'm not sure if I've missed an error log output somewhere?
The output of tabix -l "${test_bed}" is as expected, with only one contig (I've been aligning each chromosome separately for reasons), and I checked that the bed files do actually contain records, so it's not that they are empty!
I have attached a few log files if they are of any use, all of one pair of samples (B6) on chromosome 14 that had a some dmr runs work and some that didn't.

File Status Description
B6_pair.chr14.6mA_dmr.v1ss.bed.log DMR failed DMR using pileups looking for the 6mA modification
B6_pair.chr14.trad_dmr.v1ss.bed.log DMR successful DMR using pileups made with the traditional preset
B6_Control.chr14.6mA_pileup.log Fine Associated DMR found 0 records (see above)
B6_Control.chr14.mCG_pileup.log Fine Associated DMR may have found 0 records
B6_Test.chr14.mCG_pileup.log Fine Associated DMR may have found 0 records
B6_Control.chr14.trad_pileup.log Fine Associated DMR was successful (see above)

Unfortunately the mCG DMR logs were deleted somewhere along the way, but if you need them I can rerun it at some point.

debugging.zip

@ArtRand
Copy link
Contributor

ArtRand commented Aug 14, 2024

Hello @lucy924,

Thanks for sending the logs over. Do other chromosomes 6mA DMR seem to work and just chr14 is having problems? One thing I noticed is that you use --motif A 0 --motif T 0 I assume to capture the complement T? You don't actually need to do this --motif A 0 will capture (+)-strand As and (-)-strand T positions. If you short-circuit this estimation step by passing --max-coverages to the dmr pair for chr14 6mA, does it complete successfully?

Sorry for the somewhat cryptic asks, I have a theory for what's going on, but I need a little more information.

Thanks.

@lucy924
Copy link
Author

lucy924 commented Aug 17, 2024

Hi @ArtRand

Thanks for the clarification on the A/T - I wasn't entirely sure from the docs so put T down just in case, figuring it wouldn't do any harm.

Other than chrY, I had the same problem with one sample, chr11, mCG, DMR, but not with 6mA DMR. Chr14 was having problems with multiple samples on mCG, and one of those samples on 6mA.

I set --max-coverages to 100 and reran all samples/all chroms, and chr14 didn't error, but it didn't work either:

On mCG

> finished, processed 0 sites successfully, 0 failed

6mA is behaving weirdly on all samples this time, and I don't think I've done anything different. Unfortunately I don't have time this week to delve into it more, I'll let you know if I figure out what happened.

I've attached example outputs for both mCG and 6mA, hopefully it helps ...

I managed to get good outputs using the below (both B and C sets):

    modkit dmr pair \
    -a "${control_bed_C3}" \
    -a "${control_bed_C5}" \
    -a "${control_bed_C6}" \
    -b "${test_bed_C3}" \
    -b "${test_bed_C5}" \
    -b "${test_bed_C6}" \
    -o "${dmr_result}" \
    --header \
    --ref "${reference}" \
    --segment "${dmr_segment}" \
    --base C \
    --base A \
    --threads 20 \
    --log-filepath "${dmr_log}" \
    -f \
    --min-valid-coverage 20 \
    --delta 0.1

And I'm running with this data for now, do you think it likely that issues happened with the same samples on chr14 and the problems are just hidden in this case?

Thanks,
Lucy

example_outputs.txt

@ArtRand
Copy link
Contributor

ArtRand commented Sep 12, 2024

Hello @lucy924,

I've made a refactor to how the tabix indices are handled in dmr. Please find a build attached. Hopefully with this build the outputs from the various runs become consistent. Thanks,

A
modkit_u16_x86_64.tar.gz

@ArtRand ArtRand added build-available custom build produced for fix. troubleshooting workflow and data preparation questions labels Sep 12, 2024
@lucy924
Copy link
Author

lucy924 commented Sep 23, 2024

Hi @ArtRand
Thanks for that, I'll test it out when I have the chance.
Much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build-available custom build produced for fix. troubleshooting workflow and data preparation questions
Projects
None yet
Development

No branches or pull requests

2 participants