Assertion `num_minimizers <= static_cast<size_t>(INT_MAX)' failed #131
Comments
What is your reference genome? Why are there so many sequences, and why is the total length so long? It seems that the genome is too large for Chromap to handle. |
Same problem here with a 9Gb genome. What is the limit of Chromap? Could some parameters be changed to improve this? PS: Had no problem with a 5GB genome before ... |
@HMPNK What is the longest chromosome of the 9GB genome? |
@mourisl |
Did you get the same error message? Your genome is large and I guess it has more than 2^32-1 minimizers. If this is the case, supporting very large genomes will require some code changes. |
The error message was slightly different: chromap: src/index.cc:33: void chromap::Index::Construct(uint32_t, const chromap::SequenceBatch&): Assertion `num_minimizers <= static_cast<size_t>(0x7fffffff)' failed. It collected 3,350,432,716 minimizers, which is less than the maximum of 2^32-1. |
I just checked. The max number of minimizers currently supported by Chromap is 2^31 - 1 instead of 2^32 - 1. So it would require some code change before Chromap can support large genomes like what you have. |
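The arithmetic behind this can be checked with a minimal sketch. It only compares the minimizer count reported above against the two candidate limits; the variable names are illustrative and are not taken from the Chromap source.

```python
# Limits discussed in the thread.
INT_MAX = 2**31 - 1        # 2,147,483,647: the bound used in the assertion
UINT32_MAX = 2**32 - 1     # 4,294,967,295: the bound originally assumed

# Minimizer count reported for the 9Gb genome in the comment above.
collected = 3_350_432_716

fits_signed = collected <= INT_MAX
fits_unsigned = collected <= UINT32_MAX

print(f"fits below 2^31 - 1: {fits_signed}")
print(f"fits below 2^32 - 1: {fits_unsigned}")
```

The count sits between the two limits, which is why the assertion fails even though the number is below 2^32 - 1.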
Could you provide those changes? |
I just did a test and using "-w 13" worked. Increasing -w effectively reduces the number of minimizers. But I guess increasing "-w" will reduce the sensitivity of mapping? What do you think? |
What is your read length? If your reads are long, increasing -w probably won't affect the accuracy much. |
It is 2 × 150bp. |
I think 150bp reads should be fine with "-w 13". Since your genome is large, you can increase "-k" a little bit to ensure each minimizer is unique enough on the genome, maybe -k 23 -w 17. Then you will have 3 non-overlapping windows to locate seeds. The default parameters were selected for 50bp scATAC-seq data. @haowenz Is this reasonable? |
The fragment size can still be short though. Currently, increasing -w is probably the only way to use a large genome. It may affect sensitivity, but probably not much, as you only increase it by 3 and the k-mer size doesn't change. In the long term, we should support a larger number of minimizers. |
I am getting the same issue with the axolotl genome, which is even bigger, around 27G. Do you think it will be possible to address this any time soon? Any suggestions for the -k and -w parameters? I have R1 50bp and R2 60bp bulk ATAC-seq. Setting -w 13 still fails. |
You may try keeping the k-mer length at 17 (-k 17) and increasing the window size to 13 (-w 13) or even larger to see if it works. |
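The "3 non-overlapping windows" figure can be reproduced with a small sketch. It assumes the standard minimizer definition, in which a window of w consecutive k-mers spans w + k - 1 bases; whether Chromap counts windows exactly this way is an assumption, and the function name is hypothetical.

```python
def non_overlapping_windows(read_length: int, k: int, w: int) -> int:
    """Number of disjoint minimizer windows that fit in one read.

    Assumes a window covers w consecutive k-mers, i.e. w + k - 1 bases.
    """
    span = w + k - 1
    return read_length // span


# Suggested parameters for 150bp reads: the window spans 39 bases.
print(non_overlapping_windows(150, k=23, w=17))

# Default parameters (k=17, w=7) on 50bp reads: the window spans 23 bases.
print(non_overlapping_windows(50, k=17, w=7))
```

Under this assumption, the suggested settings leave a 150bp read with at least as many disjoint windows as the defaults give a 50bp read.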
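To get a rough idea of how large -w needs to be, the following sketch uses the textbook approximation that minimizers on random sequence occur at a density of about 2 / (w + 1) per base. This is an assumption, not Chromap's own accounting: repeats, runs of N, and implementation details will shift the real count, so the result is only a starting point to verify against the index build log. The function names are hypothetical.

```python
import math

INT_MAX = 2**31 - 1  # current upper bound on the number of minimizers


def expected_minimizers(num_bases: int, w: int) -> float:
    """Approximate minimizer count, assuming density 2 / (w + 1)."""
    return 2.0 * num_bases / (w + 1)


def smallest_window(num_bases: int) -> int:
    """Smallest w whose estimated minimizer count stays within INT_MAX."""
    return max(1, math.ceil(2.0 * num_bases / INT_MAX - 1))


# Figures from the build log in the original report of this issue:
# 19,811,410,511 bases, w = 7, 4,958,576,388 minimizers collected.
bases = 19_811_410_511
print(f"estimated at w=7: {expected_minimizers(bases, 7):,.0f}")
print(f"suggested minimum w: {smallest_window(bases)}")
```

For that log the estimate lands within about 1% of the reported count, but real genomes can deviate, so it is worth trying values slightly above the suggested minimum.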
-w 24 seems to be the smallest window size that works for this genome, and I'm getting an alignment rate of around 50% with that. That likely suggests that 24 is too large? Will probably need to benchmark against another aligner. |
That's possible. Can you post more numbers here? It is also possible that the genome is repetitive and lots of multi-mappings are filtered out. |
Number of reads: 1204478674. Closer to 70% with multi-mappers. Uniquely mapped read-pairs (lines in the output file) is 288M, so closer to 48%. I'll check bowtie2. The index is taking a long time to prepare. |
Thanks for the numbers. You may try bowtie2, but building an FM-index should be even slower. |
Same problem here with a roughly 9Gb genome when I map Hi-C short reads, as follows. How can I solve it? Build index for the reference. |
@Biscuite-wzy You can increase k-mer length (-k) and window size (-w) values a bit to see whether it works. How long is the longest chromosome in your genome? |
Hi,
Increasing the window size (-w) worked well.
787117923
|
Is there any fix for this problem? I ran into the same issue with the axolotl genome and had to set a much larger window size (-w 31) to build the genome index. However, I suspect such a large window size is not suitable for me, because I have datasets from other species that were processed with the default parameters. So I just want to know if there is any update. |
Besides tuning the parameters, there is no easy fix on top of the current Chromap codebase to support very large genomes. We plan to see if this is possible in the near future. |
Hi, was this issue fixed in the recent version of Chromap (0.2.6)? I have genomes of 18Gb and 26Gb.
What would be the best -k and -w values for genomes of this size? Thank you. |
This has not been fixed. I think the best way to handle this is to use a standard value for k, but increase the value for -w. |
Hi, thank you for the reply. I am currently using version 0.2.1, so I guess updating the version would help to resolve this. Can I use -w 20 for a genome of this size? |
Sorry, I made a typo. It has NOT been fixed. |
Oh! Then probably using 0.2.6 won't solve it. I will try to increase value for -w and check. |
Hello,
I want to run Chromap on my genome file,
but it crashed with a core dump.
Here are the command and the log file.
Command
$ chromap -i -r Combined_pseudohap.phased.filtered.0.arcs.fasta -o chromap.index -t 100 >chromap.index.log 2>chromap.index.log2
log file
Build index for the reference.
Kmer length: 17, window size: 7
Reference file: Combined_pseudohap.phased.filtered.0.arcs.fasta
Output file: chromap.index
Loaded all sequences successfully in 156.47s, number of sequences: 41577, number of bases: 19811410511.
Collecting minimizers.
Collected 4958576388 minimizers.
Sorting minimizers.
Sorted all minimizers.
chromap: src/index.cc:33: void chromap::Index::Construct(uint32_t, const chromap::SequenceBatch&): Assertion `num_minimizers <= static_cast<size_t>(INT_MAX)' failed.
Do you have any suggestions for resolving this?
Best wishes,