IndexError: string index out of range when using extract_kraken_reads #101

Closed
fetyj opened this issue Nov 25, 2024 · 4 comments

@fetyj

fetyj commented Nov 25, 2024

Hi,
I'm getting this error when using extract_kraken_reads.py. Could you help me solve it?

python /home/ga-fr-noy-linux02/KrakenTools-1.2/extract_kraken_reads.py -s1 cseqs_1.fq -s2 cseqs_2.fq -o phimixphages15_1.fq -o2 phimixphages15_2.fq --fastq-output -k BAPH15 -t 10239 --include-children -r S7410_phi882-25_phi143BolivieA
PROGRAM START TIME: 11-25-2024 09:25:12
>> STEP 0: PARSING REPORT FILE S7410_phi882-25_phi143BolivieA
Traceback (most recent call last):
  File "/home/ga-fr-noy-linux02/KrakenTools-1.2/extract_kraken_reads.py", line 433, in <module>
    main()
  File "/home/ga-fr-noy-linux02/KrakenTools-1.2/extract_kraken_reads.py", line 230, in main
    num = int(prev_node.level_id[-1]) + 1
              ~~~~~~~~~~~~~~~~~~^^^^
IndexError: string index out of range

Best regards,
FJ

@vincebaby6

Hello FJ,

I recently got the same kind of error, and it seems to happen when a value is missing from the taxonomic rank column (the one with R, K, D, P, etc.) of the Kraken report. For example, in my report the second line, "cellular organisms", was missing its "R1" value in that column. KrakenTools cannot handle an empty value in that position, with "root" as the only exception. It is still not clear to me whether this missing data is intentional in the latest version of Kraken2 and KrakenTools simply did not keep up with those updates, or whether the data is missing from the database itself. I used a pre-made database since I could not build one on my system due to administrative restrictions (don't ask...). I'm still looking into a solution and I'll post something here if I find a good workaround.
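
To check whether a report is affected, something like the following could be used. It is only a minimal sketch, not part of KrakenTools: the function name `find_empty_rank_rows` is hypothetical, the sample rows are made up for illustration, and it assumes the usual tab-separated report layout in which the rank code, taxid and name are the last three columns. It lists the rows whose rank code is blank, which are the rows where the `level_id[-1]` lookup from the traceback would hit an empty string.

```python
def find_empty_rank_rows(report_lines):
    """Return (line_number, taxid, name) for rows whose rank code is blank.

    Assumption: each row is tab-separated and ends with
    rank code, taxid, name (in that order).
    """
    problems = []
    for line_number, line in enumerate(report_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a standard report row, skip it
        rank_code, taxid, name = fields[-3], fields[-2], fields[-1]
        if rank_code.strip() == "":
            problems.append((line_number, taxid.strip(), name.strip()))
    return problems


# Made-up sample rows; the second one has an empty rank column.
sample_report = [
    "100.00\t1000\t0\tR\t1\troot\n",
    " 90.00\t900\t0\t\t131567\t  cellular organisms\n",
    " 80.00\t800\t800\tD\t2\t    Bacteria\n",
]

for line_number, taxid, name in find_empty_rank_rows(sample_report):
    print(f"line {line_number}: taxid {taxid} ({name}) has no rank code")
```

On a real report, the lines can be read with `open(path).readlines()` and passed to the same function.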

Regards,

Vincent

@fetyj

fetyj commented Nov 27, 2024

Hello Vincent,
Thanks for your hint on this issue. I hope you can find something. In the meantime, I'm trying to bypass the problem by using the --append option and proceeding rank by rank through the taxonomy, without using the report file.
Best regards,
FJ

@vincebaby6

Hello FJ,

So, I've checked my database (more specifically the ktaxonomy.tsv file) and the ranks look OK, so I don't know why the ranks are wrong in the Kraken report file. I finally resorted to writing a little Perl script that corrects the ranks of the specific taxids that produced the error. It is not pretty or exhaustive, but it worked, and it will be easy to add more taxids to the list of corrections if I need to. The most obvious corrections were for Bacteria (rank D), Eukaryota (rank D) and Archaea (rank D). I also had to correct Viruses (rank 1; I first tried D but it did not work. Viruses like to be special) and around 10 virus sublevels before I finally got rid of the error.
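
For anyone who prefers Python, the same idea can be sketched roughly as below. This is not the Perl script itself, only an illustration under assumptions: the function name `patch_rank_codes` is hypothetical, the sample rows are made up, and the report is assumed to be tab-separated with rank code, taxid and name as the last three columns. The corrections table holds only the four top-level fixes described above, keyed by their NCBI taxids; the virus sublevels are not listed here and would have to be added per report.

```python
def patch_rank_codes(report_lines, corrections):
    """Fill in blank rank codes for the taxids listed in `corrections`.

    Assumption: rows are tab-separated and end with rank code, taxid, name.
    Rows that already have a rank code, or whose taxid is not listed,
    are passed through unchanged.
    """
    patched = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6:
            taxid = fields[-2].strip()
            if fields[-3].strip() == "" and taxid in corrections:
                fields[-3] = corrections[taxid]
        patched.append("\t".join(fields) + "\n")
    return patched


# Rank codes as reported in this thread, keyed by NCBI taxid.
RANK_CORRECTIONS = {
    "2": "D",      # Bacteria
    "2157": "D",   # Archaea
    "2759": "D",   # Eukaryota
    "10239": "1",  # Viruses (D reportedly did not work)
}

# Made-up sample rows with blank rank codes.
sample_report = [
    "100.00\t1000\t0\tR\t1\troot\n",
    " 80.00\t800\t800\t\t2\t  Bacteria\n",
    "  5.00\t50\t50\t\t10239\t  Viruses\n",
]

for row in patch_rank_codes(sample_report, RANK_CORRECTIONS):
    print(row, end="")
```

Writing the patched rows to a new file and passing that file to `-r` keeps the original report untouched, which makes it easy to extend the table and retry if another taxid still triggers the error.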

Regards,

Vincent

@fetyj

fetyj commented Nov 28, 2024

Hello Vincent,
Thanks for the tip, it also worked for my data :) I checked the Kraken2 GitHub issues and found one related to this case (DerrickWood/kraken2#888 (comment)). Going back to Kraken2 v2.0.8 to v2.1.2 solves the problem.
Best Regards,

FJ

@fetyj fetyj closed this as completed Nov 28, 2024