kallisto index error #1

inti · 2018-08-18T02:34:57Z

Hi, I am getting the following error.
I built the index both with the kallisto provided with kallisto-align and with a kallisto installed via conda in a separate environment. In both cases I get the following error.

[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0

Many thanks in advance

kbchoi-jax · 2018-08-19T12:44:30Z

Hi Inti,

Try building the kallisto index with an older version (https://pachterlab.github.io/kallisto/download), like v0.42.1. They upgraded the index format to version 9 at some point, but our kallisto-align uses version 8. We will catch up with it at some point (I am considering merging it into alntools), but not soon, unfortunately. Thanks for using kallisto-align.

KB

inti · 2018-08-20T01:28:49Z

Hi,
thanks for the response. That did not work; I got the same error as before.
I built the index with kallisto v0.42.1 and then ran `kallisto-align`:

bash-4.2$ ~/app/kallisto-align/kallisto-align -i emase/SRR5125117/SRR5125117.k_idx -f fastq/SRR5125117_1.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate

inti · 2018-08-20T18:04:26Z

Hi, I had done it previously with kallisto (v0.42.1).
Sorry I did not send the full code I ran. Here is the output of building the index and trying to run kallisto-align.
I also tried with kallisto v0.42 and had the same error.

bash-4.2$ ~/app/kallisto_linux-v0.42.1/kallisto index -i emase/SRR5125117/SRR5125117.k_idx emase/SRR5125117/SRR5125117.transcripts.fa

[build] loading fasta file emase/SRR5125117/SRR5125117.transcripts.fa
[build] k-mer length: 31
[build] warning: replaced 14045 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 2065 contigs and contains 185230 k-mers

bash-4.2$  ~/app/kallisto-align/kallisto-align -i emase/SRR5125117/SRR5125117.k_idx -f fastq/SRR5125117_1.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate

kbchoi-jax · 2018-08-20T18:20:39Z

I am sorry, I meant you should check whether kallisto quant works fine with the same input files on v0.42.1.

kallisto 0.42.1
Computes equivalence classes for reads and quantifies abundances

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              quantification
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
    --single                  Quantify single-end reads
-l, --fragment-length=DOUBLE  Estimated average fragment length
                              (default: value is estimated from the input data)
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5

inti · 2018-08-21T01:32:11Z

I did try

bash-4.2$ ~/app/kallisto_linux-v0.42.1/kallisto index -i emase/SRR5125117/SRR5125117.transcripts.k_idx emase/SRR5125117/SRR5125117.transcripts.fa

[build] loading fasta file emase/SRR5125117/SRR5125117.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 182 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 84409 contigs and contains 21292011 k-mers

bash-4.2$  ~/app/kallisto_linux-v0.42.1/kallisto quant -i emase/SRR5125117/SRR5125117.transcripts.k_idx -o test fastq/SRR5125117_1.fastq.gz fastq/SRR5125117_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29493
[index] number of k-mers: 21292011
[index] number of equivalence classes: 61508
[quant] running in paired-end mode
[quant] will process pair 1: fastq/SRR5125117_1.fastq.gz
                             fastq/SRR5125117_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 0 reads, 0 reads pseudoaligned
[quant] estimated average fragment length: -nan
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1 rounds

It does not work (all transcripts have 0 counts) ... :/ It also does not work with the same files and the newest version of kallisto (v0.44), nor with the reference transcriptome Bombus_terrestris.Bter_1.0.cdna.all.fa.
I have used kallisto recently, so this is odd and I did not expect it.
Not sure what is going on ...
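
Since the log reports `[quant] processed 0 reads`, one quick sanity check is to count the records in each FASTQ file before looking at the index. This is a minimal sketch, assuming gzip-compressed files with standard four-line FASTQ records; `count_fastq_reads` is a hypothetical helper name, not part of kallisto.

```shell
# Hypothetical helper: count the reads in a gzip-compressed FASTQ file.
# Assumes four lines per record (sequence lines are not wrapped).
count_fastq_reads() {
    # Decompress to stdout and divide the total line count by four.
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Example usage with the paths from the commands above:
# count_fastq_reads fastq/SRR5125117_1.fastq.gz
# count_fastq_reads fastq/SRR5125117_2.fastq.gz
```

A result of 0, or a gzip error on stderr, would point at the input files rather than at kallisto.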

kbchoi-jax · 2018-08-21T01:51:10Z

Anyways it seems that your issue is not due to our kallisto-align. Take a look at your transcripts.fa file.

inti · 2018-08-21T13:34:57Z

Sorry ... I had used kallisto recently, so I did not expect the issue to be there. Apologies again.

inti · 2018-08-21T14:38:40Z

1 Using prepare-emase to generate the diploid transcriptome, with the SRR5125117.gtf and SRR5125117.fa generated with g2gtools as input

grep "_R" SRR5125117.gtf > SRR5125117.R.gtf
grep "_L" SRR5125117.gtf > SRR5125117.L.gtf
prepare-emase -G SRR5125117.fa,SRR5125117.fa -g SRR5125117.L.gtf,SRR5125117.R.gtf -s L,R -o test -m -x
sed -i "s/_R_R/_R/g" test/emase.pooled.transcripts.info
sed -i "s/_L_L/_L/g" test/emase.pooled.transcripts.info
sed -i "s/_R_R/_R/g" test/emase.pooled.transcripts.fa
sed -i "s/_L_L/_L/g" test/emase.pooled.transcripts.fa

2 Build index

~/app/kallisto_linux-v0.42.1/kallisto index -i test/emase.pooled.transcripts.k_idx test/emase.pooled.transcripts.fa

[build] loading fasta file test/emase.pooled.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 44839 contigs and contains 20829208 k-mers

3 `quant` step

~/app/kallisto_linux-v0.42.1/kallisto quant -i test/emase.pooled.transcripts.k_idx -o test_kallisto ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29496
[index] number of k-mers: 20829208
[index] number of equivalence classes: 56014
[quant] running in paired-end mode
[quant] will process pair 1: ../../fastq/SRR5125122_1.fastq.gz
                             ../../fastq/SRR5125122_2.fastq.gz
[quant] finding pseudoalignments for the reads ...
 done
[quant] processed 23476745 reads, 12434421 reads pseudoaligned
[quant] estimated average fragment length: 156.926
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1088 rounds

4 `quant` output

head test_kallisto/abundance.txt
target_id	length	eff_length	est_counts	tpm
ENSRNA049756373-T1_L	91	91	0.5	0.939
ENSRNA049756376-T1_L	86	86	1.5	2.98078
ENSRNA049756377-T1_L	101	101	0	0
ENSRNA049756378-T1_L	119	119	0	0
ENSRNA049756379-T1_L	141	141	0	0
ENSRNA049756380-T1_L	92	92	0	0
ENSRNA049756381-T1_L	103	103	0	0
ENSRNA049756382-T1_L	164	8.07353	0	0
ENSRNA049756383-T1_L	155	155	23	25.3591

5 Trying `kallisto-align`

~/app/kallisto-align/kallisto-align -i test/emase.pooled.transcripts.k_idx -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate

6 try building index with the `kallisto` distributed with `kallisto-align`

/home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index -i emase.pooled.transcripts.kOld_index emase.pooled.transcripts.fa

[build] loading fasta file emase.pooled.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 44839 contigs and contains 20829208 k-mers

$ /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto quant -i emase.pooled.transcripts.kOld_index -o k_old ../../../fastq/SRR5125122_1.fastq.gz ../../../fastq/SRR5125122_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29496
[index] number of k-mers: 20829208
[index] number of equivalence classes: 56014
[quant] running in paired-end mode
[quant] will process pair 1: ../../../fastq/SRR5125122_1.fastq.gz
                             ../../../fastq/SRR5125122_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 23476745 reads, 12434421 reads pseudoaligned
[quant] estimated average fragment length: 156.926
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1088 rounds

$ head k_old/abundance.txt
target_id	length	eff_length	est_counts	tpm
ENSRNA049756373-T1_L	91	91	0.5	0.939
ENSRNA049756376-T1_L	86	86	1.5	2.98078
ENSRNA049756377-T1_L	101	101	0	0
ENSRNA049756378-T1_L	119	119	0	0
ENSRNA049756379-T1_L	141	141	0	0
ENSRNA049756380-T1_L	92	92	0	0
ENSRNA049756381-T1_L	103	103	0	0
ENSRNA049756382-T1_L	164	8.07353	0	0
ENSRNA049756383-T1_L	155	155	23	25.3591

7 `kallisto-align` with the new index

~/app/kallisto-align/kallisto-align -i test/emase.pooled.transcripts.kOld_index -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate

I apologise again for whatever shambles or mistakes I made previously. kallisto is working fine, as expected I guess, and as I commented I had used it before.

Both the kallisto you distribute with kallisto-align and the one I downloaded are v0.42.1.

I am happy to send along or upload somewhere the transcriptome and fastq files if that helps to work out what is going on ...

Thanks again for your help on this!

kbchoi-jax · 2018-08-21T15:00:14Z

That error message is coming from kallisto and is literally saying that your index does not match the expected version for some reason. Try building the kallisto index using /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index.

kbchoi-jax · 2018-08-21T15:34:14Z

And the following does not look right, because you are providing the same fasta file for L and R. Usually you should provide L.fa and R.fa. Is SRR5125117.fa the diploid genome you created with g2gtools?

$ prepare-emase -G SRR5125117.fa,SRR5125117.fa -g SRR5125117.L.gtf,SRR5125117.R.gtf -s L,R -o test -m -x

If SRR5125117.fa is diploid, I think you should be able to simply do the following.

$ prepare-emase -G SRR5125117.fa -g SRR5125117.gtf -o test -m -x

inti · 2018-08-21T19:29:50Z

Hi,
In the example above I did build the index with /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index; see number 6 in the message above.

Regarding prepare-emase: yes, SRR5125117.fa is the diploid genome generated by g2gtools. I just tried to replicate the emase protocol, which has separate files for each haplotype.

Here is the test. It does not seem to make a difference.

$ prepare-emase -G SRR5125117.fa -g SRR5125117.gtf -o test2 -m -x

$ /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index -i test2/emase.transcripts.k_idx test2/emase.transcripts.fa

[build] loading fasta file test2/emase.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 44835 contigs and contains 20829133 k-mers

$ ~/app/kallisto-align/kallisto-align -i test2/emase.transcripts.k_idx -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate

Previously you said kallisto currently uses index version 9 and kallisto-align uses version 8. However, the message says it expects version 0 of the index. Is that correct?
If I send you the transcriptome index, would you try to replicate the error?

Many thanks again
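
One way to see which version an index file actually carries is to read its header directly. This is a minimal sketch, assuming (this is not confirmed anywhere in this thread) that a kallisto index file begins with the index version stored as an 8-byte unsigned integer in the machine's native byte order; `kallisto_index_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first 8 bytes of a file as one unsigned integer.
# ASSUMPTION: the kallisto index stores its format version in those bytes.
kallisto_index_version() {
    # -An: suppress offsets; -t u8: unsigned 8-byte integers; -N 8: read 8 bytes only
    od -An -t u8 -N 8 "$1" | tr -d ' \n'
    echo
}

# Example usage with the index paths from the messages above:
# kallisto_index_version test/emase.pooled.transcripts.k_idx
# kallisto_index_version test2/emase.transcripts.k_idx
```

Running this on indices built by both kallisto binaries would show whether they write the same header value, under the assumption stated above.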

inti · 2018-08-21T19:37:50Z

Quick question: what does kallisto-align actually do? If I run kallisto to generate a pseudobam file and convert it into the emase binary format with alntools, would that replace kallisto-align?

kbchoi-jax · 2018-08-21T19:49:28Z

You are right, you can convert kallisto pseudobam into emase binary file and run emase-zero. But kallisto-align does it way faster. The kallisto that we carry should not create Version 9 index.

inti · 2018-08-21T20:01:56Z

Let me know if there is anything I can do to help debug this. I will try the long side-path to test the g2gtools + emase-zero pipeline.

Thanks a lot again and sorry for the initial confusion

inti · 2018-09-25T14:37:23Z

Hi,
Any updates on this issue? I would love to use kallisto-align.

Regarding:

> You are right, you can convert kallisto pseudobam into emase binary file and run emase-zero. But kallisto-align does it way faster. The kallisto that we carry should not create Version 9 index.

What would be the equivalent steps? kallisto [fastq -> pseudobam] => alntools [pseudobam -> bin-emase] => emase-zero [awesome results]

Do you do local alignment of the reads to the transcripts? I understand the pseudobam does not really align the read to the transcript, but rather assigns the read to it and makes up a CIGAR string. Perhaps the real question is whether emase-zero needs alignments or can work with read-transcript assignments.

Thanks in advance

inti closed this as completed Aug 21, 2018

inti reopened this Aug 21, 2018
