kallisto index error #1

Open
inti opened this issue Aug 18, 2018 · 15 comments

@inti

inti commented Aug 18, 2018

Hi, I am getting the following error.
I built the index both with the kallisto provided with kallisto-align and with a kallisto installed via conda in a separate environment. In both cases I get the following error.

[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0

Many thanks in advance

@kbchoi-jax
Member

kbchoi-jax commented Aug 19, 2018

Hi Inti,

Try building the kallisto index with an older release (https://pachterlab.github.io/kallisto/download), such as v0.42.1. kallisto moved its index format to version 9 at some point, but our kallisto-align uses version 8. We will catch up eventually (I am considering merging it into alntools), but unfortunately not soon. Thanks for using kallisto-align.

KB
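
One way to see which format version an index file actually carries is to read its header directly. The sketch below assumes the index starts with the version stored as an 8-byte unsigned integer in the machine's native byte order; that matches the wording of the error message but is an assumption about the file layout, not something confirmed in this thread. `index_version` is a hypothetical helper name.

```shell
# Print the first 8 bytes of a file as one unsigned 64-bit integer.
# ASSUMPTION: the kallisto index stores its format version in those bytes,
# in native byte order (little-endian on x86 and most ARM machines).
index_version() {
    od -An -tu8 -N8 "$1" | tr -d ' \n'
    echo
}

# Example (path taken from the thread):
#   index_version emase/SRR5125117/SRR5125117.k_idx
```

If the number printed differs from what the reading program expects, the index was written by a build with a different format version.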

@inti
Author

inti commented Aug 20, 2018

Hi,
thanks for the response. That did not work. I got the same error as before.
I built the index with kallisto v0.42.1 and then ran `kallisto-align`:

bash-4.2$ ~/app/kallisto-align/kallisto-align -i emase/SRR5125117/SRR5125117.k_idx -f fastq/SRR5125117_1.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate

@inti
Author

inti commented Aug 20, 2018

Hi, I had already done it with kallisto v0.42.1.
Sorry, I did not send the full commands I ran. Here is the output of building the index and then trying to run kallisto-align.
I also tried with kallisto v0.42 and got the same error.

bash-4.2$ ~/app/kallisto_linux-v0.42.1/kallisto index -i emase/SRR5125117/SRR5125117.k_idx emase/SRR5125117/SRR5125117.transcripts.fa

[build] loading fasta file emase/SRR5125117/SRR5125117.transcripts.fa
[build] k-mer length: 31
[build] warning: replaced 14045 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 2065 contigs and contains 185230 k-mers

bash-4.2$  ~/app/kallisto-align/kallisto-align -i emase/SRR5125117/SRR5125117.k_idx -f fastq/SRR5125117_1.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate
bash-4.2$

@kbchoi-jax
Member

I am sorry, I meant that you should check whether kallisto quant works with the same input files on v0.42.1.

kallisto 0.42.1
Computes equivalence classes for reads and quantifies abundances

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              quantification
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
    --single                  Quantify single-end reads
-l, --fragment-length=DOUBLE  Estimated average fragment length
                              (default: value is estimated from the input data)
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5

@inti
Author

inti commented Aug 21, 2018

I did try

bash-4.2$ ~/app/kallisto_linux-v0.42.1/kallisto index -i emase/SRR5125117/SRR5125117.transcripts.k_idx emase/SRR5125117/SRR5125117.transcripts.fa

[build] loading fasta file emase/SRR5125117/SRR5125117.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 182 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 84409 contigs and contains 21292011 k-mers

bash-4.2$  ~/app/kallisto_linux-v0.42.1/kallisto quant -i emase/SRR5125117/SRR5125117.transcripts.k_idx -o test fastq/SRR5125117_1.fastq.gz fastq/SRR5125117_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29493
[index] number of k-mers: 21292011
[index] number of equivalence classes: 61508
[quant] running in paired-end mode
[quant] will process pair 1: fastq/SRR5125117_1.fastq.gz
                             fastq/SRR5125117_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 0 reads, 0 reads pseudoaligned
[quant] estimated average fragment length: -nan
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1 rounds

It does not work (all transcripts have 0 counts) ... :/ It also does not work with the same files and the newest version of kallisto (v0.44), nor with the reference transcriptome Bombus_terrestris.Bter_1.0.cdna.all.fa.
I have used kallisto recently, so this is odd and I did not expect it.
Not sure what is going on ...
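
Since the log above reports 0 reads processed, one independent check is to count the records in each FASTQ file before involving kallisto at all. This is only a sketch: it assumes gzip-compressed FASTQ with the usual four lines per read, and `count_fastq_reads` is a hypothetical helper name.

```shell
# Count records in a gzipped FASTQ file, assuming 4 lines per read.
# A non-integer result hints at a truncated or malformed file.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Example (paths taken from the thread):
#   count_fastq_reads fastq/SRR5125117_1.fastq.gz
#   count_fastq_reads fastq/SRR5125117_2.fastq.gz
```

For paired-end data the two files would normally give the same count.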

@kbchoi-jax
Member

kbchoi-jax commented Aug 21, 2018

Anyways it seems that your issue is not due to our kallisto-align. Take a look at your transcripts.fa file.
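
A minimal sketch of what "taking a look" at a transcripts FASTA file could involve: counting the records and the sequence characters outside A, C, G, T and U, which can then be compared with the target count and the "replaced ... non-ACGUT characters" warning that kallisto prints. `fasta_summary` is a hypothetical helper, not part of kallisto or emase.

```shell
# Summarise a FASTA file: number of records, and number of sequence
# characters other than A, C, G, T, U (either case).
fasta_summary() {
    awk '
        /^>/ { records++; next }                      # header line
        { bad += gsub(/[^ACGTUacgtu]/, "") }          # count odd characters
        END { printf "records=%d non_ACGTU=%d\n", records, bad }
    ' "$1"
}

# Example (path taken from the thread):
#   fasta_summary emase/SRR5125117/SRR5125117.transcripts.fa
```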

@inti
Author

inti commented Aug 21, 2018

Sorry ... I had used kallisto recently, so I did not expect the issue to be there. Apologies again.

@inti inti closed this as completed Aug 21, 2018
@inti
Author

inti commented Aug 21, 2018

1 Using prepare-emase to generate the diploid transcriptome, with the SRR5125117.gtf and SRR5125117.fa generated by g2gtools as input

grep "_R" SRR5125117.gtf > SRR5125117.R.gtf
grep "_L" SRR5125117.gtf > SRR5125117.L.gtf
prepare-emase -G SRR5125117.fa,SRR5125117.fa -g SRR5125117.L.gtf,SRR5125117.R.gtf -s L,R -o test -m -x
sed -i "s/_R_R/_R/g" test/emase.pooled.transcripts.info
sed -i "s/_L_L/_L/g" test/emase.pooled.transcripts.info
sed -i "s/_R_R/_R/g" test/emase.pooled.transcripts.fa
sed -i "s/_L_L/_L/g" test/emase.pooled.transcripts.fa
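
To confirm that the sed commands above left no doubled haplotype suffix behind, the remaining occurrences can be counted. This is a sketch with a hypothetical helper name, `count_doubled_suffixes`; it only looks for the `_L_L` and `_R_R` patterns used in this thread.

```shell
# Count lines that still contain a doubled haplotype suffix (_L_L or _R_R).
# Prints 0 when the file is clean.
count_doubled_suffixes() {
    grep -c -E '_(L_L|R_R)' "$1"
}

# Example (paths taken from the thread):
#   count_doubled_suffixes test/emase.pooled.transcripts.fa
#   count_doubled_suffixes test/emase.pooled.transcripts.info
```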

2 Build index

~/app/kallisto_linux-v0.42.1/kallisto index -i test/emase.pooled.transcripts.k_idx test/emase.pooled.transcripts.fa

[build] loading fasta file test/emase.pooled.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 44839 contigs and contains 20829208 k-mers

3 quant step

~/app/kallisto_linux-v0.42.1/kallisto quant -i test/emase.pooled.transcripts.k_idx -o test_kallisto ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29496
[index] number of k-mers: 20829208
[index] number of equivalence classes: 56014
[quant] running in paired-end mode
[quant] will process pair 1: ../../fastq/SRR5125122_1.fastq.gz
                             ../../fastq/SRR5125122_2.fastq.gz
[quant] finding pseudoalignments for the reads ...
 done
[quant] processed 23476745 reads, 12434421 reads pseudoaligned
[quant] estimated average fragment length: 156.926
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1088 rounds

4 quant output

head test_kallisto/abundance.txt
target_id	length	eff_length	est_counts	tpm
ENSRNA049756373-T1_L	91	91	0.5	0.939
ENSRNA049756376-T1_L	86	86	1.5	2.98078
ENSRNA049756377-T1_L	101	101	0	0
ENSRNA049756378-T1_L	119	119	0	0
ENSRNA049756379-T1_L	141	141	0	0
ENSRNA049756380-T1_L	92	92	0	0
ENSRNA049756381-T1_L	103	103	0	0
ENSRNA049756382-T1_L	164	8.07353	0	0
ENSRNA049756383-T1_L	155	155	23	25.3591

5 Trying kallisto-align

~/app/kallisto-align/kallisto-align -i test/emase.pooled.transcripts.k_idx -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate%

6 try building index with the kallisto distributed with kallisto-align

/home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index -i emase.pooled.transcripts.kOld_index emase.pooled.transcripts.fa

[build] loading fasta file emase.pooled.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 44839 contigs and contains 20829208 k-mers

$ /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto quant -i emase.pooled.transcripts.kOld_index -o k_old ../../../fastq/SRR5125122_1.fastq.gz ../../../fastq/SRR5125122_2.fastq.gz

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 29496
[index] number of k-mers: 20829208
[index] number of equivalence classes: 56014
[quant] running in paired-end mode
[quant] will process pair 1: ../../../fastq/SRR5125122_1.fastq.gz
                             ../../../fastq/SRR5125122_2.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 23476745 reads, 12434421 reads pseudoaligned
[quant] estimated average fragment length: 156.926
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1088 rounds

$ head k_old/abundance.txt
target_id	length	eff_length	est_counts	tpm
ENSRNA049756373-T1_L	91	91	0.5	0.939
ENSRNA049756376-T1_L	86	86	1.5	2.98078
ENSRNA049756377-T1_L	101	101	0	0
ENSRNA049756378-T1_L	119	119	0	0
ENSRNA049756379-T1_L	141	141	0	0
ENSRNA049756380-T1_L	92	92	0	0
ENSRNA049756381-T1_L	103	103	0	0
ENSRNA049756382-T1_L	164	8.07353	0	0
ENSRNA049756383-T1_L	155	155	23	25.3591

7 kallisto-align with the new index

~/app/kallisto-align/kallisto-align -i test/emase.pooled.transcripts.kOld_index -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate%

I apologise again for the mess and the mistakes I made previously. kallisto is working fine, as expected I guess, and as I mentioned I had used it before.

Both the kallisto you distribute with kallisto-align and the one I downloaded are v0.42.1

I am happy to send or upload the transcriptome and fastq files somewhere if that helps to work out what is going on ...

Thanks again for your help on this!
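
Because both binaries report v0.42.1 and print the same build statistics, one further check is whether the two index files are byte-for-byte identical. The sketch below uses only `cmp`; `same_file` is a hypothetical helper, and it is an assumption that two builds from the same FASTA would produce identical bytes in the first place.

```shell
# Report whether two files are byte-for-byte identical.
same_file() {
    if cmp -s "$1" "$2"; then
        echo identical
    else
        echo different
    fi
}

# Example (paths taken from steps 2 and 6; adjust to where the files live):
#   same_file test/emase.pooled.transcripts.k_idx emase.pooled.transcripts.kOld_index
```

If the files turn out identical, the difference would have to lie in the program reading the index rather than in the one that wrote it.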

@inti inti reopened this Aug 21, 2018
@kbchoi-jax
Member

That error message is coming from kallisto and literally saying your index does not match the version for some reason. Try to build kallisto index using /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index.

@kbchoi-jax
Member

Also, the following does not look right, because you are providing the same FASTA file for both L and R. Usually you should provide L.fa and R.fa. Is SRR5125117.fa the diploid genome you created with g2gtools?

$ prepare-emase -G SRR5125117.fa,SRR5125117.fa -g SRR5125117.L.gtf,SRR5125117.R.gtf -s L,R -o test -m -x

If SRR5125117.fa is diploid, I think you should be able to simply do the following.

$ prepare-emase -G SRR5125117.fa -g SRR5125117.gtf -o test -m -x

@inti
Author

inti commented Aug 21, 2018

Hi,
In the example above I did build the index with /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index; see number 6 in the message above.

Regarding prepare-emase: yes, SRR5125117.fa is the diploid genome generated by g2gtools. I was just trying to replicate the emase protocol, which has separate files for each haplotype.

Here is the test. It does not seem to make a difference

$ prepare-emase -G SRR5125117.fa -g SRR5125117.gtf -o test2 -m -x

$ /home/ipedroso/app/kallisto-align/external/src/kallisto-build/src/kallisto index -i test2/emase.transcripts.k_idx test2/emase.transcripts.fa

[build] loading fasta file test2/emase.transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
        from 472 target sequences
[build] warning: replaced 78 non-ACGUT characters in the input sequence
        with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 44835 contigs and contains 20829133 k-mers

$ ~/app/kallisto-align/kallisto-align -i test2/emase.transcripts.k_idx -f ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz -b my_sample.bin
[kallisto-align] Creating my_sample.bin...
Error: incompatible indices. Found version 9, expected version 0
Rerun with index to regenerate%

Previously you said that kallisto currently uses index version 9 and kallisto-align uses version 8. However, the message says it expects version 0 of the index. Is that correct?
If I send you the transcriptome index, would you try to replicate the error?

Many thanks again

@inti
Author

inti commented Aug 21, 2018

Quick question: what does kallisto-align actually do? If I run kallisto to generate a pseudobam file and convert it into the emase binary format with alntools, would that replace kallisto-align?

@kbchoi-jax
Member

You are right, you can convert kallisto pseudobam into emase binary file and run emase-zero. But kallisto-align does it way faster. The kallisto that we carry should not create Version 9 index.

@inti
Author

inti commented Aug 21, 2018

Let me know if there is anything I can do to help debug this. In the meantime I will take the longer route to test the g2gtools + emase-zero pipeline.

Thanks a lot again and sorry for the initial confusion

@inti
Author

inti commented Sep 25, 2018

Hi,
Any updates on this issue? I would love to use kallisto-align.

Regarding:

You are right, you can convert kallisto pseudobam into emase binary file and run emase-zero. But kallisto-align does it way faster. The kallisto that we carry should not create Version 9 index.

What would be the equivalent steps: kallisto [fastq -> pseudobam] => alntools [pseudobam -> bin-emase] => emase-zero [awesome results]

Do you do local alignment of the reads to the transcripts? As I understand it, the pseudobam does not really align the read to the transcript but rather assigns the read to it and makes up a CIGAR string. Perhaps the real question is whether emase-zero needs alignments or can work with read-transcript assignments.

Thanks in advance
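
The three-step route asked about above could be laid out roughly as follows. This is a dry-run sketch that only prints the commands, and none of it is confirmed by the maintainers in this thread. It assumes that `kallisto quant --pseudobam` in the 0.42/0.43 series writes SAM to standard output, that `alntools bam2ec` takes the BAM and output paths as positional arguments, and that `emase-zero` takes the binary file as its input argument; any further options each tool needs are omitted and should be checked against its `--help`. `print_pipeline` and the intermediate file names are hypothetical.

```shell
# Print (do not run) the commands for:
#   kallisto [fastq -> pseudobam] => alntools [pseudobam -> emase binary] => emase-zero
# Arguments: index, read 1, read 2, output directory.
# ASSUMPTION: flags and argument order as described in the lead-in above.
print_pipeline() {
    echo "kallisto quant -i $1 -o $4 --pseudobam $2 $3 | samtools view -b -o $4/pseudo.bam -"
    echo "alntools bam2ec $4/pseudo.bam $4/sample.bin"
    echo "emase-zero $4/sample.bin"
}

# Example (paths taken from the thread):
#   print_pipeline test/emase.pooled.transcripts.k_idx \
#       ../../fastq/SRR5125122_1.fastq.gz ../../fastq/SRR5125122_2.fastq.gz out
```

Printing the commands first makes it easy to review each step before committing to a long run.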
