problem with messing up VERSION and ACCESSION #359
Comments
Hi @lingyi-owl,

The GFF-to-GenBank conversion program inside pharokka (bcbio-gff) tries to convert the pharokka GFF into a GenBank file that follows the GenBank format: https://www.ncbi.nlm.nih.gov/genbank/samplerecord/

I think there is no issue with the ACCESSION, as GenBank accessions keep only the part before the first '.' and omit everything after it. GenBank record IDs usually look like "{ACCESSION}.{VERSION_NUMBER}", which is the behaviour for your contig.

VERSION is strange: it is cutting out the 0. My guess is that it parses '052837' as '52837' (and would, e.g., parse '01' as '1'). I will have a look and may open an issue in bcbio-gff, but I'm inclined not to make any changes regardless.

To be honest, in general I'd advocate renaming your contigs before running pharokka, from the complicated/long SPAdes headers to something easier for software to parse. That should mitigate the problem. In my opinion, the real problem is that when we run something like pharokka, we are trying to do a bit too much with the GenBank format (which isn't optimal!).

George
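To make the two points in this comment concrete, here is a minimal Python sketch. It is not the bcbio-gff or Biopython implementation: `split_genbank_id` only imitates the behaviour George guesses at (the suffix after the first '.' being parsed as an integer), and `rename_fasta_headers` is a hypothetical helper for the suggested renaming step. The example header is made up around the '052837' suffix mentioned above.

```python
def split_genbank_id(record_id):
    """Imitate the guessed behaviour (an assumption, not library code):
    ACCESSION is the text before the first '.', and VERSION re-attaches
    the suffix after parsing it as an integer."""
    accession, sep, suffix = record_id.partition(".")
    if not sep:
        # No '.' in the ID: ACCESSION and VERSION are both the full ID.
        return record_id, record_id
    try:
        # int("052837") == 52837, so the leading zero is lost here.
        version = f"{accession}.{int(suffix)}"
    except ValueError:
        # Non-numeric suffix: leave the ID untouched.
        version = record_id
    return accession, version


def rename_fasta_headers(fasta_lines, prefix="contig"):
    """Replace each FASTA header with a short, dot-free name.

    Returns the rewritten lines and a mapping from new name to the
    original header, so results can be traced back later.
    """
    renamed, mapping = [], {}
    for line in fasta_lines:
        if line.startswith(">"):
            new_id = f"{prefix}_{len(mapping) + 1}"
            mapping[new_id] = line[1:].strip()
            renamed.append(f">{new_id}")
        else:
            renamed.append(line.rstrip("\n"))
    return renamed, mapping


# Hypothetical SPAdes-style header, for illustration only.
header = "NODE_1_length_40000_cov_12.052837"
accession, version = split_genbank_id(header)
print("ACCESSION:", accession)
print("VERSION:  ", version)

new_lines, mapping = rename_fasta_headers([f">{header}", "ACGT"])
print(new_lines)
print(mapping)
```

Keeping the mapping alongside the renamed FASTA is one way to recover the original headers after downstream tools such as phold have used the shortened IDs.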
Dear George, Is something wrong in my configuration? Esteban
Hi @estebano11 - this is not a bug in pharokka, just a warning, so it is not a big deal and I would just ignore it. There is nothing wrong with your configuration. I am pretty sure it is caused by numpy v2. You can remove this warning by installing numpy v1, e.g. George
Dear George, Esteban
Description
I used pharokka to annotate viral contigs. The run finished successfully. However, some of the VERSION ids are messed up in the output file, while the LOCUS and DEFINITION remain the same as the sequence headers of the input fasta file.
For example, in the output file below, the LOCUS is the same as the input sequence header, but the ACCESSION and the VERSION were changed. I subsequently used phold to further annotate the pharokka output genbank file. Phold uses the VERSION id as the sequence ID, so my phold output carries different IDs from the original input sequence headers. This makes it hard to trace information through the analysis. Could you please help me with this problem so that I can run pharokka on large amounts of data in the future without correcting the sequence headers afterwards?
Best,
Lingyi
I ran the following code and it finished successfully.
There is an error message at the beginning of the run: