problem with messing up VERSION and ACCESSION #359

Open
lingyi-owl opened this issue Aug 29, 2024 · 4 comments

@lingyi-owl

  • pharokka version: 1.7.3
  • Python version: Python 3.10.14
  • Operating System: Linux

Description

I used pharokka to annotate viral contigs. The run finished successfully. However, some of the VERSION IDs are altered in the output file, while LOCUS and DEFINITION remain identical to the sequence headers of the input FASTA file.

For example, in the output below, the LOCUS matches the input sequence header, but the ACCESSION and VERSION were changed. I subsequently used phold to further annotate the pharokka output GenBank file. Phold uses the VERSION ID as the sequence ID, so my phold output no longer matches the original input sequence headers. This makes it hard to trace information through the analysis. Could you please help me with this problem, so that I can run pharokka on large amounts of data in the future without having to correct the sequence headers afterwards?

Best,
Lingyi

LOCUS       BU_D1_MACV_NODE_1129_length_20832_cov_2.052837 20832 bp    DNA     linear   PHG 21-AUG-2024
DEFINITION  BU_D1_MACV_NODE_1129_length_20832_cov_2.052837.
ACCESSION   BU_D1_MACV_NODE_1129_length_20832_cov_2
VERSION     BU_D1_MACV_NODE_1129_length_20832_cov_2.52837

### What I Did

I ran the following code and it finished successfully.

pharokka.py \
        -i ten_seqs.fasta \
        -o pharokka_output \
        -d pharokka_database \
        -t 94

There is an error message at the beginning of the run:

DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('Pharokka==1.7.3')
/miniconda3/envs/pharokka_env/lib/python3.10/site-packages/numpy/_core/getlimits.py:555: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/miniconda3/envs/pharokka_env/lib/python3.10/site-packages/numpy/_core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/miniconda3/envs/pharokka_env/lib/python3.10/site-packages/numpy/_core/getlimits.py:555: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/miniconda3/envs/pharokka_env/lib/python3.10/site-packages/numpy/_core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
@gbouras13
Owner

Hi @lingyi-owl ,

The GFF-to-GenBank conversion program inside pharokka (bcbio-gff) tries to convert the pharokka GFF into a GenBank file that follows the GenBank format:

https://www.ncbi.nlm.nih.gov/genbank/samplerecord/

I think there is no issue with the ACCESSION, as GenBank accessions omit everything from the '.' onwards.

GenBank VERSION identifiers usually look like "{ACCESSION}.{VERSION_NUMBER}",

which is the behaviour for your contig.

VERSION is strange: it is cutting out the 0. My guess is that it parses '052837' as '52837' (and would, e.g., parse '01' as '1'). I will have a look and may open an issue in bcbio-gff, but I'm inclined not to make any changes regardless.
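
A minimal Python sketch of that guess, for illustration only. It is not the actual bcbio-gff code, and `split_accession_version` is a hypothetical helper. It assumes the writer splits the record ID at the last '.' and formats the suffix as an integer, which would drop a leading zero:

```python
def split_accession_version(record_id: str) -> tuple[str, str]:
    """Hypothetical helper mimicking the guessed behaviour (not library code)."""
    # Everything before the last '.' is treated as the accession.
    accession, sep, suffix = record_id.rpartition(".")
    if not sep:
        # No '.' in the ID: accession and version are both the plain ID.
        return record_id, record_id
    try:
        # Parsing the suffix as an integer discards any leading zeros.
        version = f"{accession}.{int(suffix)}"
    except ValueError:
        # Non-numeric suffix: leave the version equal to the full ID.
        version = record_id
    return accession, version


header = "BU_D1_MACV_NODE_1129_length_20832_cov_2.052837"
accession, version = split_accession_version(header)
print("ACCESSION  ", accession)
print("VERSION    ", version)
```

Under these assumptions the sketch reproduces the ACCESSION and VERSION lines from the report above, which is consistent with the guess but does not prove where the parsing happens.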

To be honest, in general I'd advocate renaming your contigs before running pharokka, from the complicated/long SPAdes headers to something easier for software to parse.

In my opinion, the real problem is that when we run something like pharokka, we are trying to do a bit too much with the GenBank format (which isn't optimal!). Renaming the contigs should mitigate the problem.
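
One possible way to do the renaming, sketched in plain Python. This is not part of pharokka; the function name `rename_contigs`, the `contig_N` naming scheme, and the mapping-table layout are all assumptions made for illustration. The sketch rewrites each FASTA header to a short ID without a '.', and saves a table so results can be traced back to the original headers:

```python
import csv


def rename_contigs(fasta_in, fasta_out, mapping_tsv, prefix="contig"):
    """Rewrite FASTA headers as prefix_1, prefix_2, ... and record the mapping."""
    mapping = []
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Keep the full original header so nothing is lost.
                original = line[1:].strip()
                new_id = f"{prefix}_{len(mapping) + 1}"
                mapping.append((new_id, original))
                dst.write(f">{new_id}\n")
            else:
                # Sequence lines are copied through unchanged.
                dst.write(line)
    # Write a two-column table: new ID, original header.
    with open(mapping_tsv, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["new_id", "original_header"])
        writer.writerows(mapping)
    return dict(mapping)
```

For example, `rename_contigs("ten_seqs.fasta", "ten_seqs.renamed.fasta", "contig_names.tsv")` (the two output file names are hypothetical) would produce a renamed FASTA to pass to `pharokka.py -i`, plus a table for mapping the pharokka and phold outputs back to the SPAdes headers.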

George

@estebano11

Dear George,
I am getting the same message. Pharokka runs successfully, but the message appears every time, even if I just run it with -V:
pharokka.py -V
I get this message:
/home/bioinfo/.local/lib/python3.10/site-packages/numpy/_core/getlimits.py:555: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/bioinfo/.local/lib/python3.10/site-packages/numpy/_core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/bioinfo/.local/lib/python3.10/site-packages/numpy/_core/getlimits.py:555: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/bioinfo/.local/lib/python3.10/site-packages/numpy/_core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
1.7.3
python version = 3.10.14
numpy = 2.1.1

Is something wrong in my configuration?
Thank you

Esteban

@gbouras13
Owner

Hi @estebano11 - this is not a bug in pharokka, just a warning, so it is not a big deal and I would just ignore it. There is nothing wrong with your configuration.

I am pretty sure it is caused by numpy v2. I would assume you can remove this warning by installing numpy v1, e.g. pip install numpy==1.26.4.
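
If it helps, here is a small Python sketch for checking which numpy series an environment has before deciding whether to pin it. The helper names `is_numpy_v2` and `installed_numpy_version` are made up for this example, and the link between numpy v2 and the warning is only the guess stated above. The version is read from the package metadata, so numpy itself is not imported:

```python
from importlib import metadata


def is_numpy_v2(version: str) -> bool:
    """Return True if a version string such as '2.1.1' has major version >= 2."""
    return int(version.split(".")[0]) >= 2


def installed_numpy_version():
    """Look up the installed numpy version without importing numpy itself."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


version = installed_numpy_version()
if version is None:
    print("numpy is not installed in this environment")
elif is_numpy_v2(version):
    print(f"numpy {version}: v2 series, the suspected source of the warning")
else:
    print(f"numpy {version}: v1 series")
```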

George

@estebano11

> I am pretty sure it is caused by numpy v2. You can remove this warning by installing numpy v1 e.g. pip install numpy==1.26.4 I would assume.
>
> George

Dear George,
Thank you, that removed the warning message.

Esteban
