pharokka protein crashed after completing mmseqs searches #300

luisalbertoc95 · 2023-11-01T16:29:54Z

pharokka version: 1.4 & 1.5.1
Python version: Python 3.10.8
Operating System: Rocky Linux 8.7 (Green Obsidian)

Description

Hi @gbouras13, when I run pharokka_proteins.py on a set of 755001 ORFs, I get an error caused by a length mismatch between the keys and the columns of a pandas DataFrame. According to the log file, all mmseqs searches were completed.

Thank you!

What I Did

Command run: 

pharokka_proteins.py -i ${WD}/out.CAT.predicted_proteins.faa  \
-o ${WD}/pharokka_prot_out_assembly_1Kb_NoPhablesContigs_PhablesresolvedGenomes \
-d /ref/sahlab/data/viral_analysis_DBs/pharokka1.5_DBs \
-t 24 \
-e 1E-03 \
--force

Traceback: 
2023-10-31 21:26:34.164 | INFO     | post_processing:process_vfdb_results:2134 - Processing VFDB output.
2023-10-31 21:26:35.099 | INFO     | post_processing:process_vfdb_results:2197 - 46 VFDB virulence factors identified.
Traceback (most recent call last):
  File "/ref/sahlab/software/anaconda3/envs/pharokka1.5_env/bin/pharokka_proteins.py", line 213, in <module>
    main()
  File "/ref/sahlab/software/anaconda3/envs/pharokka1.5_env/bin/pharokka_proteins.py", line 172, in main
    pharok.process_dataframes()
  File "/ref/sahlab/software/anaconda3/envs/pharokka1.5_env/bin/proteins.py", line 526, in process_dataframes
    (tophits_df, vfdb_results) = process_vfdb_results(self.out_dir, tophits_df)
  File "/ref/sahlab/software/anaconda3/envs/pharokka1.5_env/bin/post_processing.py", line 2198, in process_vfdb_results
    merged_df[["genbank", "desc_tmp", "vfdb_species"]] = merged_df[
  File "/ref/sahlab/software/anaconda3/envs/pharokka1.5_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4082, in __setitem__
    self._setitem_array(key, value)
  File "/ref/sahlab/software/anaconda3/envs/pharokka1.5_env/lib/python3.10/site-packages/pandas/core/frame.py", line 4124, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/ref/sahlab/software/anaconda3/envs/pharokka1.5_env/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

pharokka_proteins_1698789518.5425682.log
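
For readers hitting the same traceback: pandas raises this ValueError when a DataFrame with N columns is assigned to a list of keys whose length is not N. The sketch below reproduces it with `str.split(..., expand=True)`, which returns as many columns as the longest split. The separator, the sample strings, and the `split_fixed` helper are assumptions made for illustration; they are not taken from pharokka's post_processing.py.

```python
import pandas as pd

KEYS = ["genbank", "desc_tmp", "vfdb_species"]

# Hypothetical hit strings with only two "|"-separated fields each.
df = pd.DataFrame({"hit": ["id1|factor A", "id2|factor B"]})

# expand=True returns one column per field: two here, but three keys.
try:
    df[KEYS] = df["hit"].str.split("|", expand=True)
    error_message = None
except ValueError as err:
    error_message = str(err)


def split_fixed(series, sep, width):
    """Split into exactly `width` columns, padding short rows with NaN."""
    # n=width - 1 caps the number of splits, so extra separators stay
    # in the last column instead of creating additional columns.
    parts = series.str.split(sep, n=width - 1, expand=True)
    return parts.reindex(columns=range(width))


# Always three columns, so the assignment cannot fail the length check.
df[KEYS] = split_fixed(df["hit"], "|", len(KEYS))
print(error_message)
print(df)
```

Fixing the width before the assignment is one defensive option; whether missing fields should become NaN or be treated as an error depends on the data.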


gbouras13 · 2023-11-01T23:01:18Z

Hi @luisalbertoc95 ,

Thanks for reporting this bug and using Pharokka! I see you're using Phables too :)

I'm pretty sure this has to do with the VFDB naming (it's annoying :) ).

Would you be able to do a few things:

1. I'd upgrade to 1.5.1 regardless (that log is from v1.4.0).
2. Re-run this with --hmm_only. It should work to get all the PHROG annotations, but it will skip CARD and VFDB steps. So do that if you're in a hurry.
3. I'm sure you want the CARD and VFDB steps too, so would you be able to send me the VFDB output? In particular vfdb_results.tsv. [email protected] (it should be small enough to email or attach here). I'm pretty sure it's because one of the VFDB outputs has a strange character and if so I will implement a fix soon once I can replicate the error.

George

luisalbertoc95 · 2023-11-03T15:17:09Z

Hi George,

Thanks a lot for your suggestions. Running the code with --hmm_only worked! I'll send the vfdb_results.tsv to you.

Thank you,

Luis

gbouras13 · 2024-01-10T02:29:13Z

Hi @luisalbertoc95 ,

It took a while but I solved this error - it was a bug in pharokka to do with matching VFDB and other outputs.

If you re-run pharokka now it should work (but seemingly you were happy enough with --hmm_only so maybe you've moved on)

George

ebueren · 2024-01-23T02:16:46Z

Hello! I'm running pharokka 1.6.1 (fresh env and database install), and I'm still getting the same error (below). Running in --fast mode fixes the problem, so it seems to be related to the VFDB/CARD databases.

Pharokka version: 1.6.1
Python 3.10.8
OS: Linux, 3.10.0

Command:
pharokka.py -i file.fna -f -o test.out -d /x/x/x/pharokka_db/ -t 32 -m -g prodigal --skip_mash


2024-01-22 20:59:20.921 | INFO     | __main__:main:379 - Post Processing Output.
2024-01-22 20:59:23.455 | INFO     | post_processing:create_mmseqs_tophits:2104 - Processing MMseqs2 outputs.
2024-01-22 20:59:23.455 | INFO     | post_processing:create_mmseqs_tophits:2105 - Processing PHROGs output.
2024-01-22 20:59:30.113 | INFO     | post_processing:process_vfdb_results:2309 - Processing VFDB output.
2024-01-22 20:59:30.149 | INFO     | post_processing:process_vfdb_results:2368 - 17 VFDB virulence factors identified.
Traceback (most recent call last):
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/pharokka.py", line 499, in <module>
    main()
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/pharokka.py", line 418, in main
    pharok.process_results()
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/post_processing.py", line 356, in process_results
    (merged_df, vfdb_results) = process_vfdb_results(
  File "/home/ebueren/miniconda3/envs/pharokka1.6/bin/post_processing.py", line 2369, in process_vfdb_results
    merged_df[["genbank", "desc_tmp", "vfdb_species"]] = merged_df[
  File "/home/ebueren/miniconda3/envs/pharokka1.6/lib/python3.10/site-packages/pandas/core/frame.py", line 4287, in __setitem__
    self._setitem_array(key, value)
  File "/home/ebueren/miniconda3/envs/pharokka1.6/lib/python3.10/site-packages/pandas/core/frame.py", line 4329, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/home/ebueren/miniconda3/envs/pharokka1.6/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

fluhus · 2024-05-23T07:36:36Z

Hi, I am having this issue as well on a fresh mamba+pharokka (1.7.1) install.

pharokka.py -i vir.fa -o vir.prk -d ~/data/pharokka

Same error. Adding --hmm_only or --fast did not help. Happy to provide additional information that could help debug this!

gbouras13 · 2024-05-23T07:56:23Z

Hi @fluhus ,

How big is your input? Is it very small? I have a feeling this error may be because MMseqs2 found no hits at all. I'll try to replicate it later this week and put in a fix if so.

george

fluhus · 2024-05-23T08:11:28Z

Thanks for the quick response!

Here is the input file (111K unzipped):

vir.fa.gz

gbouras13 · 2024-05-23T12:29:38Z

Hi @fluhus,

I have narrowed down your error to the '#' in the header. If you remove this it will work. I'll put in a bug fix at some point :)

George

fluhus · 2024-05-24T07:48:22Z

Thanks for looking into this! I removed the # signs from the names and now it runs :)
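
The workaround above can be scripted. A minimal sketch in Python, assuming a plain-text (uncompressed) FASTA file; the function names, the file names in the usage note, and the choice to replace each `#` with `_` rather than delete it are illustrative assumptions, not part of pharokka:

```python
def sanitize_headers(lines, bad="#", replacement="_"):
    """Yield FASTA lines with `bad` replaced in header lines only."""
    for line in lines:
        # Header lines start with ">"; sequence lines pass through unchanged.
        if line.startswith(">"):
            line = line.replace(bad, replacement)
        yield line


def sanitize_fasta(src, dst, bad="#", replacement="_"):
    """Write a copy of the FASTA file `src` to `dst` with cleaned headers."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(sanitize_headers(fin, bad, replacement))
```

For example, `sanitize_fasta("vir.fa", "vir.clean.fa")` would write a cleaned copy to pass to `pharokka.py -i`. It is worth checking that the rewritten headers are still unique.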

gbouras13 added the bug label Nov 1, 2023

gbouras13 closed this as completed in d79e1d9 Jan 10, 2024

gbouras13 reopened this Jan 23, 2024

iaindhay mentioned this issue Feb 7, 2024

Pharokka stuck at running mmseqs search #325

Open

gbouras13 added a commit that referenced this issue May 23, 2024

fix error where # in header #300

d167603
