Add GenoFLU dependencies #243

Merged 1 commit into master from genoflu-deps on Feb 20, 2025

Conversation

@joverlee521 (Contributor) commented Feb 20, 2025

Description of proposed changes

Add dependencies for GenoFLU to run in avian-flu:

  • ncbi-blast+
  • openpyxl (for pandas.read_excel)
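
A quick way to confirm that both dependencies are present in a given runtime is a small check like the one below. This is only a sketch: the helper name `check_genoflu_deps` is hypothetical, and it assumes GenoFLU needs the `blastn` and `makeblastdb` executables that ship in ncbi-blast+, which this PR does not state explicitly.

```python
import importlib.util
import shutil


def check_genoflu_deps(tools=("blastn", "makeblastdb"), modules=("openpyxl",)):
    """Return a mapping of dependency name -> True if it was found.

    The default tool and module names are assumptions for illustration,
    not a list taken from GenoFLU itself.
    """
    found = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        found[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        found[module] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_genoflu_deps().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run inside the image, this prints one line per dependency, so a missing BLAST install or a missing Excel reader shows up before any workflow is started.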

Related issue(s)

Resolves #242

Checklist

  • Checks pass

@jameshadfield (Member) left a comment

Awesome! I haven't tested, but I can if it'd be helpful.

@joverlee521 (Contributor Author) commented

> Awesome! I haven't tested, but I can if it'd be helpful.

Thanks! I left the example command in nextstrain/avian-flu#127 (comment).

@@ -326,6 +330,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
less \
libgomp1 \
libsqlite3-0 \
ncbi-blast+ \

@joverlee521 (Contributor Author) commented

Ah, just caught up on conversation in #127.

Worth noting this is installing https://packages.debian.org/bookworm/ncbi-blast+ which is v2.12.0.
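
If the BLAST version matters for a workflow, it can be read from the tool itself rather than from the package index. The sketch below parses the text printed by `blastn -version`; the function names are hypothetical, and the sample string is hand-written in the shape that command prints, not captured from the image.

```python
import re
import subprocess


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from `blastn -version` style text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def installed_blast_version(executable="blastn"):
    """Ask the installed BLAST+ program for its version (requires it on PATH)."""
    completed = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    return parse_blast_version(completed.stdout)


# Hand-written sample line, used here so the parser can be tried without BLAST.
sample = "blastn: 2.12.0+"
print(parse_blast_version(sample))
```

Because the result is a tuple of integers, it can be compared directly against a minimum such as `(2, 12, 0)` if one is ever required.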

@jameshadfield (Member) left a comment

Tested this via the nextstrain/base:branch-genoflu-deps image and it all works. As a bonus, it's faster than the Homebrew macOS blast!

LGTM

@joverlee521 joverlee521 merged commit 2bea1bf into master Feb 20, 2025
61 checks passed
@joverlee521 joverlee521 deleted the genoflu-deps branch February 20, 2025 21:48
@joverlee521 (Contributor Author) commented

Re: whether this significantly increases the image size:

| os/arch | latest size | branch size |
| --- | --- | --- |
| linux/amd64 | 704.38 MB | 728.54 MB |
| linux/arm64 | 686.68 MB | 710.64 MB |
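
To put those numbers in terms of growth, the difference can be worked out directly. The snippet below is a small arithmetic sketch using only the sizes reported in this comment; the helper name `size_increase` is made up for illustration.

```python
# Sizes in MB copied from the comment above: (latest, branch).
sizes_mb = {
    "linux/amd64": (704.38, 728.54),
    "linux/arm64": (686.68, 710.64),
}


def size_increase(latest, branch):
    """Return (absolute increase in MB, relative increase in percent)."""
    delta = branch - latest
    return delta, 100 * delta / latest


for arch, (latest, branch) in sizes_mb.items():
    delta, percent = size_increase(latest, branch)
    print(f"{arch}: +{delta:.2f} MB ({percent:.1f}%)")
```

On both architectures this works out to about 24 MB, or roughly 3.5% of the image.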

@jameshadfield (Member) commented

Pasting here the error you get if you try to run GenoFLU (which needs the blast tools added in this PR) with an older docker image:

        python ./vendored-GenoFLU-multi/bin/genoflu-multi.py             -f fauna/data/genoflu/             -n 1 > fauna/logs/run_genoflu.txt
        
cat: write error: Broken pipe
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/nextstrain/build/ingest/./vendored-GenoFLU-multi/bin/genoflu-multi.py", line 41, in run_genoflu
    genoflu.blast_hpai_genomes()
  File "/nextstrain/build/ingest/vendored-GenoFLU-multi/bin/genoflu.py", line 202, in blast_hpai_genomes
    blast_hpai_genotyping = Blast_Fasta(
  File "/nextstrain/build/ingest/vendored-GenoFLU-multi/bin/genoflu.py", line 70, in __init__
    with open(blastout_file, 'r') as blast_file:
FileNotFoundError: [Errno 2] No such file or directory: 'fauna/data/genoflu/temp/1/temp_blast_out.txt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nextstrain/build/ingest/./vendored-GenoFLU-multi/bin/genoflu-multi.py", line 197, in <module>
    pool_data = pool.starmap(run_genoflu, zip(split_strain_records, range(1,cores+1)))
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'fauna/data/genoflu/temp/1/temp_blast_out.txt'
[Sun Feb 23 22:23:03 2025]
Error in rule run_genoflu:
    jobid: 20
    input: fauna/data/genoflu/sequences_pb2.fasta, fauna/data/genoflu/sequences_pb1.fasta, fauna/data/genoflu/sequences_pa.fasta, fauna/data/genoflu/sequences_ha.fasta, fauna/data/genoflu/sequences_np.fasta, fauna/data/genoflu/sequences_na.fasta, fauna/data/genoflu/sequences_mp.fasta, fauna/data/genoflu/sequences_ns.fasta
    output: fauna/data/genoflu/results/results.tsv
    log: fauna/logs/run_genoflu.txt (check log file(s) for error details)
    shell:
        
        python ./vendored-GenoFLU-multi/bin/genoflu-multi.py             -f fauna/data/genoflu/             -n 1 > fauna/logs/run_genoflu.txt
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
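
The `FileNotFoundError` in this log is a downstream symptom: the script fails when opening the BLAST output file, not when BLAST itself is missing. One plausible reading, which the traceback suggests but does not confirm, is that the BLAST command is launched through a shell without its exit status being checked. The sketch below reproduces that pattern with a deliberately nonexistent command name (`no-such-blast-tool`) and a hypothetical helper; it is not GenoFLU's code.

```python
import os
import shlex
import subprocess
import tempfile


def run_then_read(command_name, workdir):
    """Run a tool that should write `out.txt`, then try to read that file."""
    out_path = os.path.join(workdir, "out.txt")
    # With shell=True a missing executable does not raise in Python; the
    # shell only reports a non-zero exit status, which is ignored here.
    completed = subprocess.run(
        f"{command_name} -out {shlex.quote(out_path)}",
        shell=True,
        stderr=subprocess.DEVNULL,
    )
    try:
        with open(out_path) as handle:
            return completed.returncode, handle.read()
    except FileNotFoundError:
        # The tool never ran, so the file it was meant to create is absent.
        return completed.returncode, None


with tempfile.TemporaryDirectory() as tmp:
    status, contents = run_then_read("no-such-blast-tool", tmp)
    print(f"exit status: {status}, output file contents: {contents}")
```

Checking the return code right after the call, or passing `check=True`, would surface the missing tool at its source instead of at the later `open()`.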

Successfully merging this pull request may close these issues.

Add dependencies for GenoFLU to runtime