Merge branch 'dev' into merging-template-3.11

nf-core · Jan 8, 2025 · 7fa318e
2 parents 5b4bca5 + 1f8f208
commit 7fa318e

Showing 31 changed files with 217 additions and 81 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -43,6 +43,7 @@ jobs:
           - "test_colabfold_webserver"
           - "test_colabfold_download"
           - "test_esmfold"
+          - "test_split_fasta"
         isMaster:
           - ${{ github.base_ref == 'master' }}
         # Exclude conda and singularity on dev

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,14 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Enhancements & fixes
 
-- [[#177](https://github.com/nf-core/proteinfold/issues/177)]- Fix typo in some instances of model preset `alphafold2_ptm`.
+- [[#177](https://github.com/nf-core/proteinfold/issues/177)] - Fix typo in some instances of model preset `alphafold2_ptm`.
 - [[PR #178](https://github.com/nf-core/proteinfold/pull/178)] - Enable running multiple modes in parallel.
-- [[#179](https://github.com/nf-core/proteinfold/issues/179)]- Produce an interactive html report for the predicted structures.
-- [[#180](https://github.com/nf-core/proteinfold/issues/180)]- Implement Fooldseek.
-- [[#188](https://github.com/nf-core/proteinfold/issues/188)]- Fix colabfold image to run in gpus.
+- [[#179](https://github.com/nf-core/proteinfold/issues/179)] - Produce an interactive html report for the predicted structures.
+- [[#180](https://github.com/nf-core/proteinfold/issues/180)] - Implement Foldseek.
+- [[#188](https://github.com/nf-core/proteinfold/issues/188)] - Fix colabfold image to run in gpus.
 - [[PR #205](https://github.com/nf-core/proteinfold/pull/205)] - Change input schema from `sequence,fasta` to `id,fasta`.
-- [[PR #210](https://github.com/nf-core/proteinfold/pull/210)]- Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure.
-- [[#214](https://github.com/nf-core/proteinfold/issues/214)]- Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix.
+- [[PR #210](https://github.com/nf-core/proteinfold/pull/210)] - Moving post-processing logic to a subworkflow, change wave images pointing to oras to point to https and refactor module to match nf-core folder structure.
+- [[#214](https://github.com/nf-core/proteinfold/issues/214)] - Fix colabfold image to run in cpus after [#188](https://github.com/nf-core/proteinfold/issues/188) fix.
+- [[#235](https://github.com/nf-core/proteinfold/issues/235)] - Update samplesheet to new version (switch from `sequence` column to `id`).
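Most of this hunk only normalizes the separator after each issue or PR link from `]-` to `] - `. A minimal, hypothetical lint (not part of the pipeline or of nf-core tooling) could enforce that convention. The pattern assumes entries look like `- [[#N](url)] - text` or `- [[PR #N](url)] - text`. As written, it would also flag the `PR ##205` entry because of the doubled `#`.

```python
import re

# Hypothetical convention check: "- [[#N](url)] - text" or "- [[PR #N](url)] - text",
# with a space on both sides of the dash that follows the link.
ENTRY = re.compile(r"^- \[\[(?:PR )?#\d+\]\([^)]+\)\] - \S")


def malformed_entries(lines):
    """Return the changelog bullet lines that break the separator convention."""
    return [line for line in lines if line.startswith("- [[") and not ENTRY.match(line)]


if __name__ == "__main__":
    sample = [
        "- [[#177](https://github.com/nf-core/proteinfold/issues/177)]- Fix typo",
        "- [[#177](https://github.com/nf-core/proteinfold/issues/177)] - Fix typo",
    ]
    for bad in malformed_entries(sample):
        print("malformed:", bad)
```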
 
 ## [[1.1.1](https://github.com/nf-core/proteinfold/releases/tag/1.1.1)] - 2025-07-30
 

diff --git a/assets/comparison_template.html b/assets/comparison_template.html
diff --git a/assets/report_template.html b/assets/report_template.html
diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv
@@ -1,3 +1,3 @@
-sequence,fasta
+id,fasta
 T1024,https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testdata/sequences/T1024.fasta
 T1026,https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testdata/sequences/T1026.fasta
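The only change to the example samplesheet is the header, from `sequence,fasta` to `id,fasta`. A small, hypothetical pre-flight check (a standalone sketch, not the pipeline's own input validation, which this diff does not show) can catch sheets that still use the old header. The duplicate-`id` check is an extra assumption: because the new `publishDir` rules rename top-ranked structures to `${meta.id}.pdb`, repeated ids would map to the same output file name.

```python
import csv
import io

REQUIRED_COLUMNS = ["id", "fasta"]


def read_samplesheet(text):
    """Parse samplesheet CSV text and return a list of (id, fasta) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if header[:2] == ["sequence", "fasta"]:
        raise ValueError("old 'sequence,fasta' header; rename the first column to 'id'")
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    rows = [(row["id"], row["fasta"]) for row in reader]
    ids = [sample_id for sample_id, _ in rows]
    # Assumption: ids must be unique because outputs are renamed to <id>.pdb.
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate values in the 'id' column")
    return rows
```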
diff --git a/bin/generate_comparison_report.py b/bin/generate_comparison_report.py
@@ -50,7 +50,7 @@ def generate_output(plddt_data, name, out_dir, generate_tsv, pdb):
             linecolor="black",
             gridcolor="WhiteSmoke",
         ),
-        legend=dict(y=0, x=1),
+        legend=dict(yanchor="bottom", y=0.02, xanchor="right", x=1, bordercolor="Black", borderwidth=1),
         plot_bgcolor="white",
         width=600,
         height=600,

diff --git a/bin/generate_report.py b/bin/generate_report.py
@@ -120,7 +120,7 @@ def generate_output_images(msa_path, plddt_data, name, out_dir, in_type, generat
             linecolor="black",
             gridcolor="WhiteSmoke",
         ),
-        legend=dict(yanchor="bottom", y=0, xanchor="right", x=1.3),
+        legend=dict(yanchor="bottom", y=0.02, xanchor="right", x=1, bordercolor="Black", borderwidth=1),
         plot_bgcolor="white",
         width=600,
         height=600,
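Both report scripts now share the same legend settings. In Plotly, `x` and `y` place the legend's anchor point in normalized plot-area coordinates, and `xanchor`/`yanchor` choose which edge of the legend box sits at that point. With `xanchor="right"` and `x=1`, the box's right edge lines up with the right edge of the plotting area, whereas the previous `x=1.3` in `generate_report.py` pushed the box to the right of the axes; `yanchor="bottom"` with `y=0.02` lifts it just above the x-axis. The helper below is a hypothetical model of the horizontal case only, and the legend width is an assumed input, since Plotly sizes the box from its entries.

```python
def legend_x_extent(x, xanchor, width):
    """Return (left, right) of a legend box in normalized plot-area coordinates.

    `x` is where the anchor point sits, `xanchor` says which part of the box is
    placed there, and `width` is an assumed box width as a fraction of the plot.
    """
    offsets = {"left": 0.0, "center": -width / 2, "right": -width}
    left = x + offsets[xanchor]
    return left, left + width


def fits_inside(x, xanchor, width):
    """True if the whole legend box lies within the plotting area horizontally."""
    left, right = legend_x_extent(x, xanchor, width)
    return 0.0 <= left and right <= 1.0


if __name__ == "__main__":
    print(fits_inside(1, "right", 0.25))    # new setting
    print(fits_inside(1.3, "right", 0.25))  # old generate_report.py setting
```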

diff --git a/conf/modules_alphafold2.config b/conf/modules_alphafold2.config
@@ -40,9 +40,18 @@ if (params.alphafold2_mode == 'standard') {
                 params.max_template_date ? "--max_template_date ${params.max_template_date}" : ''
             ].join(' ').trim()
             publishDir = [
-                path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" },
-                mode: 'copy',
-                saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+                [
+                    path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" },
+                    mode: 'copy',
+                    saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+                    pattern: '*.*'
+                ],
+                [
+                    path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}/top_ranked_structures" },
+                    mode: 'copy',
+                    saveAs: { "${meta.id}.pdb" },
+                    pattern: '*_alphafold2.pdb'
+                ]
             ]
         }
     }
@@ -54,7 +63,7 @@ if (params.alphafold2_mode == 'split_msa_prediction') {
         withName: 'RUN_ALPHAFOLD2_MSA' {
             ext.args =  params.max_template_date ? "--max_template_date ${params.max_template_date}" : ''
             publishDir = [
-                path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" },
+                path: { "${params.outdir}/alphafold2_${params.alphafold2_mode}" },
                 mode: 'copy',
                 saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
             ]
@@ -64,9 +73,18 @@ if (params.alphafold2_mode == 'split_msa_prediction') {
             if(params.use_gpu) { accelerator = 1 }
             ext.args   = params.use_gpu ? '--use_gpu_relax=true' : '--use_gpu_relax=false'
             publishDir = [
-                path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" },
-                mode: 'copy',
-                saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+                [
+                    path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" },
+                    mode: 'copy',
+                    saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+                    pattern: '*.*'
+                ],
+                [
+                    path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}/top_ranked_structures" },
+                    mode: 'copy',
+                    saveAs: { "${meta.id}.pdb" },
+                    pattern: '*_alphafold2.pdb'
+                ]
             ]
         }
     }
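Each AlphaFold2 prediction process now has two `publishDir` rules: everything matching `*.*` (except `versions.yml`) goes to the mode directory, and the file matching `*_alphafold2.pdb` is additionally copied into `top_ranked_structures/` under the name `${meta.id}.pdb`. The sketch below re-creates that routing for a list of file names; `publish_targets`, the example file names, and the use of Python's `fnmatchcase` as a stand-in for Nextflow's glob matching are all assumptions for illustration.

```python
from fnmatch import fnmatchcase

# (target sub-directory, glob pattern, rename function) mirroring the two rules above.
RULES = [
    ("", "*.*", lambda name, sample_id: None if name == "versions.yml" else name),
    ("top_ranked_structures", "*_alphafold2.pdb", lambda name, sample_id: f"{sample_id}.pdb"),
]


def publish_targets(filenames, sample_id, base="alphafold2/standard"):
    """Return the relative paths each task output would be copied to."""
    targets = []
    for name in filenames:
        for subdir, pattern, rename in RULES:
            if not fnmatchcase(name, pattern):
                continue
            new_name = rename(name, sample_id)
            if new_name is None:  # saveAs returning null means "do not publish"
                continue
            targets.append("/".join(part for part in (base, subdir, new_name) if part))
    return targets


if __name__ == "__main__":
    for path in publish_targets(["T1024_alphafold2.pdb", "T1024_plddt_mqc.tsv", "versions.yml"], "T1024"):
        print(path)
```

One consequence worth checking under glob semantics: `*.*` only matches names that contain a dot, so an output such as a bare per-sequence directory name would match neither rule in this sketch.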

diff --git a/conf/modules_colabfold.config b/conf/modules_colabfold.config
@@ -30,10 +30,18 @@ if (params.colabfold_server == 'webserver') {
                 params.host_url ? "--host-url ${params.host_url}" : ''
             ].join(' ').trim()
             publishDir = [
-                path: { "${params.outdir}/colabfold/${params.colabfold_server}" },
-                mode: 'copy',
-                saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
-                pattern: '*.*'
+                [
+                    path: { "${params.outdir}/colabfold/${params.colabfold_server}" },
+                    mode: 'copy',
+                    saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+                    pattern: '*.*'
+                ],
+                [
+                    path: { "${params.outdir}/colabfold/${params.colabfold_server}/top_ranked_structures" },
+                    mode: 'copy',
+                    saveAs: { "${meta.id}.pdb" },
+                    pattern: '*_relaxed_rank_001*.pdb'
+                ]
             ]
         }
     }
@@ -67,10 +75,18 @@ if (params.colabfold_server == 'local') {
                 params.use_templates ? '--templates' : ''
             ].join(' ').trim()
             publishDir = [
-                path: { "${params.outdir}/colabfold/${params.colabfold_server}" },
-                mode: 'copy',
-                saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
-                pattern: '*.*'
+                [
+                    path: { "${params.outdir}/colabfold/${params.colabfold_server}" },
+                    mode: 'copy',
+                    saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+                    pattern: '*.*'
+                ],
+                [
+                    path: { "${params.outdir}/colabfold/${params.colabfold_server}/top_ranked_structures" },
+                    mode: 'copy',
+                    saveAs: { "${meta.id}.pdb" },
+                    pattern: '*_relaxed_rank_001*.pdb'
+                ],
             ]
         }
     }
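For ColabFold, the second rule selects the top-ranked model through the glob `*_relaxed_rank_001*.pdb`. The file names below only illustrate the ColabFold naming style and are assumptions; the point is that the leading `_` stops `unrelaxed` files from matching, and that a run which writes no relaxed models would leave `top_ranked_structures/` empty. `fnmatchcase` again approximates Nextflow's glob matching.

```python
from fnmatch import fnmatchcase

TOP_RANKED = "*_relaxed_rank_001*.pdb"


def top_ranked(filenames):
    """Return the outputs the top_ranked_structures rule would pick up."""
    return [name for name in filenames if fnmatchcase(name, TOP_RANKED)]


if __name__ == "__main__":
    outputs = [
        "T1024_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb",
        "T1024_relaxed_rank_002_alphafold2_ptm_model_1_seed_000.pdb",
        "T1024_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb",
    ]
    print(top_ranked(outputs))
```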

diff --git a/conf/modules_esmfold.config b/conf/modules_esmfold.config
@@ -14,11 +14,19 @@ process {
     withName: 'RUN_ESMFOLD' {
         ext.args = {params.use_gpu ? '' : '--cpu-only'}
         publishDir = [
-                path: { "${params.outdir}/esmfold" },
+            [
+                path: { "${params.outdir}/esmfold/default" },
                 mode: 'copy',
                 saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
                 pattern: '*.*'
+            ],
+            [
+                path: { "${params.outdir}/esmfold/default/top_ranked_structures" },
+                mode: 'copy',
+                saveAs: { "${meta.id}.pdb" },
+                pattern: '*.pdb'
             ]
+        ]
     }
 
     withName: 'NFCORE_PROTEINFOLD:ESMFOLD:MULTIQC' {

diff --git a/conf/test.config b/conf/test.config
@@ -28,7 +28,7 @@ params {
     // Input data to test alphafold2 analysis
     mode            = 'alphafold2'
     alphafold2_mode = 'standard'
-    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
     alphafold2_db   = "${projectDir}/assets/dummy_db_dir"
 }
 

diff --git a/conf/test_alphafold_download.config b/conf/test_alphafold_download.config
@@ -28,7 +28,7 @@ params {
     // Input data to test alphafold2 analysis
     mode            = 'alphafold2'
     alphafold2_mode = 'standard'
-    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
 }
 
 process {

diff --git a/conf/test_alphafold_split.config b/conf/test_alphafold_split.config
@@ -28,7 +28,7 @@ params {
     // Input data to test alphafold2 splitting MSA from prediction analysis
     mode            = 'alphafold2'
     alphafold2_mode = 'split_msa_prediction'
-    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
     alphafold2_db   = "${projectDir}/assets/dummy_db_dir"
 }
 

diff --git a/conf/test_colabfold_download.config b/conf/test_colabfold_download.config
@@ -28,7 +28,7 @@ params {
     // Input data to test colabfold analysis
     mode             = 'colabfold'
     colabfold_server = 'webserver'
-    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
 }
 
 process {

diff --git a/conf/test_colabfold_local.config b/conf/test_colabfold_local.config
@@ -27,7 +27,7 @@ params {
     mode             = 'colabfold'
     colabfold_server = 'local'
     colabfold_db     = "${projectDir}/assets/dummy_db_dir"
-    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
 }
 
 process {

diff --git a/conf/test_colabfold_webserver.config b/conf/test_colabfold_webserver.config
@@ -27,7 +27,7 @@ params {
     mode             = 'colabfold'
     colabfold_server = 'webserver'
     colabfold_db     = "${projectDir}/assets/dummy_db_dir"
-    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
 }
 
 process {

diff --git a/conf/test_esmfold.config b/conf/test_esmfold.config
@@ -26,7 +26,7 @@ params {
     // Input data to test esmfold
     mode             = 'esmfold'
     esmfold_db       = "${projectDir}/assets/dummy_db_dir"
-    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
 }
 
 process {

diff --git a/conf/test_full.config b/conf/test_full.config
@@ -17,6 +17,6 @@ params {
     // Input data for full test of alphafold standard mode
     mode            = 'alphafold2'
     alphafold2_mode = 'standard'
-    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
     alphafold2_db   = 's3://proteinfold-dataset/test-data/db/alphafold_mini'
 }
diff --git a/conf/test_full_alphafold_multimer.config b/conf/test_full_alphafold_multimer.config
@@ -18,6 +18,6 @@ params {
     mode                    = 'alphafold2'
     alphafold2_mode         = 'standard'
     alphafold2_model_preset = 'multimer'
-    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet_multimer.csv'
+    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet_multimer.csv'
     alphafold2_db           = 's3://proteinfold-dataset/test-data/db/alphafold_mini'
 }
diff --git a/conf/test_full_alphafold_split.config b/conf/test_full_alphafold_split.config
@@ -17,6 +17,6 @@ params {
     // Input data for full test of alphafold2 split MSA/prediction mode
     mode            = 'alphafold2'
     alphafold2_mode = 'split_msa_prediction'
-    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input           = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
     alphafold2_db   = 's3://proteinfold-dataset/test-data/db/alphafold_mini'
 }
diff --git a/conf/test_full_colabfold_local.config b/conf/test_full_colabfold_local.config
@@ -19,7 +19,7 @@ params {
     mode                    = 'colabfold'
     colabfold_server        = 'local'
     colabfold_model_preset  = 'alphafold2_ptm'
-    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
     colabfold_db            = 's3://proteinfold-dataset/test-data/db/colabfold_mini'
 }
 process {

diff --git a/conf/test_full_colabfold_webserver.config b/conf/test_full_colabfold_webserver.config
@@ -18,6 +18,6 @@ params {
     mode                   = 'colabfold'
     colabfold_server       = 'webserver'
     colabfold_model_preset = 'alphafold2_ptm'
-    input                  = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input                  = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
     colabfold_db            = 's3://proteinfold-dataset/test-data/db/colabfold_mini'
 }
diff --git a/conf/test_full_colabfold_webserver_multimer.config b/conf/test_full_colabfold_webserver_multimer.config
@@ -18,6 +18,6 @@ params {
     mode                   = 'colabfold'
     colabfold_server       = 'webserver'
     colabfold_model_preset = 'alphafold2_multimer_v3'
-    input                  = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet_multimer.csv'
+    input                  = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet_multimer.csv'
     colabfold_db           = 's3://proteinfold-dataset/test-data/db/colabfold_mini'
 }
diff --git a/conf/test_full_esmfold.config b/conf/test_full_esmfold.config
@@ -17,6 +17,6 @@ params {
     // Input data for full test of esmfold monomer
     mode                    = 'esmfold'
     esmfold_model_preset    = 'monomer'
-    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet.csv'
+    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet.csv'
     esmfold_db              = 's3://proteinfold-dataset/db/esmfold'
 }
diff --git a/conf/test_full_esmfold_multimer.config b/conf/test_full_esmfold_multimer.config
@@ -17,6 +17,6 @@ params {
     // Input data for full test of esmfold multimer
     mode                    = 'esmfold'
     esmfold_model_preset    = 'multimer'
-    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.0/samplesheet_multimer.csv'
+    input                   = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet_multimer.csv'
     esmfold_db              = 's3://proteinfold-dataset/test-data/db/esmfold'
 }
diff --git a/conf/test_split_fasta.config b/conf/test_split_fasta.config
@@ -0,0 +1,38 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running minimal tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines input files and everything required to run a fast and simple pipeline test.
+    Use as follows:
+        nextflow run nf-core/proteinfold -profile test_split_fasta,<docker/singularity> --outdir <OUTDIR>
+----------------------------------------------------------------------------------------
+*/
+
+stubRun = true
+
+// Limit resources so that this can run on GitHub Actions
+process {
+    resourceLimits = [
+        cpus: 4,
+        memory: '15.GB',
+        time: '1.h'
+    ]
+}
+
+params {
+    config_profile_name        = 'Test profile'
+    config_profile_description = 'Minimal test dataset to check pipeline function'
+
+    // Input data to test colabfold with a local server, splitting multi-record FASTA inputs
+    mode             = 'colabfold'
+    colabfold_server = 'local'
+    split_fasta      = true
+    colabfold_db     = "${projectDir}/assets/dummy_db_dir"
+    input            = params.pipelines_testdata_base_path + 'proteinfold/testdata/samplesheet/v1.2/samplesheet_multimer.csv'
+}
+
+process {
+    withName: 'MMSEQS_COLABFOLDSEARCH|COLABFOLD_BATCH' {
+        container = 'biocontainers/gawk:5.1.0'
+    }
+}
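This new profile turns on `split_fasta` with the multimer samplesheet as a stub run. The diff does not show how the pipeline implements the split; assuming it turns each multi-record FASTA into one input per record, a minimal standalone sketch of that step could look like the function below. `split_fasta` here is a hypothetical helper, and taking the first whitespace-delimited token of each header as the record id is an assumption.

```python
def split_fasta(text):
    """Split multi-record FASTA text into {record_id: single-record FASTA text}."""
    records = {}
    current_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without an identifier")
            current_id = tokens[0]  # assumption: id = first token after '>'
            if current_id in records:
                raise ValueError(f"duplicate record id: {current_id}")
            records[current_id] = [line]
        elif current_id is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[current_id].append(line)
    return {rid: "\n".join(lines) + "\n" for rid, lines in records.items()}
```

Rejecting duplicate record ids, rather than silently overwriting, keeps one output per record when the pieces are later named after their ids.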
diff --git a/docs/output.md b/docs/output.md
@@ -23,10 +23,8 @@ The directories listed below will be created in the output directory after the p
 <details markdown="1">
 <summary>Output files</summary>
 
-- `AlphaFold2/`
-  - `<SEQUENCE NAME>/` that contains the computed MSAs, unrelaxed structures, relaxed structures, ranked structures, raw model outputs, prediction metadata, and section timings
-  - `<SEQUENCE NAME>.alphafold.pdb` that is the structure with the highest pLDDT score (ranked first)
-  - `<SEQUENCE NAME>_plddt_mqc.tsv` that presents the pLDDT scores per residue for each of the 5 predicted models
+- `alphafold2/standard/` or `alphafold2/split_msa_prediction/`, depending on the selected mode. It contains the computed MSAs, unrelaxed structures, relaxed structures, ranked structures, raw model outputs, prediction metadata, and section timings. In particular, `<SEQUENCE NAME>_plddt_mqc.tsv` lists the per-residue pLDDT scores for each of the 5 predicted models.
+  - `top_ranked_structures/<SEQUENCE NAME>.pdb` that is the structure with the highest pLDDT score per input (ranked first)
 - `DBs/` that contains symbolic links to the downloaded database and parameter files
 
 </details>
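With this commit, every mode publishes its best structure to the same relative location, `<mode>/<sub-mode>/top_ranked_structures/<id>.pdb`, as set by the `publishDir` rules above (ESMFold uses `default` as its sub-mode directory). A hypothetical helper for locating those files, for example when comparing modes run in parallel, might look like this; the function name and the whitelist are illustrative.

```python
from pathlib import PurePosixPath

# (mode, sub-mode) directories configured by the publishDir rules in this commit.
VALID_MODES = {
    ("alphafold2", "standard"),
    ("alphafold2", "split_msa_prediction"),
    ("colabfold", "webserver"),
    ("colabfold", "local"),
    ("esmfold", "default"),
}


def top_ranked_path(outdir, mode, submode, sample_id):
    """Return the expected path of the top-ranked structure for one input."""
    if (mode, submode) not in VALID_MODES:
        raise ValueError(f"unknown mode combination: {mode}/{submode}")
    return PurePosixPath(outdir, mode, submode, "top_ranked_structures", f"{sample_id}.pdb")


if __name__ == "__main__":
    for mode, submode in sorted(VALID_MODES):
        print(top_ranked_path("results", mode, submode, "T1024"))
```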
@@ -91,7 +89,8 @@ Below you can find an indicative example of the TSV file with the pLDDT scores p
 <details markdown="1">
 <summary>Output files</summary>
 
-- `colabfold/webserver/` or `colabfold/local/` based on the selected mode that contains the computed MSAs, unrelaxed structures, relaxed structures, ranked structures, raw model outputs and scores, prediction metadata, logs and section timings
+- `colabfold/webserver/` or `colabfold/local/`, depending on the selected mode. It contains the computed MSAs, unrelaxed structures, relaxed structures, ranked structures, raw model outputs, prediction metadata, and section timings. In particular, `<SEQUENCE NAME>_plddt_mqc.tsv` lists the per-residue pLDDT scores for each of the 5 predicted models.
+  - `top_ranked_structures/<SEQUENCE NAME>.pdb` that is the structure with the highest pLDDT score per input (ranked first)
 - `DBs/` that contains symbolic links to the downloaded database and parameter files
 
 </details>
@@ -115,9 +114,9 @@ Below you can find some indicative examples of the output images produced by Col
 <details markdown="1">
 <summary>Output files</summary>
 
-- `esmfold/`
-  - `<SEQUENCE NAME>.pdb` that is the structure with the highest pLDDT score (ranked first)
-  - `<SEQUENCE NAME>_plddt_mqc.tsv` that presents the pLDDT scores per residue for each of the 5 predicted models
+- `esmfold/default/` contains the predicted structures. In particular, `<SEQUENCE NAME>_plddt_mqc.tsv` lists the per-residue pLDDT scores for each of the predicted models.
+  - `top_ranked_structures/<SEQUENCE NAME>.pdb` that is the structure with the highest pLDDT score per input (ranked first)
 - `DBs/` that contains symbolic links to the downloaded database and parameter files
 
 </details>
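The files in `top_ranked_structures/` are plain PDB files. AlphaFold2 and ColabFold store per-residue pLDDT in the B-factor column, and this sketch assumes the ESMFold structure does the same; check the value range, since tools may use a 0 to 1 or 0 to 100 scale. The hypothetical `mean_plddt` averages the B-factor over C-alpha atoms so that each residue counts once, using the fixed PDB columns (atom name in columns 13-16, B-factor in columns 61-66).

```python
def mean_plddt(pdb_text):
    """Average the B-factor column over C-alpha ATOM records (one per residue)."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB: atom name in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:  # e.g. a top_ranked_structures/<id>.pdb file
        print(f"mean pLDDT: {mean_plddt(handle.read()):.2f}")
```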