FastOMA nextflow pipeline

Sina Majidian edited this page Feb 4, 2025 · 3 revisions
    ===========================================
      FastOMA -- PIPELINE
    ===========================================
    Usage:
    Run the pipeline with default parameters:
    nextflow run FastOMA.nf

    Run with user parameters:
    nextflow run FastOMA.nf --input_folder {input.dir}  --output_folder {results.dir}
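
The usage lines above can be assembled programmatically, for example when launching many runs from a script. The following is a minimal sketch, not part of FastOMA: the helper `build_fastoma_command` is hypothetical, and the folder names and the value `50` are placeholders. It only uses options listed in this help text, and it assumes the usual Nextflow convention that `-profile` (single dash) is a Nextflow option while pipeline parameters take a double dash.

```python
import shlex


def build_fastoma_command(input_folder, output_folder, profile=None,
                          extra_params=None, flags=()):
    """Return the argument list for a `nextflow run FastOMA.nf` call.

    Hypothetical helper for illustration; it does not run anything.
    """
    argv = ["nextflow", "run", "FastOMA.nf"]
    if profile is not None:
        # Nextflow's own options use a single dash.
        argv += ["-profile", profile]
    # Pipeline parameters use a double dash.
    argv += ["--input_folder", str(input_folder),
             "--output_folder", str(output_folder)]
    for name, value in (extra_params or {}).items():
        argv += [f"--{name}", str(value)]
    for flag in flags:
        argv.append(f"--{flag}")
    return argv


# Placeholder folder names and a placeholder parameter value.
cmd = build_fastoma_command(
    "in_folder", "out_folder",
    profile="docker",
    extra_params={"min_sequence_length": 50},
    flags=["write_msas"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if Nextflow is installed and the call is made from the directory containing `FastOMA.nf`.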

    Mandatory arguments:
        --input_folder          Input data folder. Defaults to ${params.input_folder}. This folder
                                must contain the proteomes (in a subfolder named 'proteome') and
                                a species tree file. Optionally the folder might contain
                                 - a sub-folder 'splice' containing splicing variant mappings
                                 - a sub-folder 'hogmap_in' containing precomputed OMAmer
                                   placement results for all proteomes

                                All sub-folders and files can also be placed in other
                                locations if you provide alternative values for them (see the
                                'Additional options' section below).

        --output_folder         Path where all the output should be stored. Defaults to
                                ${params.output_folder}
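
Before launching a run, it can help to check that the input folder has the layout described above. The sketch below is an illustration only and is not shipped with FastOMA: the function `check_input_folder` is hypothetical, and the species tree file name is passed in by the caller because this help text does not fix it (the name `species_tree.nwk` in the demo is an assumption).

```python
import tempfile
from pathlib import Path


def check_input_folder(input_folder, species_tree_name):
    """Return (problems, optional_found) for an input folder.

    problems:       list of human-readable messages about missing items
    optional_found: names of optional sub-folders that are present
    """
    root = Path(input_folder)
    problems = []

    proteome_dir = root / "proteome"
    if not proteome_dir.is_dir():
        problems.append("missing sub-folder 'proteome'")
    elif not any(p.is_file() for p in proteome_dir.iterdir()):
        problems.append("sub-folder 'proteome' contains no files")

    if not (root / species_tree_name).is_file():
        problems.append(f"missing species tree file '{species_tree_name}'")

    # Optional sub-folders are reported but never treated as errors.
    optional_found = [name for name in ("splice", "hogmap_in")
                      if (root / name).is_dir()]
    return problems, optional_found


# Demo on a throwaway folder; all file names here are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "proteome").mkdir()
    (demo_root / "proteome" / "species_a.fa").write_text(">p1\nMKT\n")
    (demo_root / "species_tree.nwk").write_text("(species_a);")
    problems, optional_found = check_input_folder(demo_root, "species_tree.nwk")

print(problems, optional_found)
```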


    Profile selection:
        -profile                FastOMA can be run using several execution profiles. The default
                                set of available profiles is
                                 - docker       Run pipeline using docker containers. Docker needs
                                                to be installed on your system. Containers will be
                                                fetched automatically from dockerhub. See also
                                                additional options '--container_version' and
                                                '--container_name'.

                                 - singularity  Run pipeline using singularity. Singularity needs
                                                to be installed on your system. On HPC clusters,
                                                it often needs to be loaded as a separate module.
                                                Containers will be fetched automatically from
                                                dockerhub. See also additional options
                                                '--container_version' and '--container_name'.

                                 - conda        Run pipeline in a conda environment. Conda needs
                                                to be installed on your system. The environment
                                                will be created automatically.

                                 - standard     Run pipeline on your local system. Mainly intended
                                                for development purposes. All dependencies must be
                                                installed in the calling environment.

                                 - slurm_singularity
                                                Run pipeline using SLURM job scheduler and
                                                singularity containers. This profile can also be a
                                                template for other HPC clusters that use different
                                                schedulers.

                                 - slurm_conda  Run pipeline using SLURM job scheduler and conda
                                                environment.

                                Profiles are defined in nextflow.config and can be extended or
                                adjusted according to your needs.


    Additional options:
        --proteome_folder       Overwrite location of proteomes (default ${params.proteome_folder})
        --species_tree          Overwrite location of species tree file (newick format).
                                Defaults to ${params.species_tree}
        --splice_folder         Overwrite location of splice file folder. The splice files must be
                                named <proteome_file>.splice.
                                Defaults to ${params.splice_folder}
        --omamer_db             Path or URL to download the OMAmer database from.
                                Defaults to ${params.omamer_db}
        --hogmap_in             Optional path where precomputed omamer mapping files are located.
                                Defaults to ${params.hogmap_in}
        --fasta_header_id_transformer
                                Choice of transformer that maps the input proteome FASTA headers
                                to the IDs reported in output files (e.g. orthoxml files).
                                Defaults to '${params.fasta_header_id_transformer}', and can be set to
                                  - noop         : no transformation (input header == output header)
                                  - UniProt      : extract accession from uniprot header
                                                   e.g. '>sp|P68250|1433B_BOVIN' --> 'P68250'
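
To illustrate the effect of the two header transformers, here is a minimal sketch. It is not FastOMA's implementation: the function names are hypothetical, and the handling of headers that are not in UniProt style (returning the first whitespace-delimited token unchanged) is an assumption made for this example.

```python
def noop_id(header):
    """Return the header ID unchanged (first token, without the leading '>')."""
    text = header[1:] if header.startswith(">") else header
    fields = text.split()
    return fields[0] if fields else ""


def uniprot_accession(header):
    """Extract the accession from a UniProt-style header such as
    '>sp|P68250|1433B_BOVIN'.

    Assumption: headers that do not look like 'db|accession|name'
    fall back to the untransformed ID.
    """
    identifier = noop_id(header)
    parts = identifier.split("|")
    if len(parts) >= 3 and parts[1]:
        return parts[1]
    return identifier


print(uniprot_accession(">sp|P68250|1433B_BOVIN"))  # P68250
print(noop_id(">sp|P68250|1433B_BOVIN"))            # sp|P68250|1433B_BOVIN
```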

    Algorithmic parameters:
        --nr_repr_per_hog       The maximum number of representatives per subhog to keep during the
                                inference. Higher values lead to slightly higher runtime.
                                Defaults to ${params.nr_repr_per_hog}.
        --filter_method         The applied filtering method on the MSAs before tree building.
                                Must be one of "col-row-threshold", "col-elbow-row-threshold", "trimal".
                                Defaults to ${params.filter_method}.
        --min_sequence_length   Minimum length of a sequence to be considered for orthology
                                inference. Sequences that are too short tend to be problematic.
                                Defaults to ${params.min_sequence_length}.
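
The idea behind `--min_sequence_length` can be sketched as a simple length filter. This is an illustration under stated assumptions, not the pipeline's code: the function name is hypothetical, the threshold `5` is a placeholder rather than the real default, and whether FastOMA treats the limit as inclusive is not stated in this help text (the sketch keeps sequences whose length is at least the threshold).

```python
def filter_short_sequences(records, min_sequence_length):
    """Keep only sequences with length >= min_sequence_length.

    records: dict mapping sequence ID to amino-acid sequence.
    The inclusive comparison is an assumption of this sketch.
    """
    return {seq_id: seq for seq_id, seq in records.items()
            if len(seq) >= min_sequence_length}


# Placeholder sequences and threshold.
records = {"p1": "MKTAYIAKQR", "p2": "MKT", "p3": "MKTAY"}
kept = filter_short_sequences(records, min_sequence_length=5)
print(sorted(kept))
```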


    Flags:
        --help                  Display this message
        --debug_enabled         Store additional information that might be helpful to debug in case
                                of a problem with FastOMA.
        --write_msas            MSAs used during inference of subhogs will be stored at
                                every taxonomic level.
        --write_genetrees       Inferred gene trees will be stored at every taxonomic level.
        --force_pairwise_ortholog_generation
                                Force producing the pairwise orthologs.tsv.gz file even if the
                                dataset contains many proteomes. By default, FastOMA produces the
                                pairwise ortholog file only if there are at most 25 proteomes in
                                the dataset.
        --report                Produce a nextflow report and timeline and store them in
                                $params.statdir
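
The default behaviour described for `--force_pairwise_ortholog_generation` amounts to a simple rule, sketched below. The function and constant names are hypothetical; only the limit of 25 proteomes and the effect of the flag are taken from the help text above.

```python
# Limit stated in the help text: pairwise output is produced by default
# only if the dataset has at most this many proteomes.
PAIRWISE_PROTEOME_LIMIT = 25


def should_write_pairwise_orthologs(n_proteomes, force=False):
    """Decide whether the pairwise orthologs.tsv.gz file is produced."""
    return force or n_proteomes <= PAIRWISE_PROTEOME_LIMIT


print(should_write_pairwise_orthologs(25))              # True
print(should_write_pairwise_orthologs(26))              # False
print(should_write_pairwise_orthologs(26, force=True))  # True
```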