
FCS adaptor quickstart

Eric Tvedte edited this page Nov 1, 2024 · 5 revisions

FCS-adaptor detects adaptor and vector contamination in genome sequences. This tool is one module within the NCBI Foreign Contamination Screening (FCS) program suite.

We recommend running FCS-adaptor both after the initial contig assembly and on the final assembly before GenBank submission. If the final assembly contains additional valid contaminants, we recommend re-screening after removing them.

FCS-adaptor operates in three main steps:

  1. BLAST alignment to reference database
  2. Generate contaminant cleaning actions
  3. Clean the genome

Quickstart

Prerequisites

  1. Docker or Singularity. The current Singularity image was built with Singularity 3.4.0.
  2. Any general-purpose host should be sufficient for execution.
  3. A genome assembly in FASTA format.
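Before screening, you can check these prerequisites from the shell. This is a minimal sketch, not part of FCS: detect_container_engine and looks_like_fasta are hypothetical helper names, Docker is preferred when both engines are installed (an arbitrary choice made here), and the FASTA check only confirms that the first non-blank line starts with ">", so it is a quick sanity check rather than full validation.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite checks (not part of FCS).

# Print "docker" or "singularity" depending on which is on PATH.
# Docker wins when both are present; return 1 if neither is found.
detect_container_engine() {
  if command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v singularity >/dev/null 2>&1; then
    echo singularity
  else
    echo "error: neither docker nor singularity found on PATH" >&2
    return 1
  fi
}

# Return 0 if the first non-blank line of a plain or .gz file starts with ">".
# A quick sanity check only, not full FASTA validation.
looks_like_fasta() {
  local f=$1 first
  [ -r "$f" ] || return 1
  if [[ $f == *.gz ]]; then
    first=$(gzip -cd -- "$f" | grep -m1 -v '^[[:space:]]*$')
  else
    first=$(grep -m1 -v '^[[:space:]]*$' -- "$f")
  fi
  [[ $first == '>'* ]]
}
```

For example, the following prints docker or singularity when h_sapiens.fa.gz looks like FASTA:

    looks_like_fasta h_sapiens.fa.gz && detect_container_engine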

Downloading FCS-adaptor

  1. Retrieve the run_fcsadaptor.sh runner script:
    curl -LO https://github.com/ncbi/fcs/raw/main/dist/run_fcsadaptor.sh
    
  2. Make run_fcsadaptor.sh executable:
    chmod 755 run_fcsadaptor.sh
    
  3. If using Singularity, retrieve the FCS-adaptor .sif:
    curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/latest/fcs-adaptor.sif -Lo fcs-adaptor.sif
    
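If you repeat these steps, a small guard avoids re-downloading files you already have. This is a hypothetical helper (fetch_once is not part of FCS): it skips any output file that already exists and is non-empty, and otherwise calls curl with the same -L option used above plus --fail, downloading to a temporary .part name so an interrupted transfer is not mistaken for a finished one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FCS): download URL to OUT unless OUT
# already exists and is non-empty.
fetch_once() {
  local url=$1 out=$2
  if [ -s "$out" ]; then
    echo "skipping $out (already present)" >&2
    return 0
  fi
  # --fail: exit non-zero on HTTP errors instead of saving the error page.
  # Download to a temporary name so an interrupted transfer is not skipped later.
  curl -L --fail -o "$out.part" "$url" && mv -- "$out.part" "$out"
}
```

Usage, mirroring the steps above:

    fetch_once https://github.com/ncbi/fcs/raw/main/dist/run_fcsadaptor.sh run_fcsadaptor.sh && chmod 755 run_fcsadaptor.sh
    fetch_once https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/latest/fcs-adaptor.sif fcs-adaptor.sif

Because the .sif URL points at the latest release, delete the local file when you want to pick up a newer image.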

Screen the genome

  1. Set --prok (prokaryotes) or --euk (eukaryotes) depending on the source organism.
  2. Run FCS-adaptor:
    • Using Docker:
    mkdir outputdir
    ./run_fcsadaptor.sh --fasta-input h_sapiens.fa.gz --output-dir ./outputdir --euk
    
    • Using Singularity:
    mkdir outputdir
    ./run_fcsadaptor.sh --fasta-input h_sapiens.fa.gz --output-dir ./outputdir --euk --container-engine singularity --image fcs-adaptor.sif
    
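The Docker and Singularity invocations differ only in the trailing container flags, so one wrapper can build either. A sketch under stated assumptions: screen_genome and the DRY_RUN variable are hypothetical names, only the run_fcsadaptor.sh flags come from this page, and the image defaults to the fcs-adaptor.sif file downloaded earlier.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of FCS) around run_fcsadaptor.sh.
# Arguments: FASTA OUTDIR euk|prok [docker|singularity] [IMAGE]
# Set DRY_RUN=1 to print the command instead of running it.
screen_genome() {
  local fasta=$1 outdir=$2 taxonomy=$3 engine=${4:-docker} image=${5:-fcs-adaptor.sif}
  case $taxonomy in
    euk|prok) ;;
    *) echo "taxonomy must be 'euk' or 'prok'" >&2; return 2 ;;
  esac
  local cmd=(./run_fcsadaptor.sh --fasta-input "$fasta" --output-dir "$outdir" "--$taxonomy")
  # Docker is the default engine, so it needs no extra flags.
  if [ "$engine" = singularity ]; then
    cmd+=(--container-engine singularity --image "$image")
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    mkdir -p -- "$outdir" && "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 screen_genome h_sapiens.fa.gz ./outputdir euk singularity prints the Singularity command shown above; drop DRY_RUN=1 to run it.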

Clean the genome

  1. Retrieve the fcs.py runner script:

    curl -LO https://github.com/ncbi/fcs/raw/main/dist/fcs.py
    
  2. If using Singularity, also download the FCS-GX .sif file and set FCS_DEFAULT_IMAGE to it:

    curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/latest/fcs-gx.sif -Lo fcs-gx.sif
    export FCS_DEFAULT_IMAGE=fcs-gx.sif
    
  3. Apply the cleaning actions to the input genome. By default, fcs.py splits contigs at internal ACTION_TRIM locations. To mask a location instead, change its action column in fcs_adaptor_report.txt to FIX:

    zcat h_sapiens.fa.gz | python3 ./fcs.py clean genome --action-report ./outputdir/fcs_adaptor_report.txt --output clean.fasta --contam-fasta-out contam.fasta
    

    ⚠️ FCS-adaptor currently writes a cleaned FASTA to cleaned_sequences/*.fa.gz in which whole sequences assigned ACTION_EXCLUDE are removed and adaptors at sequence ends assigned ACTION_TRIM are trimmed. Internal adaptor sequences are not cleaned automatically by run_fcsadaptor.sh. When internal adaptor hits are present, you must decide whether splitting or masking is more appropriate and apply that choice with fcs.py clean genome. See Interpreting Outputs for additional information.
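To mask rather than split internal hits, you can rewrite the report before passing it to fcs.py clean genome. This sketch assumes that fcs_adaptor_report.txt is tab-separated with the columns accession, length, action, range and name (as in the report lines shown under Usage Examples), that header lines start with #, and that a row with several ranges lists them comma-separated. A range counts as internal when it touches neither base 1 nor the last base. It writes FIX, following the wording in step 3; check that your fcs.py version accepts that value. mask_internal_trims is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FCS): copy an FCS-adaptor report to stdout,
# changing ACTION_TRIM to FIX on rows whose ranges are all internal.
# Assumes tab-separated columns: accession, length, action, range, name.
mask_internal_trims() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }
    $3 == "ACTION_TRIM" {
      n = split($4, ranges, ",")
      internal = 1
      for (i = 1; i <= n; i++) {
        split(ranges[i], se, "[.][.]")
        # any range touching base 1 or the last base is terminal: keep ACTION_TRIM
        if (se[1] + 0 == 1 || se[2] + 0 == $2 + 0) internal = 0
      }
      if (internal) $3 = "FIX"
    }
    { print }
  ' "$1"
}
```

Usage, reusing the cleaning command from step 3:

    mask_internal_trims ./outputdir/fcs_adaptor_report.txt > masked_report.txt
    zcat h_sapiens.fa.gz | python3 ./fcs.py clean genome --action-report masked_report.txt --output clean.fasta --contam-fasta-out contam.fasta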

Usage Examples

To confirm that FCS-adaptor runs correctly, screen a small test FASTA file:

  1. Download the test FASTA:

    curl -LO https://zenodo.org/records/10932013/files/FCS_combo_test.fa
    
  2. Screen the genome:

    mkdir outputdir
    ./run_fcsadaptor.sh --fasta-input FCS_combo_test.fa --output-dir ./outputdir --euk
    

    A successful FCS-adaptor run prints its log to the console, ending with:

    [workflow ] completed success
    Output will be placed in: /output-volume
    Executing the workflow
    run_av_screen_x
    run_av_screen_x
    

    The output directory will contain the following files:

    cleaned_sequences/FCS_combo_test.fa
    combined.calls.jsonl
    fcs.log
    fcs_adaptor.log
    fcs_adaptor_report.txt
    logs.jsonl
    pipeline_args.yaml
    skipped_trims.jsonl
    validate_fasta.txt
    

    The fcs_adaptor_report.txt produced by this example should match this file.

  3. Clean the genome:

    cat FCS_combo_test.fa | python3 ./fcs.py clean genome --action-report ./outputdir/fcs_adaptor_report.txt --output clean.fasta --contam-fasta-out contam.fasta
    

    By default this will trim 5 sequences (terminal ACTION_TRIM), split 5 sequences (internal ACTION_TRIM), and exclude 1 sequence (ACTION_EXCLUDE):

    Applied 11 actions; 522 bps dropped; 0 bps hardmasked.
    

    Confirm the cleaning actions:

    grep seq_00001 ./outputdir/fcs_adaptor_report.txt
    seq_00001       230276  ACTION_TRIM     1..58   CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00360.1:Illumina PCR Primer
    
    grep seq_00001 clean.fasta
    >seq_00001~59..230276
    
    grep seq_00006 ./outputdir/fcs_adaptor_report.txt
    seq_00006       270219  ACTION_TRIM     100001..100058  CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00360.1:Illumina PCR Primer
    
    grep seq_00006 clean.fasta
    >seq_00006~100059..270219
    >seq_00006~1..100000
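The trim/split/exclude breakdown and the new sequence headers follow directly from the report's length and range columns. The two hypothetical helpers below (not part of FCS) make that arithmetic explicit. They assume the report is tab-separated with the columns accession, length, action, range and name, that header lines start with #, and that multiple ranges in one row are comma-separated. summarize_actions counts an ACTION_TRIM row as a split if any of its ranges is internal; remaining_pieces takes one sequence length and one START..END range and prints the 1-based pieces that survive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally report rows into terminal trims, internal trims
# (splits) and exclusions.
summarize_actions() {
  awk -F'\t' '
    /^#/ { next }
    $3 == "ACTION_EXCLUDE" { exclude++; next }
    $3 == "ACTION_TRIM" {
      n = split($4, ranges, ",")
      internal = 0
      for (i = 1; i <= n; i++) {
        split(ranges[i], se, "[.][.]")
        # internal: starts after base 1 and ends before the last base
        if (se[1] + 0 > 1 && se[2] + 0 < $2 + 0) internal = 1
      }
      if (internal) splits++; else trims++
    }
    END { printf "trim=%d split=%d exclude=%d\n", trims, splits, exclude }
  ' "$1"
}

# Hypothetical helper: print the 1-based ranges left after removing START..END
# from a sequence of length LEN (one piece for a terminal hit, two for internal).
remaining_pieces() {
  local len=$1 start=${2%%..*} end=${2##*..}
  if [ "$start" -gt 1 ]; then echo "1..$((start - 1))"; fi
  if [ "$end" -lt "$len" ]; then echo "$((end + 1))..$len"; fi
}
```

For example, remaining_pieces 230276 1..58 prints 59..230276, and remaining_pieces 270219 100001..100058 prints 1..100000 and 100059..270219, the same ranges as the seq_00001 and seq_00006 headers above (the grep output lists the seq_00006 pieces in the opposite order). Running summarize_actions ./outputdir/fcs_adaptor_report.txt on the test report should give counts consistent with the 5/5/1 breakdown described above, provided each sequence occupies a single report row.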