NCBI now accepts diploid genome submissions! This repository contains a collection of scripts to aid you in generating the placement file required by NCBI. Details can be found here. Refer to the Arabidopsis thaliana assembly from Chin et al. 2016 as you prepare your FALCON-Unzip assembly for submission to NCBI.
The placement file for an unscaffolded assembly has the following fields:
- alt_asm_name: name of the assembly-unit that includes the alternate scaffold.
- prim_asm_name: name of the assembly-unit on which the alternate scaffold is being placed. Expected to be 'Primary Assembly' in most cases.
- alt_scaf_name: name of the alternate scaffold being placed
- parent_type: type of object on which the alternate scaffold is being placed, either CHROMOSOME or SCAFFOLD
- parent_name: name of the object on which the alternate scaffold is being placed (can be either a chromosome or a scaffold)
- ori: orientation of the alignment, '+', '-', 'b'(mixed)
- alt_scaf_start: start of the placement on the alternate scaffold (in 1 base coordinates)
- alt_scaf_stop: end of the placement on the alternate scaffold (in 1 base coordinates)
- parent_start: start of the placement on the parent sequence (in 1 base coordinates)
- parent_stop: end of the placement on the parent sequence (in 1 base coordinates)
- alt_start_tail: number of bases at the start of the alternate scaffold not involved in the placement
- alt_stop_tail: number of bases at the end of the alternate scaffold not involved in the placement
Refer to AthalPlacementFile.txt in this repository for the Arabidpsis thaliana file.
Either of the following scripts can be used to generate a placement file.
Python script written by Sarah Kingan that generates placement file for diploid genome assembly. This script operates on a directory containing "coords" files created from nucmer alignments between each haplotig and its associated primary contig. The commands to generate "coords" files can be found within script on lines 23-43.
python script written by Jason Chin that performs nucmer alignments, generates coords file, and created placement file for NCBI submission. Dependencies include FastaReader.py and mummer.
THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.