Note: some of the code was adapted to the configuration environment of the Life Science Server of the University of Vienna.
-
The assembly directory contains code for trio binning and haploid genome assembly. The (pseudo)commands for running assembly and polishing are also shown in the directory.
-
The annotation directory contains code for repeat annotation, transcriptome assembly (Trinity), gene model training, full-length RNA sequencing data processing, BUSCO evaluation, gene model prediction (MAKER), gene model polishing (PASApipeline), etc.
-
The Hi-C_analysese directory contains code for Hi-C scaffolding, Hi-C heatmap visualization, and A/B compartment calling, as well as Perl scripts for calculating the frequency of inter-chromosomal interactions (Fig. 1e). The config file for HiC-Pro is also provided.
-
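As an illustration of the inter-chromosomal interaction frequency mentioned above, the following is a minimal Python sketch and not the repository's Perl scripts. It assumes the contacts have already been reduced to pairs of chromosome names; the function name `interchromosomal_frequencies` and the normalization (dividing each count by the total number of inter-chromosomal contacts) are assumptions made for this example and may differ from what was used for Fig. 1e.

```python
from collections import Counter


def interchromosomal_frequencies(pairs):
    """Return the share of inter-chromosomal contacts per chromosome pair.

    `pairs` is an iterable of (chrom_a, chrom_b) tuples, one per contact.
    Contacts within the same chromosome are ignored. The normalization by
    the total inter-chromosomal count is an assumption of this sketch.
    """
    counts = Counter()
    for chrom_a, chrom_b in pairs:
        if chrom_a != chrom_b:
            # Sort so that (chr1, chr2) and (chr2, chr1) share one key.
            counts[tuple(sorted((chrom_a, chrom_b)))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}
```

The chromosome-name pairs could be taken, for example, from the chromosome columns of a valid-pairs file, but the exact column layout should be checked against the HiC-Pro output actually used.
-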
The ChIP-seq directory contains code for mapping and processing ChIP-seq data, and the methylation directory contains code for calling 5mC DNA methylation from Nanopore long reads.
-
The genomic_feature directory contains code for comparative genomics analyses, including calculating GC content and sequencing depth (Fig. 1d), whole-genome alignment, and mapping short and long reads to the reference.
-
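For readers unfamiliar with the GC content calculation mentioned above, here is a minimal Python sketch, not the code in the genomic_feature directory. The function name `gc_content_windows`, the non-overlapping window scheme, and the default window size are assumptions chosen for illustration. Bases other than A, C, G and T (for example N) are excluded from the denominator.

```python
def gc_content_windows(seq, window=1000):
    """Yield (start, end, gc_fraction) for non-overlapping windows of `seq`.

    Coordinates are 0-based and half-open. Only A/C/G/T count toward the
    denominator; a window with no such bases yields NaN.
    """
    seq = seq.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        fraction = gc / acgt if acgt else float("nan")
        yield start, start + len(chunk), fraction
```

For example, `list(gc_content_windows("GGCCAATT", window=4))` gives one window that is entirely G/C followed by one that is entirely A/T.
-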
The ortholog directory contains code for identifying orthologous groups using OrthoFinder, and an R script for plotting the igraph figures that visualise the frequency of orthologous gene pairs between chicken and amphioxus chromosomes (Fig. 2a).
-
The centromere directory contains code for identifying tandem repeat units using TideHunter and visualizing tandem repeats using StainedGlass (Fig. 3b-c), and an R script for plotting the CENP-A signals and 5mC levels (Fig. 3b-c). The ChIP-seq analysis for CENP-A is included in the ChIP-seq directory.
The assemblies and raw sequencing data are available under the NCBI accession PRJNA693184.