Rasmus Kirkegaard edited this page Nov 24, 2016 · 13 revisions

Generate data (Linux environment)


Input(s): x* sequencing read pairs


  • Perform co-assembly of all samples
    • Clean up the generated fasta file so that headers only have the contig number e.g. ">1" for contig number 1
  • Map reads from each sample to the assembly.fa
    • Generate a simple text file with two columns: "contig_ID" and "average_coverage"
    • Export a .sam file for generating network files (optional)
  • Extract 16S and 23S rRNA genes by running shell script "rRNA.sh" (Get SSU database here) (optional)
    • Classify the fasta files using the SILVA aligner
    • Download the generated files as ".csv" and rename to 16S.csv and 23S.csv
  • Generate remaining data by running shell script "data.generation.2.1.0.sh" (optional)
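
The header clean-up and the two-column coverage file described above can be sketched in Python. This is only an illustration under stated assumptions, not the scripts used by this guide: the function names are hypothetical, contigs are numbered in file order, the coverage file is assumed to be tab-separated, and average coverage is estimated from the .sam file as aligned bases (CIGAR operations M, = and X) divided by the contig length taken from the @SQ header lines.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def renumber_fasta_headers(lines):
    """Replace every FASTA header with a running number: ">1", ">2", ...

    Assumption: contigs are numbered in the order they appear in the file.
    """
    out = []
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            out.append(f">{n}")
        else:
            out.append(line.rstrip("\n"))
    return out


def average_coverage_from_sam(sam_lines):
    """Estimate average coverage per contig from the lines of a SAM file.

    Assumption: coverage = aligned bases (M, =, X) / contig length (@SQ LN).
    Secondary and supplementary alignments are counted like any other.
    """
    lengths = {}
    aligned = defaultdict(int)
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, contig, cigar = int(fields[1]), fields[2], fields[5]
        if flag & 4 or contig == "*" or cigar == "*":
            continue  # unmapped read
        aligned[contig] += sum(
            int(length) for length, op in CIGAR_OP.findall(cigar) if op in "M=X"
        )
    return {c: aligned[c] / length for c, length in lengths.items()}


def coverage_table_lines(coverage):
    """Format the coverage dict as a two-column, tab-separated table."""
    rows = ["contig_ID\taverage_coverage"]
    rows += [f"{contig}\t{value}" for contig, value in coverage.items()]
    return rows
```

In practice each function would be fed the lines of an open file, once per sample for the coverage table, and the resulting rows written out with one line per contig.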

Extract genome bins (Run in R)


Input(s): 1* assembly.fasta, x* coverage files, 1* essential.txt (optional), 1* tax.txt (optional), 16S.csv (optional), 23S.csv (optional), paired-end network(s) (optional)


  • Prepare a "Load_data.rmd" by updating it with the relevant file names and the column names for "contig_ID" and "average_coverage"
  • Begin developing your "Genome_extraction.rmd" for reproducible genome extraction
