kleempoel/Target_Recovery_From_Assemblies: Recover 353 targets from assemblies (unannotated genomes)

Snakemake pipeline used to recover Angiosperms353 target genes from public, unannotated genome assemblies.

Assemblies preparation

Create a text file listing the assembly accession IDs, one per line, and name it AssembliesID_ls.txt.
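Each line of this file is passed verbatim to the download command, so a stray space, a typo or a Windows line ending makes that download fail. The sketch below is a hypothetical helper (`check_accession_list` is not part of the repository) that flags lines which do not look like NCBI assembly accessions, assuming the usual form GCA_ or GCF_, nine digits, a dot and a version number. A line with a Windows (CRLF) ending is reported too, because the trailing carriage return would otherwise reach the download command.

```shell
# Hypothetical helper (not part of the repository); requires bash.
# Reports lines of an accession list that do not look like NCBI assembly
# accessions: GCA_ or GCF_, nine digits, a dot and a version number.
check_accession_list() {
	local list=$1 n=0 bad=0 line
	[ -r "$list" ] || { echo "cannot read $list" >&2; return 2; }
	# "|| [ -n ... ]" also checks a last line without a trailing newline
	while IFS= read -r line || [ -n "$line" ]; do
		n=$((n + 1))
		[ -z "$line" ] && continue    # blank lines are ignored
		# A trailing carriage return (CRLF file) also fails this pattern
		if ! [[ $line =~ ^GC[AF]_[0-9]{9}\.[0-9]+$ ]]; then
			echo "line $n: invalid accession '$line'" >&2
			bad=$((bad + 1))
		fi
	done < "$list"
	[ "$bad" -eq 0 ]    # status 0 only if every non-blank line is valid
}
```

Usage: `check_accession_list AssembliesID_ls.txt && echo "accession list OK"`. Invalid lines are printed to stderr with their line number.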

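Use the following code to download each assembly, merge its sequences into a single FASTA file in in_fasta/, and build a BLAST nucleotide database from it. It expects the NCBI datasets binary in the current directory, and seqkit and makeblastdb on the PATH.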

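ncpu=4
mkdir -p in_fasta    # merged FASTA files and BLAST databases go here

# "|| [ -n ... ]" also processes a last line that lacks a trailing newline
while IFS= read -r accession || [ -n "$accession" ]; do
	[ -z "$accession" ] && continue    # skip blank lines
	echo "$accession"

	# Download the assembly package from NCBI
	# (in recent datasets versions the equivalent is: datasets download genome accession)
	./datasets download assembly "$accession" --filename "$accession.zip"
	unzip -q "$accession.zip" -d "$accession"
	# Merge all nucleotide FASTA files of the assembly into a single file
	cat "$accession"/ncbi_dataset/data/*/*.fna > "in_fasta/$accession.fasta"

	# Print basic statistics (number of sequences, total length, ...)
	seqkit stats -j "$ncpu" "in_fasta/$accession.fasta"
	# Remove the extracted folder and the downloaded archive
	rm -r "$accession" "$accession.zip"

	# Build a nucleotide BLAST database; -parse_seqids lets sequences be retrieved by ID
	makeblastdb -in "in_fasta/$accession.fasta" -dbtype nucl -parse_seqids
done < AssembliesID_ls.txt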
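With a long accession list, a single failed download or database build is easy to miss in the loop output. A minimal check, assuming the layout produced by the loop (in_fasta/<accession>.fasta with the makeblastdb index files written next to it under the same base name), is sketched below; `report_missing_outputs` is a hypothetical name, not a script from the repository. A database counts as present if either `<name>.nin` (single-volume database) or `<name>.nal` (the alias file makeblastdb writes when it splits a large database into several volumes) exists.

```shell
# Hypothetical helper (not part of the repository): list accessions whose
# FASTA file or BLAST database is missing after the download loop.
# Arguments: accession list, output directory (default: in_fasta).
report_missing_outputs() {
	local list=$1 dir=${2:-in_fasta} missing=0 acc fasta
	[ -r "$list" ] || { echo "cannot read $list" >&2; return 2; }
	while IFS= read -r acc || [ -n "$acc" ]; do
		[ -z "$acc" ] && continue    # skip blank lines
		fasta="$dir/$acc.fasta"
		if [ ! -s "$fasta" ]; then
			echo "$acc: FASTA missing or empty"
			missing=$((missing + 1))
		elif [ ! -e "$fasta.nin" ] && [ ! -e "$fasta.nal" ]; then
			# single-volume databases have .nin; multi-volume ones have a .nal alias
			echo "$acc: BLAST database missing"
			missing=$((missing + 1))
		fi
	done < "$list"
	[ "$missing" -eq 0 ]    # status 0 only if every accession is complete
}
```

Running `report_missing_outputs AssembliesID_ls.txt` before launching the pipeline prints one line per incomplete accession, which can then be re-run through the loop on its own.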

Snakemake pipeline

Run the pipeline with the following command; it uses the same list of accessions (AssembliesID_ls.txt) as above.

sbatch assemblies_snk.sh

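Alternatively, run Snakemake directly, outside a batch script, with:

snakemake --keep-going --cluster "sbatch -p medium --mem 60000 -N 1 --cpus-per-task 16 -J Assemblies" --jobs 10

Each job is submitted to SLURM on the medium partition with 60000 MB of memory, one node and 16 CPUs; at most 10 jobs are submitted at the same time, and --keep-going lets independent jobs continue if one fails. The --cluster option requires Snakemake 7 or earlier; Snakemake 8 replaced it with executor plugins.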
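The resources in the --cluster string above are hard-coded. A small wrapper such as the hypothetical `run_assemblies_pipeline` below (not part of the repository) keeps them adjustable; the environment variables PARTITION, MEM_MB, CPUS, JOBS and DRY_RUN are names chosen for this sketch. It assumes a SLURM cluster and Snakemake 7 or earlier. With DRY_RUN=1 it adds Snakemake's -n (dry-run) flag, so the planned jobs are listed without submitting anything.

```shell
# Hypothetical wrapper (not part of the repository); requires bash and
# Snakemake <= 7, since the --cluster option was removed in Snakemake 8.

# Print the sbatch command Snakemake uses to submit each job.
cluster_submit_cmd() {
	local partition=${1:-medium} mem_mb=${2:-60000} cpus=${3:-16}
	printf 'sbatch -p %s --mem %s -N 1 --cpus-per-task %s -J Assemblies\n' \
		"$partition" "$mem_mb" "$cpus"
}

# Launch the pipeline; defaults reproduce the command above.
run_assemblies_pipeline() {
	local partition=${PARTITION:-medium} mem_mb=${MEM_MB:-60000}
	local cpus=${CPUS:-16} jobs=${JOBS:-10}
	if [ "${DRY_RUN:-0}" = 1 ]; then
		set -- -n    # -n / --dry-run: list the planned jobs without submitting
	else
		set --
	fi
	snakemake --keep-going \
		--cluster "$(cluster_submit_cmd "$partition" "$mem_mb" "$cpus")" \
		--jobs "$jobs" "$@"
}
```

For example, `PARTITION=long DRY_RUN=1 run_assemblies_pipeline` previews the jobs for a different partition before anything is submitted.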

Repository files:

.gitattributes
.gitignore
Notes.txt
README.md
Snakefile
assemblies_snk.sh
dag.svg
exonerate_to_genes.py
file_graph.svg
get_best_hits.py
recovery_from_assembly_pipeline.svg
targets_stats.py
