- GenBank search URL
- The search uses the following fields:
mumps[title] AND viruses[filter] AND ("5000"[SLEN] : "20000"[SLEN])
- Send to : Complete Record : File : Accession List
- This downloads the file sequence.seq
- Open this file and remove the version suffixes (.1, .2, etc.) from the accession numbers
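The manual edit above can also be scripted. The following is a minimal sketch, not part of the repository: the function names are hypothetical, the accessions in the example are made up, and it assumes the file holds one accession per line.

```python
import re


def strip_version(accession: str) -> str:
    """Drop a trailing version suffix such as '.1' or '.2' from one accession."""
    return re.sub(r"\.\d+$", "", accession.strip())


def strip_versions(lines):
    """Clean every non-empty line; assumes one accession per line."""
    return [strip_version(line) for line in lines if line.strip()]


# Made-up accessions, for illustration only.
print(strip_versions(["AB000001.1\n", "AB000002.2\n"]))
# ['AB000001', 'AB000002']
```

To apply it to the downloaded file, read sequence.seq line by line, pass the lines to strip_versions, and write the result back out before running the upload.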
python3 vdb/mumps_upload.py -db vdb -v mumps --ftype accession --source genbank --locus genome --fname sequence.seq
FASTA header field ordering:
- placeholder number (later replaced by the GenBank accession)
- strain name
- collection date
- host species
- country
- state/region
- genotype
This header ordering is not required when uploading by accession, as we do here.
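For FASTA uploads, the field ordering can be checked with a small helper. This is a sketch under assumptions: the "|" delimiter and the field key names are not stated in this document and are chosen here for illustration, and the example header values are made up.

```python
# Key names are hypothetical labels for the fields listed above, in the same order.
FIELD_ORDER = [
    "number", "strain", "collection_date",
    "host", "country", "division", "genotype",
]
DELIMITER = "|"  # assumption: the delimiter is not specified in this document


def parse_header(header: str) -> dict:
    """Split a FASTA header into named fields, checking the field count."""
    values = header.lstrip(">").strip().split(DELIMITER)
    if len(values) != len(FIELD_ORDER):
        raise ValueError(
            f"expected {len(FIELD_ORDER)} fields, got {len(values)}: {header!r}"
        )
    return dict(zip(FIELD_ORDER, values))


def build_header(record: dict) -> str:
    """Join the fields back into a header line in the expected order."""
    return ">" + DELIMITER.join(str(record.get(name, "")) for name in FIELD_ORDER)


# Made-up header, for illustration only.
example = ">1|MuV/Example/2016|2016-07-01|human|usa|washington|G"
fields = parse_header(example)
print(fields["strain"], fields["genotype"])
```

Running parse_header over every header line in a file before upload is a cheap way to catch records with missing or extra fields.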
Updating citations is needed to populate certain attributes, such as author and paper title.
python3 vdb/mumps_update.py -db vdb -v mumps --update_citations
python3 vdb/mumps_download.py -db vdb -v mumps --fstem mumps --resolve_method choose_genbank
Preprocess to fix metadata and header ordering
python3 vdb/mumps_preprocess_fasta.py --fasta data/muv-nextstrain-20170718.pruned.fasta > data/mumps_broad.fasta
Upload to fauna
python3 vdb/mumps_upload.py -db vdb -v mumps --source broad --locus genome --fname mumps_broad.fasta --authors "Wohl et al" --title "Unpublished"
If you have a FASTA file and a CSV metadata file, this script combines them into a single FASTA (minor modifications may be needed for your data):
python3 scripts/mumps.csv-and-fasta-to-vipr-fasta.py data/input.mumps.raw.fasta data/input.mumps.csv data/input.mumps.vipr.fasta
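The script itself is not reproduced here. As a rough idea of what this kind of conversion involves, the sketch below merges CSV metadata into FASTA headers. Everything in it is an assumption made for illustration: the CSV column names, the "|" delimiter, the "?" placeholder for missing values, the running counter used as the placeholder number, and the function names.

```python
import csv
import io

# Hypothetical CSV column names, in the header field order used for upload.
METADATA_COLUMNS = ["collection_date", "host", "country", "division", "genotype"]


def load_metadata(csv_text: str, key: str = "strain") -> dict:
    """Index CSV rows by strain name (the key column name is an assumption)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def rewrite_fasta(fasta_text: str, metadata: dict, delimiter: str = "|") -> str:
    """Replace each bare '>strain' header with a full delimited header."""
    out = []
    counter = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            counter += 1
            name = line[1:].strip()
            row = metadata.get(name, {})
            # Missing metadata is written as '?' rather than dropped.
            fields = [str(counter), name] + [row.get(c, "?") for c in METADATA_COLUMNS]
            out.append(">" + delimiter.join(fields))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up inputs, for illustration only.
csv_text = "strain,collection_date,host,country,division,genotype\nS1,2016-01-02,human,usa,washington,G\n"
fasta_text = ">S1\nACGT\n"
print(rewrite_fasta(fasta_text, load_metadata(csv_text)), end="")
# >1|S1|2016-01-02|human|usa|washington|G
# ACGT
```

Keeping unmatched strains in the output with placeholder values, instead of skipping them, makes gaps in the metadata visible before upload.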
Upload to fauna
python3 vdb/mumps_upload.py -db vdb -v mumps --source bccdc --locus genome --fname mumps.bc.fasta --authors "Gardy et al" --title "Unpublished"
Upload to fauna
python3 vdb/mumps_upload.py -db vdb -v mumps --source fh --locus genome --fname MuVs-WA0268502_buccal-Washington.USA-16.fasta --authors "Moncla et al" --title "Unpublished"