An R script to merge GISAID and Nextclade data for research purposes
Obtaining Raw Data for DB creation:
- In the GISAID database, select the samples of interest and download their Augur input metadata and their sequences in FASTA format.
- The Augur metadata downloads as a .tsv file; check that the date columns are formatted as dates (not as text or numbers).
- Run the sequences in the FASTA file through Nextclade at https://clades.nextstrain.org/.
- Download the Nextclade output as a .tsv file.
- In the Nextclade output, split the text of the first column on the "|" character; this separates the GISAID ID and the virus name.
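The preparation steps, together with the merge named in the title, can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the script itself: the helper names `parse_date_columns()` and `split_seq_name()` are hypothetical, the join column `gisaid_epi_isl`, the `date` column, and the field order "virus name|EPI ID|date" in the FASTA headers are assumptions to check against your own downloads, and the two toy tables are made-up placeholders for the .tsv files.

```r
# Minimal base-R sketch (no extra packages). Column names "gisaid_epi_isl"
# and "date", and the field order of the "|"-separated header, are
# assumptions: check them against your own downloads.

# Convert every column whose name contains "date" to class Date.
# Expects YYYY-MM-DD; partial dates such as "2021-03" become NA.
parse_date_columns <- function(df) {
  for (col in grep("date", names(df), ignore.case = TRUE, value = TRUE)) {
    df[[col]] <- as.Date(as.character(df[[col]]), format = "%Y-%m-%d")
  }
  df
}

# Split the first column of the Nextclade table on the literal "|".
# Assumes headers shaped like "virus name|EPI_ISL_...|date" (hypothetical order).
split_seq_name <- function(nc) {
  parts <- strsplit(as.character(nc[[1]]), "|", fixed = TRUE)
  nc$virus_name <- vapply(parts, `[`, character(1), 1)
  nc$gisaid_epi_isl <- vapply(parts, `[`, character(1), 2)
  nc
}

# Made-up toy tables standing in for the two .tsv downloads. With real data,
# use read.delim("your_file.tsv", stringsAsFactors = FALSE) (hypothetical name).
meta <- read.delim(text = paste(
  "strain\tgisaid_epi_isl\tdate",
  "sample/1/2021\tEPI_ISL_0000001\t2021-03-01",
  "sample/2/2021\tEPI_ISL_0000002\t2021-03",
  sep = "\n"), stringsAsFactors = FALSE)
nc <- read.delim(text = paste(
  "seqName\tclade",
  "sample/1/2021|EPI_ISL_0000001|2021-03-01\tclade_A",
  sep = "\n"), stringsAsFactors = FALSE)

meta <- parse_date_columns(meta)
nc <- split_seq_name(nc)

# Left join: keep every metadata row; samples without a Nextclade result get NA.
merged <- merge(meta, nc, by = "gisaid_epi_isl", all.x = TRUE)
print(merged)
```

A left join (`all.x = TRUE`) is used so that metadata rows with no Nextclade result stay visible as `NA` instead of silently disappearing; drop `all.x = TRUE` for an inner join if only matched samples are wanted.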