From aefdec195a40733e40e205a620e4537a0b3ccc64 Mon Sep 17 00:00:00 2001
From: Jennifer Chang
Date: Fri, 12 Jan 2024 11:04:50 -0800
Subject: [PATCH] Simplify README instructions

---
 phylogenetic/README.md | 91 +++++++++++++-----------------------------
 1 file changed, 28 insertions(+), 63 deletions(-)

diff --git a/phylogenetic/README.md b/phylogenetic/README.md
index 568bb03..5d831e7 100644
--- a/phylogenetic/README.md
+++ b/phylogenetic/README.md
@@ -3,42 +3,25 @@
 This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
 [nextstrain.org/zika](https://nextstrain.org/zika).
 
-The build encompasses fetching data, preparing it for analysis, doing quality
-control, performing analyses, and saving the results in a format suitable for
-visualization (with [auspice][]). This involves running components of
-Nextstrain such as [fauna][] and [augur][].
+## Software requirements
 
-All Zika-specific steps and functionality for the Nextstrain pipeline should be
-housed in this repository.
-
-_This build requires Augur v6._
-
-[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
+Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.
 
 ## Usage
 
 If you're unfamiliar with Nextstrain builds, you may want to follow our
-[quickstart guide][] first and then come back here.
+[Running a Pathogen Workflow guide][] first and then come back here.
 
-There are two main ways to run & visualise the output from this build:
+The easiest way to run this pathogen build is using the Nextstrain
+command-line tool:
 
-The first, and easiest, way to run this pathogen build is using the [Nextstrain
-command-line tool][nextstrain-cli]:
-```
-nextstrain build .
-nextstrain view auspice/
-```
+    nextstrain build .
 
-See the [nextstrain-cli README][] for how to install the `nextstrain` command.
+Build output goes into the directories `data/`, `results/` and `auspice/`.
 
-The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
-The build may then be run via:
-```
-snakemake
-auspice --datasetDir auspice/
-```
+Once you've run the build, you can view the results in auspice:
 
-Build output goes into the directories `data/`, `results/` and `auspice/`.
+    nextstrain view auspice/
 
 ## Configuration
 
@@ -46,43 +29,25 @@ Configuration takes place entirely with the `Snakefile`. This can be read top-to
 specifies its file inputs and output and also its parameters. There is little
 redirection and each rule should be able to be reasoned with on its own.
 
+### Using GenBank data
+
+This build starts by pulling preprocessed sequence and metadata files from:
+
+* https://data.nextstrain.org/files/zika/sequences.fasta.zst
+* https://data.nextstrain.org/files/zika/metadata.tsv.zst
+
+The above datasets have been preprocessed and cleaned from GenBank and are updated at regular intervals.
+
+### Using example data
+
+Alternatively, you can run the build using the
+example data provided in this repository. To run the build by copying the
+example sequences into the `data/` directory, use the following:
 
-## Input data
-
-This build starts by downloading sequences from
-https://data.nextstrain.org/files/zika/sequences.fasta.xz
-and metadata from
-https://data.nextstrain.org/files/zika/metadata.tsv.gz.
-These are publicly provisioned data by the Nextstrain team by pulling sequences
-from NCBI GenBank via ViPR and performing
-[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
-
-Data from GenBank follows Open Data principles, such that we can make input data
-and intermediate files available for further analysis. Open Data is data that
-can be freely used, re-used and redistributed by anyone - subject only, at most,
-to the requirement to attribute and sharealike.
-
-We gratefully acknowledge the authors, originating and submitting laboratories
-of the genetic sequences and metadata for sharing their work in open databases.
-Please note that although data generators have generously shared data in an open
-fashion, that does not mean there should be free license to publish on this
-data. Data generators should be cited where possible and collaborations should
-be sought in some circumstances. Please try to avoid scooping someone else's
-work. Reach out if uncertain. Authors, paper references (where available) and
-links to GenBank entries are provided in the metadata file.
-
-A faster build process can be run working from example data by copying over
-sequences and metadata from `example_data/` to `data/` via:
-```
-mkdir -p data/
-cp -v example_data/* data/
-```
+    nextstrain build . --configfile profiles/ci/profiles_config.yaml
 
 [Nextstrain]: https://nextstrain.org
-[fauna]: https://github.com/nextstrain/fauna
-[augur]: https://github.com/nextstrain/augur
-[auspice]: https://github.com/nextstrain/auspice
-[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
-[nextstrain-cli]: https://github.com/nextstrain/cli
-[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
-[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
+[augur]: https://docs.nextstrain.org/projects/augur/en/stable/
+[auspice]: https://docs.nextstrain.org/projects/auspice/en/stable/index.html
+[Installing Nextstrain guide]: https://docs.nextstrain.org/en/latest/install.html
+[Running a Pathogen Workflow guide]: https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html