Merge remote-tracking branch 'refs/remotes/origin/v1.2.0' into v1.2.0

CDCgov · Aug 6, 2024 · 62df1f3
2 parents 2b88024 + 4886094
commit 62df1f3
Showing 1,146 changed files with 168,998 additions and 33,969 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -26,11 +26,11 @@ github_pages_url <- description$GITHUB_PAGES
 
 <p style="font-size: 16px;"><em>Public Database Submission Pipeline</em></p>
 
-**Beta Version**: `r version`. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome! 
+**Beta Version**: v1.2.0. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome! 
 
 **General Disclaimer**: This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm).  GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.
 
-# [Documentation](`r github_pages_url`/index.html)
+# [Documentation](https://dthoward96.github.io/seqsender_test_website/)
 
 ## Overview
 
@@ -58,7 +58,7 @@ github_pages_url <- description$GITHUB_PAGES
 
 4. Refer to this page for information regarding requirements for GenBank submissions via FTP only. This page applies only for COVID and Influenza [NCBI GenBank FTP Submissions](https://submit.ncbi.nlm.nih.gov/sarscov2/genbank/#step5) For further questions contact <a href="mailto:[email protected]">[email protected]</a> to discuss requirements for submissions.
 
-5. Coordinate a NCBI namespace name (**spuid_namespace**) that will be used with Submitter Provided Unique Identifiers (**spuid**) in the submission. The liaison of **spuid_namespace** and **spuid** is used to report back assigned accessions as well as for cross-linking objects within submission. The values of **spuid_namespace** are up to the submitter to decide but they must be unique and well-coordinated prior to make a submission. For more information about these two fields, see [BioSample](`r github_pages_url`/articles/biosample_submission.html#metadata) / [SRA](`r github_pages_url`/articles/sra_submission.html#metadata) / [GENBANK](`r github_pages_url`/articles/genbank_submission.html#metadata) metadata requirements.
+5. Coordinate a NCBI namespace name (**spuid_namespace**) that will be used with Submitter Provided Unique Identifiers (**spuid**) in the submission. The liaison of **spuid_namespace** and **spuid** is used to report back assigned accessions as well as for cross-linking objects within submission. The values of **spuid_namespace** are up to the submitter to decide but they must be unique and well-coordinated prior to make a submission.
 
 - **GISAID Submissions**
 
@@ -75,34 +75,6 @@ Here is a quick look of where to store the downloaded **GISAID CLI** package.
 ![](man/figures/gisaid_cli_dir.png)
 
 
-
-## Requirement Files
-
-Before submitters can perform a batch submission using ``r program``, they must make sure the requirement files (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.) are already prepared and stored in a submission directory of choice.
-
-To prep for submissions, select one of the databases below to get started:
-*to submit to multiple databases just combine the required metadata for each database into one file.
-
-**NCBI:**
-
-> <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
-> <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
-> <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
-
-**GISAID:**
-
-> <a href="`r github_pages_url`/articles/gisaid_flu_submission.html" target="_blank">EpiFlu</a> <br>
-> <a href="`r github_pages_url`/articles/gisaid_cov_submission.html" target="_blank">EpiCoV</a> <br>
-> <a href="`r github_pages_url`/articles/gisaid_pox_submission.html" target="_blank">EpiPox</a> <br>
-> <a href="`r github_pages_url`/articles/gisaid_arbo_submission.html" target="_blank">EpiArbo</a> <br>
-
-## Quick Start
-
-- [How to run seqsender locally](`r github_pages_url`/articles/local_installation.html)
-- [How to run seqsender with Docker](`r github_pages_url`/articles/docker_installation.html)
-- [How to run seqsender with Compose](`r github_pages_url`/articles/compose_installation.html)
-- [How to run seqsender with Singularity](`r github_pages_url`/articles/singularity_installation.html)
-
 ## Code Attributions
 
 Dakota Howard and Reina Chau for majority of the code base with input and testing from [colleagues](`r github_pages_url`/authors.html). 

diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@
 
 </p>
 
-**Beta Version**: 1.1.0. This pipeline is currently in Beta testing, and
+**Beta Version**: 1.2.0. This pipeline is currently in Beta testing, and
 issues could appear during submission. Please use it at your own risk.
 Feedback and suggestions are welcome\!
 
@@ -23,7 +23,7 @@ CDC and its partners to share information and collaborate on software.
 CDC use of GitHub does not imply an endorsement of any one particular
 service, product, or enterprise.
 
-# [Documentation](https://cdcgov.github.io/seqsender/index.html)
+# [Documentation](https://dthoward96.github.io/seqsender_test_website/)
 
 ## Overview
 
@@ -85,14 +85,7 @@ FTP on the command line. Before attempting to submit a submission using
     used to report back assigned accessions as well as for cross-linking
     objects within submission. The values of **spuid\_namespace** are up
     to the submitter to decide but they must be unique and
-    well-coordinated prior to make a submission. For more information
-    about these two fields, see
-    [BioSample](https://cdcgov.github.io/seqsender/articles/biosample_submission.html#metadata)
-    /
-    [SRA](https://cdcgov.github.io/seqsender/articles/sra_submission.html#metadata)
-    /
-    [GENBANK](https://cdcgov.github.io/seqsender/articles/genbank_submission.html#metadata)
-    metadata requirements.
+    well-coordinated prior to make a submission.
 
 <!-- end list -->
 
@@ -130,48 +123,6 @@ package.
 
 ![](man/figures/gisaid_cli_dir.png)
 
-## Requirement Files
-
-Before submitters can perform a batch submission using `seqsender`, they
-must make sure the requirement files (such as *config.yaml*,
-*metadata.csv*, *sequence.fasta*, *raw reads*, etc.) are already
-prepared and stored in a submission directory of choice.
-
-To prep for submissions, select one of the databases below to get
-started: \*to submit to multiple databases just combine the required
-metadata for each database into one file.
-
-**NCBI:**
-
-> <a href="https://cdcgov.github.io/seqsender/articles/biosample_submission.html" target="_blank">BioSample</a>
-> <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/sra_submission.html" target="_blank">SRA</a>
-> <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/genbank_submission.html" target="_blank">Genbank</a>
-> <br>
-
-**GISAID:**
-
-> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_flu_submission.html" target="_blank">EpiFlu</a>
-> <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_cov_submission.html" target="_blank">EpiCoV</a>
-> <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_pox_submission.html" target="_blank">EpiPox</a>
-> <br>
-> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_arbo_submission.html" target="_blank">EpiArbo</a>
-> <br>
-
-## Quick Start
-
-  - [How to run seqsender
-    locally](https://cdcgov.github.io/seqsender/articles/local_installation.html)
-  - [How to run seqsender with
-    Docker](https://cdcgov.github.io/seqsender/articles/docker_installation.html)
-  - [How to run seqsender with
-    Compose](https://cdcgov.github.io/seqsender/articles/compose_installation.html)
-  - [How to run seqsender with
-    Singularity](https://cdcgov.github.io/seqsender/articles/singularity_installation.html)
-
 ## Code Attributions
 
 Dakota Howard and Reina Chau for majority of the code base with input

diff --git a/argument_handler.py b/argument_handler.py
@@ -0,0 +1,140 @@
+#!/usr/bin/env python3
+
+###########################    Description    ##################################
+# Parsers for handling SeqSender input
+################################################################################
+
+import argparse
+from typing import List
+from settings import ORGANISM_CHOICES
+
+def args_parser():
+	"""
+	Argument parser setup and build.
+	"""
+	parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+									description="Genomic tool to simplify/automate the process of submitting organism samples to public repositories. With built-in tools to create/submit/link/log organism samples for the databases: BioSample, SRA, GenBank, and GISAID.")
+	database_parser = argparse.ArgumentParser(add_help=False)
+	organism_parser = argparse.ArgumentParser(add_help=False)
+	validate_parser = argparse.ArgumentParser(add_help=False)
+	submission_name_parser = argparse.ArgumentParser(add_help=False)
+	submission_dir_parser = argparse.ArgumentParser(add_help=False)
+	upload_log_submission_name_parser = argparse.ArgumentParser(add_help=False)
+	config_file_parser = argparse.ArgumentParser(add_help=False)
+	file_parser = argparse.ArgumentParser(add_help=False)
+	test_parser = argparse.ArgumentParser(add_help=False)
+
+	database_parser.add_argument("--biosample", "-b",
+		help="Create/Submit BioSample data.",
+		action="store_const",
+		const="BIOSAMPLE",
+		default="")
+	database_parser.add_argument("--sra", "-s",
+		help="Create/Submit SRA data.",
+		action="store_const",
+		const="SRA",
+		default="")
+	database_parser.add_argument("--genbank", "-n",
+		help="Create/Submit GenBank data. (requires --fasta_file)",
+		action="store_const",
+		const="GENBANK",
+		default="")
+	database_parser.add_argument("--gisaid", "-g",
+		help="Create/Submit GISAID data. (requires --fasta_file)",
+		action="store_const",
+		const="GISAID",
+		default="")
+	organism_parser.add_argument("--organism",
+		help="Type of organism data. Listed organism options have unique submissions options/processes, if your specific organism is not listed, use 'OTHER' for options available to all organisms.",
+		choices=ORGANISM_CHOICES,
+		default="",
+		required=True)
+	validate_parser.add_argument("--skip_validation",
+		help="Skip initial validation for metadata file. Validation will still occur for the 'config_file' and for any subsequent submissions made via 'submission_status'. Warning, this can cause unexpected errors using SeqSender if required columns are missing.",
+		required=False,
+		action="store_const",
+		default=False,
+		const=True)
+	submission_name_parser.add_argument("--submission_name",
+		help="Unique name for the submission of your data. Reusing the same name can cause issues during the submission process. A folder will be created at: 'submission_dir/submission_name'.",
+		required=True)
+	upload_log_submission_name_parser.add_argument("--submission_name",
+		help="Unique name for the submission of your data. This is an optional field if you want Seqsender to only update the specified submission in the 'submission_log.csv'.",
+		required=False)
+	submission_dir_parser.add_argument("--submission_dir",
+		help="Output directory where all files for your submission will be stored. A folder will be created at '<submission_dir>/<submission_name>'; this is the location where: all of the submission files will be created, SeqSender will stage each step of the submission process automatically, and where SeqSender will generate all the output from your submission.",
+		required=True)
+	config_file_parser.add_argument("--config_file",
+		help="Config file to be used in the creation/submission of your samples. SeqSender will store this file location in your 'submission_log.csv' where it will use it to manage your submission, be careful when modifying and ensure SeqSender maintains access to this file. Input either full file path or if just file name it must be stored at '<submission_dir>/<submission_name>/<config_file>'.",
+		required=True)
+	file_parser.add_argument("--metadata_file",
+		help="Metadata file to be used in the creation/submission of your samples. Input either full file path or if just file name it must be stored at '<submission_dir>/<submission_name>/<metadata_file>'.",
+		required=True)
+	file_parser.add_argument("--fasta_file",
+		help="Fasta file used to generate submission files; fasta header should match the column 'sequence_name' stored in your metadata. Input either full file path or if just file name it must be stored at '<submission_dir>/<submission_name>/<fasta_file>'.",
+		required=True)
+	file_parser.add_argument("--table2asn",
+		help="Perform a table2asn submission instead of GenBank FTP submission for organism choices 'FLU' or 'COV'.",
+		required=False,
+		action="store_const",
+		default=False,
+		const=True)
+	file_parser.add_argument("--gff_file",
+		help="Annotation file only available for table2asn submissions. (requires '--table2asn' for organism choices 'FLU', or 'COV').",
+		default=None)
+	test_parser.add_argument("--test",
+		help="Perform a test submission.",
+		action="store_const",
+		default=False,
+		const=True)
+
+	# Create the submodule commands
+	subparser_modules = parser.add_subparsers(dest="command")
+
+	# prep command
+	prep_module = subparser_modules.add_parser(
+		"prep",
+		formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+		description="Generate all files required to submit to databases selected.",
+		parents=[database_parser, organism_parser, submission_name_parser, submission_dir_parser, config_file_parser, file_parser, validate_parser]
+	)
+
+	# submit command
+	submit_module = subparser_modules.add_parser(
+		"submit",
+		formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+		description="Generate all files required and begin the submission process to databases selected.",
+		parents=[database_parser, organism_parser, submission_name_parser, submission_dir_parser, config_file_parser, file_parser, test_parser, validate_parser]
+	)
+
+	# check_submission_status command
+	update_module = subparser_modules.add_parser(
+		"submission_status",
+		formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+		description="Checks the submission status for (all/specified <submission_name>) submission('s) which will for each database: update the status of the submission('s), download output file('s), submit to subsequent specified databases if linking information requires output of previous database('s).",
+		parents=[submission_dir_parser, upload_log_submission_name_parser]
+	)
+
+	# Generate test data command
+	test_output_module = subparser_modules.add_parser(
+		"test_data",
+		formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+		description="Returns a set of test data examples (config_file, metadata_file, fasta_file, etc.) to the specified 'submission_dir' based on organism and database selections.",
+		parents=[database_parser, organism_parser, submission_dir_parser]
+	)
+
+	# biosample xml download command
+	biosample_xml_module = subparser_modules.add_parser(
+		"update_biosample",
+		formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+		description="Downloads the BioSample Package XML from NCBI and updates SeqSender's metadata schema options for the BioSample database."
+	)
+
+	# version command
+	version_module = subparser_modules.add_parser(
+		"version",
+		formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+		description="Print version."
+	)
+
+	return parser
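
The new `argument_handler.py` builds each subcommand from small parent parsers created with `add_help=False` and combined through `parents=[...]`. The following is a reduced sketch of that pattern, not the file itself: it keeps only a few of the options above, `build_parser` and `selected_databases` are hypothetical names, and the `ORGANISM_CHOICES` values are an assumption taken from the help strings, since `settings.py` is not shown in this part of the diff.

```python
import argparse

# Assumption: the real list lives in settings.ORGANISM_CHOICES, which is not
# shown here. These values are only the ones named in the help strings.
ORGANISM_CHOICES = ["FLU", "COV", "OTHER"]


def build_parser() -> argparse.ArgumentParser:
    """Reduced sketch of the parent-parser layout used by args_parser()."""
    parser = argparse.ArgumentParser(description="Sketch of the SeqSender CLI layout.")

    # Parent parsers hold shared options. add_help=False prevents each parent
    # from contributing its own -h/--help, which would conflict in the child.
    database_parser = argparse.ArgumentParser(add_help=False)
    database_parser.add_argument("--biosample", "-b", action="store_const",
                                 const="BIOSAMPLE", default="")
    database_parser.add_argument("--sra", "-s", action="store_const",
                                 const="SRA", default="")

    organism_parser = argparse.ArgumentParser(add_help=False)
    organism_parser.add_argument("--organism", choices=ORGANISM_CHOICES, required=True)

    submission_dir_parser = argparse.ArgumentParser(add_help=False)
    submission_dir_parser.add_argument("--submission_dir", required=True)

    # Each subcommand inherits only the option groups it lists in parents=[...].
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("test_data",
                          parents=[database_parser, organism_parser, submission_dir_parser])
    subparsers.add_parser("submission_status", parents=[submission_dir_parser])
    return parser


def selected_databases(args: argparse.Namespace) -> list:
    """Hypothetical helper: collect the database flags that were switched on."""
    return [db for db in (args.biosample, args.sra) if db]


args = build_parser().parse_args(
    ["test_data", "--biosample", "--organism", "FLU", "--submission_dir", "out"]
)
print(args.command, selected_databases(args))  # prints: test_data ['BIOSAMPLE']
```

Because each database flag uses `store_const` with an empty-string default, unselected databases are falsy and can be filtered out directly. A subcommand such as `submission_status` that omits `database_parser` from its parents does not get those attributes on its namespace at all.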