Merge remote-tracking branch 'refs/remotes/origin/v1.2.0' into v1.2.0
dthoward96 committed Aug 6, 2024
2 parents 2b88024 + 4886094 commit 62df1f3
Showing 1,146 changed files with 168,998 additions and 33,969 deletions.
34 changes: 3 additions & 31 deletions README.Rmd
@@ -26,11 +26,11 @@ github_pages_url <- description$GITHUB_PAGES

<p style="font-size: 16px;"><em>Public Database Submission Pipeline</em></p>

**Beta Version**: `r version`. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!
**Beta Version**: v1.2.0. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!

**General Disclaimer**: This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm). GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.

# [Documentation](`r github_pages_url`/index.html)
# [Documentation](https://dthoward96.github.io/seqsender_test_website/)

## Overview

@@ -58,7 +58,7 @@ github_pages_url <- description$GITHUB_PAGES

4. Refer to this page for information regarding requirements for GenBank submissions via FTP only. This page applies only for COVID and Influenza [NCBI GenBank FTP Submissions](https://submit.ncbi.nlm.nih.gov/sarscov2/genbank/#step5) For further questions contact <a href="mailto:[email protected]">[email protected]</a> to discuss requirements for submissions.

5. Coordinate a NCBI namespace name (**spuid_namespace**) that will be used with Submitter Provided Unique Identifiers (**spuid**) in the submission. The liaison of **spuid_namespace** and **spuid** is used to report back assigned accessions as well as for cross-linking objects within submission. The values of **spuid_namespace** are up to the submitter to decide but they must be unique and well-coordinated prior to make a submission. For more information about these two fields, see [BioSample](`r github_pages_url`/articles/biosample_submission.html#metadata) / [SRA](`r github_pages_url`/articles/sra_submission.html#metadata) / [GENBANK](`r github_pages_url`/articles/genbank_submission.html#metadata) metadata requirements.
5. Coordinate a NCBI namespace name (**spuid_namespace**) that will be used with Submitter Provided Unique Identifiers (**spuid**) in the submission. The liaison of **spuid_namespace** and **spuid** is used to report back assigned accessions as well as for cross-linking objects within submission. The values of **spuid_namespace** are up to the submitter to decide but they must be unique and well-coordinated prior to make a submission.

- **GISAID Submissions**

@@ -75,34 +75,6 @@ Here is a quick look of where to store the downloaded **GISAID CLI** package.
![](man/figures/gisaid_cli_dir.png)



## Requirement Files

Before submitters can perform a batch submission using ``r program``, they must make sure the requirement files (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.) are already prepared and stored in a submission directory of choice.

To prep for submissions, select one of the databases below to get started:
*to submit to multiple databases just combine the required metadata for each database into one file.

**NCBI:**

> <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
> <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
> <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
**GISAID:**

> <a href="`r github_pages_url`/articles/gisaid_flu_submission.html" target="_blank">EpiFlu</a> <br>
> <a href="`r github_pages_url`/articles/gisaid_cov_submission.html" target="_blank">EpiCoV</a> <br>
> <a href="`r github_pages_url`/articles/gisaid_pox_submission.html" target="_blank">EpiPox</a> <br>
> <a href="`r github_pages_url`/articles/gisaid_arbo_submission.html" target="_blank">EpiArbo</a> <br>
## Quick Start

- [How to run seqsender locally](`r github_pages_url`/articles/local_installation.html)
- [How to run seqsender with Docker](`r github_pages_url`/articles/docker_installation.html)
- [How to run seqsender with Compose](`r github_pages_url`/articles/compose_installation.html)
- [How to run seqsender with Singularity](`r github_pages_url`/articles/singularity_installation.html)

## Code Attributions

Dakota Howard and Reina Chau for majority of the code base with input and testing from [colleagues](`r github_pages_url`/authors.html).
55 changes: 3 additions & 52 deletions README.md
@@ -11,7 +11,7 @@

</p>

**Beta Version**: 1.1.0. This pipeline is currently in Beta testing, and
**Beta Version**: 1.2.0. This pipeline is currently in Beta testing, and
issues could appear during submission. Please use it at your own risk.
Feedback and suggestions are welcome\!

@@ -23,7 +23,7 @@ CDC and its partners to share information and collaborate on software.
CDC use of GitHub does not imply an endorsement of any one particular
service, product, or enterprise.

# [Documentation](https://cdcgov.github.io/seqsender/index.html)
# [Documentation](https://dthoward96.github.io/seqsender_test_website/)

## Overview

@@ -85,14 +85,7 @@ FTP on the command line. Before attempting to submit a submission using
used to report back assigned accessions as well as for cross-linking
objects within submission. The values of **spuid\_namespace** are up
to the submitter to decide but they must be unique and
well-coordinated prior to make a submission. For more information
about these two fields, see
[BioSample](https://cdcgov.github.io/seqsender/articles/biosample_submission.html#metadata)
/
[SRA](https://cdcgov.github.io/seqsender/articles/sra_submission.html#metadata)
/
[GENBANK](https://cdcgov.github.io/seqsender/articles/genbank_submission.html#metadata)
metadata requirements.
well-coordinated prior to make a submission.

<!-- end list -->

@@ -130,48 +123,6 @@ package.

![](man/figures/gisaid_cli_dir.png)

## Requirement Files

Before submitters can perform a batch submission using `seqsender`, they
must make sure the requirement files (such as *config.yaml*,
*metadata.csv*, *sequence.fasta*, *raw reads*, etc.) are already
prepared and stored in a submission directory of choice.

To prep for submissions, select one of the databases below to get
started: \*to submit to multiple databases just combine the required
metadata for each database into one file.

**NCBI:**

> <a href="https://cdcgov.github.io/seqsender/articles/biosample_submission.html" target="_blank">BioSample</a>
> <br>
> <a href="https://cdcgov.github.io/seqsender/articles/sra_submission.html" target="_blank">SRA</a>
> <br>
> <a href="https://cdcgov.github.io/seqsender/articles/genbank_submission.html" target="_blank">Genbank</a>
> <br>
**GISAID:**

> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_flu_submission.html" target="_blank">EpiFlu</a>
> <br>
> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_cov_submission.html" target="_blank">EpiCoV</a>
> <br>
> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_pox_submission.html" target="_blank">EpiPox</a>
> <br>
> <a href="https://cdcgov.github.io/seqsender/articles/gisaid_arbo_submission.html" target="_blank">EpiArbo</a>
> <br>
## Quick Start

- [How to run seqsender
locally](https://cdcgov.github.io/seqsender/articles/local_installation.html)
- [How to run seqsender with
Docker](https://cdcgov.github.io/seqsender/articles/docker_installation.html)
- [How to run seqsender with
Compose](https://cdcgov.github.io/seqsender/articles/compose_installation.html)
- [How to run seqsender with
Singularity](https://cdcgov.github.io/seqsender/articles/singularity_installation.html)

## Code Attributions

Dakota Howard and Reina Chau for majority of the code base with input
140 changes: 140 additions & 0 deletions argument_handler.py
@@ -0,0 +1,140 @@
#!/usr/bin/env python3

########################### Description ##################################
# Parsers for handling SeqSender input
################################################################################

import argparse
from typing import List
from settings import ORGANISM_CHOICES

def args_parser():
    """
    Argument parser setup and build.
    """
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description="Genomic tool to simplify/automate the process of submitting organism samples to public repositories. With built-in tools to create/submit/link/log organism samples for the databases: BioSample, SRA, GenBank, and GISAID.")
    database_parser = argparse.ArgumentParser(add_help=False)
    organism_parser = argparse.ArgumentParser(add_help=False)
    validate_parser = argparse.ArgumentParser(add_help=False)
    submission_name_parser = argparse.ArgumentParser(add_help=False)
    submission_dir_parser = argparse.ArgumentParser(add_help=False)
    upload_log_submission_name_parser = argparse.ArgumentParser(add_help=False)
    config_file_parser = argparse.ArgumentParser(add_help=False)
    file_parser = argparse.ArgumentParser(add_help=False)
    test_parser = argparse.ArgumentParser(add_help=False)

    database_parser.add_argument("--biosample", "-b",
        help="Create/Submit BioSample data.",
        action="store_const",
        const="BIOSAMPLE",
        default="")
    database_parser.add_argument("--sra", "-s",
        help="Create/Submit SRA data.",
        action="store_const",
        const="SRA",
        default="")
    database_parser.add_argument("--genbank", "-n",
        help="Create/Submit GenBank data. (requires --fasta_file)",
        action="store_const",
        const="GENBANK",
        default="")
    database_parser.add_argument("--gisaid", "-g",
        help="Create/Submit GISAID data. (requires --fasta_file)",
        action="store_const",
        const="GISAID",
        default="")
    organism_parser.add_argument("--organism",
        help="Type of organism data. Listed organism options have unique submissions options/processes, if your specific organism is not listed, use 'OTHER' for options available to all organisms.",
        choices=ORGANISM_CHOICES,
        default="",
        required=True)
    validate_parser.add_argument("--skip_validation",
        help="Skip initial validation for metadata file. Validation will still occur for the 'config_file' and for any subsequent submissions made via 'submission_status'. Warning, this can cause unexpected errors using SeqSender if required columns are missing.",
        required=False,
        action="store_const",
        default=False,
        const=True)
    submission_name_parser.add_argument("--submission_name",
        help="Unique name for the submission of your data. Reusing the same name can cause issues during the submission process. A folder will be created at: 'submission_dir/submission_name'.",
        required=True)
    upload_log_submission_name_parser.add_argument("--submission_name",
        help="Unique name for the submission of your data. This is an optional field if you want Seqsender to only update the specified submission in the 'submission_log.csv'.",
        required=False)
    submission_dir_parser.add_argument("--submission_dir",
        help="Output directory where all files for your submission will be stored. A folder will be created at '<submission_dir>/<submission_name>'; this is the location where: all of the submission files will be created, SeqSender will stage each step of the submission process automatically, and where SeqSender will generate all the output from your submission.",
        required=True)
    config_file_parser.add_argument("--config_file",
        help="Config file to be used in the creation/submission of your samples. SeqSender will store this file location in your 'submission_log.csv' where it will use it to manage your submission, be careful when modifying and ensure SeqSender maintains access to this file. Input either full file path or if just file name it must be stored at '<submission_dir>/<submission_name>/<config_file>'.",
        required=True)
    file_parser.add_argument("--metadata_file",
        help="Metadata file to be used in the creation/submission of your samples. Input either full file path or if just file name it must be stored at '<submission_dir>/<submission_name>/<metadata_file>'.",
        required=True)
    file_parser.add_argument("--fasta_file",
        help="Fasta file used to generate submission files; fasta header should match the column 'sequence_name' stored in your metadata. Input either full file path or if just file name it must be stored at '<submission_dir>/<submission_name>/<fasta_file>'.",
        required=True)
    file_parser.add_argument("--table2asn",
        help="Perform a table2asn submission instead of GenBank FTP submission for organism choices 'FLU' or 'COV'.",
        required=False,
        action="store_const",
        default=False,
        const=True)
    file_parser.add_argument("--gff_file",
        help="Annotation file only available for table2asn submissions. (requires '--table2asn' for organism choices 'FLU', or 'COV').",
        default=None)
    test_parser.add_argument("--test",
        help="Perform a test submission.",
        action="store_const",
        default=False,
        const=True)

    # Create the submodule commands
    subparser_modules = parser.add_subparsers(dest="command")

    # prep command
    prep_module = subparser_modules.add_parser(
        "prep",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description="Generate all files required to submit to databases selected.",
        parents=[database_parser, organism_parser, submission_name_parser, submission_dir_parser, config_file_parser, file_parser, validate_parser]
    )

    # submit command
    submit_module = subparser_modules.add_parser(
        "submit",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description="Generate all files required and begin the submission process to databases selected.",
        parents=[database_parser, organism_parser, submission_name_parser, submission_dir_parser, config_file_parser, file_parser, test_parser, validate_parser]
    )

    # check_submission_status command
    update_module = subparser_modules.add_parser(
        "submission_status",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description="Checks the submission status for (all/specified <submission_name>) submission('s) which will for each database: update the status of the submission('s), download output file('s), submit to subsequent specified databases if linking information requires output of previous database('s).",
        parents=[submission_dir_parser, upload_log_submission_name_parser]
    )

    # Generate test data command
    test_output_module = subparser_modules.add_parser(
        "test_data",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description="Returns a set of test data examples (config_file, metadata_file, fasta_file, etc.) to the specified 'submission_dir' based on organism and database selections.",
        parents=[database_parser, organism_parser, submission_dir_parser]
    )

    # biosample xml download command
    biosample_xml_module = subparser_modules.add_parser(
        "update_biosample",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description="Downloads the BioSample Package XML from NCBI and updates SeqSender's metadata schema options for the BioSample database."
    )

    # version command
    version_module = subparser_modules.add_parser(
        "version",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description="Print version."
    )

    return parser
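
The parser above composes each subcommand from small parent parsers created with `add_help=False` and passed through `parents=[...]`. The standalone sketch below reproduces that pattern on a reduced scale so it can be run without the rest of the repository. It is an illustration only, not SeqSender's actual code: `ORGANISM_CHOICES` here is a hypothetical stand-in for `settings.ORGANISM_CHOICES` (not shown in this diff), and `build_parser` and `selected_databases` are made-up helper names. How SeqSender consumes the parsed flags is not shown in this commit, so the truthiness filter is an assumption about one plausible way to do it.

```python
import argparse

# Hypothetical stand-in for settings.ORGANISM_CHOICES, which this diff does not show.
ORGANISM_CHOICES = ["FLU", "COV", "OTHER"]


def build_parser() -> argparse.ArgumentParser:
    """Reduced sketch of the parent-parser layout used in argument_handler.py."""
    parser = argparse.ArgumentParser(description="Sketch of a parent-parser CLI.")

    # Parent parsers hold shared options; add_help=False avoids a duplicate -h/--help.
    database_parser = argparse.ArgumentParser(add_help=False)
    for flag, short, const in (("--biosample", "-b", "BIOSAMPLE"),
                               ("--genbank", "-n", "GENBANK")):
        database_parser.add_argument(flag, short, action="store_const",
                                     const=const, default="")

    organism_parser = argparse.ArgumentParser(add_help=False)
    organism_parser.add_argument("--organism", choices=ORGANISM_CHOICES, required=True)

    # Each subcommand inherits only the option groups it lists in `parents`.
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("test_data", parents=[database_parser, organism_parser])
    subparsers.add_parser("version")
    return parser


def selected_databases(args: argparse.Namespace) -> list:
    """Hypothetical helper: unselected flags keep their "" default, so filter on truthiness."""
    return [db for db in (args.biosample, args.genbank) if db]


args = build_parser().parse_args(["test_data", "--biosample", "--organism", "FLU"])
print(args.command, selected_databases(args))  # prints: test_data ['BIOSAMPLE']
```

Because every database flag uses `action="store_const"` with an empty-string default, a caller can tell which databases were requested without a separate boolean per flag. That is the property the sketch's `selected_databases` relies on.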