
Guide to using scripts in /lib:

biosample_complete.sh

Generates a TSV file intended to adhere to the biosampleMeta schema as defined in https://github.com/FDA-ARGOS/data.argosdb/blob/main/schema/v1.4/core/biosampleMeta_HIVE.json

The returned TSV contains one row per SRA ID associated with each biosample.

Required parameters:

  • -f: Path to text file of bioSample IDs (one ID per line)
  • -n: Path to NGS QC file (must be TSV format)

Optional parameters:

  • -b: BCO ID
  • -s: schema version
  • -d: debug (if -d is not set, the output directory will contain intermediate xml and tsv files)

Example usage:

./biosample_complete.sh -f biosample_ids.txt -n ngsQC_HIVE.tsv -b ARGOS_000028 -s v1.12
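
Before running the script, it can help to check that the two required inputs have the shape described above. The following is a minimal sketch, not part of this repository: the function names `read_biosample_ids` and `looks_like_tsv` are hypothetical, and the TSV check only assumes a tab-delimited header with a consistent column count.

```python
import csv
from pathlib import Path


def read_biosample_ids(path):
    """Return the non-empty, whitespace-stripped lines of the ID file (hypothetical helper)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def looks_like_tsv(path):
    """Return True if the file has a tab-separated header and every row has the same width."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Skip completely empty lines so a stray blank line does not fail the check.
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows or len(rows[0]) < 2:
        # No header, or a single column: probably not tab-delimited.
        return False
    width = len(rows[0])
    return all(len(row) == width for row in rows[1:])
```

For instance, `read_biosample_ids("biosample_ids.txt")` would return the list of IDs passed via -f, and `looks_like_tsv("ngsQC_HIVE.tsv")` would give a quick yes/no on the file passed via -n.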

lib

For scripts and such

Validating a data file against a schema:

Assume you want to validate a file of the type SRA_ngsQC (the same process should work for any of the types we have defined).

  • The data file is /data_files/test_SRA_ngsQC.tsv
  • The schema for a SRA_ngsQC data file is /schema/v0.5/non-core/SRA_ngsQC.json

For illustration purposes, cell T6 in our example data file has been modified. The schema says that the value has to be less than 1, as gc_content is a percentage. The example data sheet has a value of 10.63682374 in that cell, so the following error should be thrown:

Line 5 failed. '10.63682374' does not match '^[+-]?([0]+\\.?[0-9]*|\\.[0-9]+)$
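
To see why the value is rejected, the pattern from the error message can be tried directly. This is a small sketch using Python's `re` module, with the JSON escaping (`\\.`) reduced to `\.`. The helper name `matches_schema_pattern` and the sample values other than 10.63682374 are illustrative assumptions and are not taken from `dictionary_utils.py`.

```python
import re

# Pattern copied from the error message, with JSON's "\\." written as "\." for Python.
GC_PATTERN = re.compile(r"^[+-]?([0]+\.?[0-9]*|\.[0-9]+)$")


def matches_schema_pattern(value: str) -> bool:
    """Return True if the value satisfies the pattern (hypothetical helper)."""
    return GC_PATTERN.match(value) is not None


for sample in ["10.63682374", "0.1063682374", ".5"]:
    print(sample, matches_schema_pattern(sample))
```

The pattern only accepts an integer part made up of zeros (or no integer part at all), so any value of 1 or greater fails to match.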

From the project root run:

> python lib/dictionary_utils.py validate -i data_files/test_SRA_ngsQC.tsv -s schema/v0.5/non-core/SRA_ngsQC.json

Validating a data file against a schema with remote files:

Both the schema [-s] and input file [-i] values can take a URL, assuming they are formatted correctly and resolvable.

For example:

> python lib/dictionary_utils.py validate -i https://raw.githubusercontent.com/FDA-ARGOS/data.argosdb/v0.5/data_files/test_SRA_ngsQC.tsv -s https://raw.githubusercontent.com/FDA-ARGOS/data.argosdb/v0.5/schema/v0.5/non-core/SRA_ngsQC.json

This should give you the same results as the local run.
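
One way an argument can accept either a local path or a URL is sketched below. This is an assumption about the general technique, not a description of how `dictionary_utils.py` is implemented; the names `is_url` and `read_text` are hypothetical.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(location: str) -> bool:
    """Return True for http(s) locations; anything else is treated as a local path."""
    return urlparse(location).scheme in ("http", "https")


def read_text(location: str) -> str:
    """Read a local file or fetch a remote one and return its contents as text."""
    if is_url(location):
        # Remote file: download the body and decode it as UTF-8.
        with urlopen(location) as response:
            return response.read().decode("utf-8")
    # Local file: read it directly from disk.
    with open(location, encoding="utf-8") as handle:
        return handle.read()
```

With a helper like this, the same validation code can run on the returned text whether -i and -s point to files in the working tree or to raw GitHub URLs.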