Commit

Update data_managers_tools.yml -> data_managers.yml...
jmchilton committed Jun 30, 2023
1 parent bfeec34 commit c646209
Showing 4 changed files with 78 additions and 170 deletions.
92 changes: 16 additions & 76 deletions README.md
@@ -26,92 +26,32 @@ The resulting genome files and tool indices will be located in the directory spe

The three important files are:

* `data_managers_tools.yml`
* `data_managers_genomes.yml`
* `data_managers.yml`
* `genomes.yml`

### data_managers_tools.yml
### data_managers.yml

This file contains the list of data managers that are to be installed into the Galaxy docker instance. It is in the format of an Ephemeris tool specification .yml file.
This file contains the list of data managers that are to be installed into the target
Galaxy building IDC data.

```yaml
tools:
- name: NAME_OF_THE_DATA_MANAGER
owner: OWNER_OF_THE_TOOL_IN_THE_TOOLSHED
tool_panel_section_label: None
toolshed_url: toolshed.g2.bx.psu.edu
tags:
- tag #Tag can be either "genome" or "fetch_source".
NAME_OF_THE_DATA_MANAGER:
tool_id: TOOL_ID_IN_TARGET_REPO_OF_DATA_MANAGER
tags:
- tag #Tag can be either "genome" or "fetch_source".
```
Other data managers are added as elements in the `tools` yml array. The first tool listed should always be the `fetch_source` data manager. In most cases this will be the `data_manager_fetch_genome_dbkeys_all_fasta` data manager that sources and downloads most genomes and populates the `all_fasta` and `__dbkeys__` data tables for later use by other data managers.

Example:

```yaml
tools:
- name: data_manager_fetch_genome_dbkeys_all_fasta
owner: devteam
tool_panel_section_label: None
tool_shed_url: toolshed.g2.bx.psu.edu
tags:
- fetch_source
- name: data_manager_bowtie2_index_builder
owner: devteam
tool_panel_section_label: None
tool_shed_url: toolshed.g2.bx.psu.edu
tags:
- genome
- name: data_manager_bwa_mem_index_builder
owner: devteam
tool_panel_section_label: None
tool_shed_url: toolshed.g2.bx.psu.edu
tags:
- genome
```

### data_managers_genomes.yml

This file contains the details of the data managers that will be run on the fetched genomes to create the various tool indices.

Each data manager is specified by its **Galaxy id** (not its name) and the mappings of input fields to the appropriate field in the `all_fasta` or `__dbkeys__` data tables. Only data managers tagged with the `genome` tag in the `data_managers_tools.yml` file need to be included here. It does not contain details about the `data_manager_fetch_genome_dbkeys_all_fasta` data manager as this is handled separately.
Ephemeris can be used to generate a shed-tool install file to bootstrap the required tools
and repositories into a target Galaxy for IDC installs.

Format:

```yaml
data_managers:
- id: Galaxy_tool_id_of_the_data_manager # found in Galaxy's shed_data_manager_conf.xml file.
params: # a yml array of mappings of data manager's input fields to listings in the genomes.yml file using item.attribute where attribute is the appropriate field in the genomes file.
- 'all_fasta_source': '{{ item.id }}'
- 'sequence_name': '{{ item.name }}'
- 'sequence_id': '{{ item.id }}'
data_table_reload: # a list of data tables to reload once this data manager has completed.
- name_of_data_table_to_reload
```

Example:

```yaml
data_managers:
- id: toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bowtie2_index_builder/bowtie2_index_builder_data_manager/2.3.4.3
params:
- 'all_fasta_source': '{{ item.id }}'
- 'sequence_name': '{{ item.name }}'
- 'sequence_id': '{{ item.id }}'
items: "{{ genomes }}"
data_table_reload:
# Bowtie creates indices for Bowtie and TopHat
- bowtie2_indexes
- tophat2_indexes
- id: toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bwa_mem_index_builder/bwa_mem_index_builder_data_manager/0.0.3
params:
- 'all_fasta_source': '{{ item.id }}'
- 'sequence_name': '{{ item.name }}'
- 'sequence_id': '{{ item.id}}'
items: "{{ genomes }}"
data_table_reload:
- bwa_mem_indexes
```bash
pip install ephemeris
_idc-data-managers-to-tools
# defaults to:
# _idc-data-managers-to-tools --data-managers-conf=genomes.yml --shed-install-output-conf=tools.yml
shed-tools install -t tools.yml
```
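
As a rough illustration of what such a conversion involves (this is a sketch, not Ephemeris's actual implementation), a Tool Shed tool ID of the form used in `data_managers.yml` can be split into the repository fields that a shed-tools install file lists. The function name `tool_id_to_repo` is hypothetical, and the assumption that every `tool_id` has exactly the shape `<shed>/repos/<owner>/<repo>/<tool>/<version>` is taken from the entries in this commit.

```python
def tool_id_to_repo(tool_id: str) -> dict:
    """Split '<shed>/repos/<owner>/<repo>/<tool>/<version>' into install fields.

    Hypothetical helper for illustration; the six-part layout is an assumption
    based on the tool IDs listed in data_managers.yml.
    """
    parts = tool_id.split("/")
    if len(parts) != 6 or parts[1] != "repos":
        raise ValueError(f"not a Tool Shed tool id: {tool_id!r}")
    shed, _, owner, repo, _tool, _version = parts
    return {"name": repo, "owner": owner, "tool_shed_url": shed}


# One entry in the new data_managers.yml layout, written as a Python dict.
data_managers = {
    "data_manager_bowtie2_index_builder": {
        "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/"
                   "data_manager_bowtie2_index_builder/"
                   "bowtie2_index_builder_data_manager/2.3.4.3",
        "tags": ["genome"],
    },
}

# Build the 'tools' list that a shed-tools install file would carry.
tools = {"tools": [tool_id_to_repo(dm["tool_id"]) for dm in data_managers.values()]}
```

The output keys (`name`, `owner`, `tool_shed_url`) mirror the fields used in the old `data_managers_tools.yml` example; any other fields the real command writes are not modelled here.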

### genomes.yml
60 changes: 60 additions & 0 deletions data_managers.yml
@@ -0,0 +1,60 @@
data_manager_fetch_genome_dbkeys_all_fasta:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/data_manager_fetch_genome_all_fasta_dbkey/0.0.3'
tags:
- fetch_source
data_manager_bowtie2_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bowtie2_index_builder/bowtie2_index_builder_data_manager/2.3.4.3'
tags:
- genome
data_manager_bwa_mem_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bwa_mem_index_builder/bwa_mem_index_builder_data_manager/0.0.3'
tags:
- genome
data_manager_hisat_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_hisat_index_builder/hisat_index_builder_data_manager/1.0.0'
tags:
- genome
data_manager_twobit_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_twobit_builder/twobit_builder_data_manager/0.0.2'
tags:
- genome
data_manager_picard_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_picard_index_builder/picard_index_builder_data_manager/2.7.1'
tags:
- genome
data_manager_sam_fasta_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_sam_fasta_index_builder/sam_fasta_index_builder/0.0.2'
tags:
- genome
data_manager_hisat2_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/iuc/data_manager_hisat2_index_builder/hisat2_index_builder_data_manager/2.0.5'
tags:
- genome
data_manager_star_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/iuc/data_manager_star_index_builder/rna_star_index_builder_data_manager/0.0.5'
tags:
- genome
data_manager_bowtie_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/iuc/data_manager_bowtie_index_builder/bowtie_color_space_index_builder_data_manager/0.0.2'
tags:
- genome
data_manager_kallisto_index_builder:
tool_id: 'toolshed.g2.bx.psu.edu/repos/iuc/data_manager_kallisto_index_builder/kallisto_index_builder_data_manager/0.43.1'
tags:
- genome
data_manager_snpeff:
tool_id: 'toolshed.g2.bx.psu.edu/repos/iuc/data_manager_snpeff/data_manager_snpeff_databases/4.3r'
tags:
- snpeff
data_manager_plant_tribes_scaffolds_downloader:
tool_id: 'toolshed.g2.bx.psu.edu/repos/iuc/data_manager_plant_tribes_scaffolds_downloader/data_manager_plant_tribes_scaffolds_download/1.1.0'
tags:
- plant_source
data_manager_fetch_ncbi_taxonomy:
tool_id: 'toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_ncbi_taxonomy/ncbi_taxonomy_fetcher/1.0.0'
tags:
- tax_source
data_manager_gemini_database_downloader:
tool_id: 'toolshed.g2.bx.psu.edu/repos/iuc/data_manager_gemini_database_downloader/data_manager_gemini_download/0.20.1'
tags:
- gemini
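
Because each entry in this file carries a `tags` list, a consumer can select data managers by role (for example `fetch_source` versus `genome`). The snippet below is a minimal sketch of that grouping, assuming the YAML has already been loaded into a plain dict; `group_by_tag` is a hypothetical helper, not part of Ephemeris or this repository.

```python
from collections import defaultdict


def group_by_tag(data_managers: dict) -> dict:
    """Map each tag to the data manager names carrying it, in file order."""
    groups = defaultdict(list)
    for name, entry in data_managers.items():
        for tag in entry.get("tags", []):
            groups[tag].append(name)
    return dict(groups)


# A few entries from data_managers.yml, written as a Python dict
# (tool_id values omitted because grouping only needs the tags).
sample = {
    "data_manager_fetch_genome_dbkeys_all_fasta": {"tags": ["fetch_source"]},
    "data_manager_bowtie2_index_builder": {"tags": ["genome"]},
    "data_manager_bwa_mem_index_builder": {"tags": ["genome"]},
    "data_manager_snpeff": {"tags": ["snpeff"]},
}

by_tag = group_by_tag(sample)
```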
91 changes: 0 additions & 91 deletions data_managers_tools.yml

This file was deleted.

5 changes: 2 additions & 3 deletions run_builder.sh
@@ -60,9 +60,8 @@ chmod 0777 ${DATA_MANAGER_DATA_PATH}

echo 'Installing Data Managers'
# Install the data managers
shed-tools install -t data_managers_tools.yml -g ${GALAXY_URL} -u $GALAXY_DEFAULT_ADMIN_USER -p $GALAXY_DEFAULT_ADMIN_PASSWORD
#Let others read the shed_data_manager_conf.xml file
sudo chmod ugo+r ${EXPORT_DIR}/galaxy-central/config/shed_data_manager_conf.xml
_idc-data-managers-to-tools
shed-tools install -t tools.yml -g ${GALAXY_URL} -u $GALAXY_DEFAULT_ADMIN_USER -p $GALAXY_DEFAULT_ADMIN_PASSWORD

echo 'Fetching new genomes'
#Run make_fetch.py to build the fetch manager config file for ephemeris