RefGenie - add assets currently used from AWS-iGenomes #1086

Open
ewels opened this issue May 18, 2021 · 4 comments
Labels
template nf-core pipeline/component template

Comments

@ewels
Member

ewels commented May 18, 2021

Split from #592, related directly to #1084


We want to transition away from AWS-iGenomes to the central RefGenie server(s) for hosting the reference genome assets used by nf-core pipelines. To do this, we need to make sure that everything we currently have is available on an AWS RefGenie server somewhere. Once all assets are mirrored, we can do a clean swap in the config.

Going from igenomes.config, here is a check-list:

Summary of genomes
  • GRCh37
  • GRCh38
  • GRCm38
  • TAIR10
  • EB2
  • UMD3.1
  • WBcel235
  • CanFam3.1
  • GRCz10
  • BDGP6
  • EquCab2
  • EB1
  • Galgal4
  • Gm01
  • Mmul_1
  • IRGSP-1.0
  • CHIMP2.1.4
  • Rnor_6.0
  • R64-1-1
  • EF2
  • Sbi1
  • Sscrofa10.2
  • AGPv3
  • hg38
  • hg19
  • mm10
  • bosTau8
  • ce10
  • canFam3
  • danRer10
  • dm6
  • equCab2
  • galGal4
  • panTro4
  • rn6
  • sacCer3
  • susScr3
Detailed version with asset types
  • GRCh37
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
    • macs_gsize
    • blacklist
  • GRCh38
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
    • macs_gsize
    • blacklist
  • GRCm38
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
    • macs_gsize
    • blacklist
  • TAIR10
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • EB2
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
  • UMD3.1
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • WBcel235
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
    • macs_gsize
  • CanFam3.1
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • GRCz10
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
  • BDGP6
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
    • macs_gsize
  • EquCab2
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • EB1
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
  • Galgal4
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
  • Gm01
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
  • Mmul_1
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • IRGSP-1.0
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
  • CHIMP2.1.4
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • Rnor_6.0
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
  • R64-1-1
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
    • macs_gsize
  • EF2
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
    • macs_gsize
  • Sbi1
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
  • Sscrofa10.2
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • AGPv3
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
  • hg38
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
    • macs_gsize
    • blacklist
  • hg19
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
    • macs_gsize
    • blacklist
  • mm10
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
    • macs_gsize
    • blacklist
  • bosTau8
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
  • ce10
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
    • macs_gsize
  • canFam3
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • danRer10
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
    • macs_gsize
  • dm6
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
    • macs_gsize
  • equCab2
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • galGal4
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • panTro4
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
  • rn6
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • mito_name
  • sacCer3
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • readme
    • mito_name
    • macs_gsize
  • susScr3
    • fasta
    • bwa
    • bowtie2
    • star
    • bismark
    • gtf
    • bed12
    • readme
    • mito_name
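
Progress against a checklist like this can be tracked programmatically. The following is only a rough sketch: the `CHECKLIST` excerpt copies three genomes from the list above, while the `mirrored` set of `(genome, asset)` pairs and both function names are hypothetical and not part of nf-core or RefGenie.

```python
# Excerpt of the checklist above: genome -> asset types listed for it.
CHECKLIST = {
    "GRCh38": ["fasta", "bwa", "bowtie2", "star", "bismark", "gtf",
               "bed12", "mito_name", "macs_gsize", "blacklist"],
    "EB2": ["fasta", "bwa", "bowtie2", "star", "bismark", "gtf",
            "bed12", "readme"],
    "sacCer3": ["fasta", "bwa", "bowtie2", "star", "bismark",
                "readme", "mito_name", "macs_gsize"],
}


def remaining(checklist, mirrored):
    """Return {genome: [asset types not yet mirrored]}, skipping finished genomes.

    `mirrored` is a set of (genome, asset_type) pairs (a hypothetical record
    of what has already been made available on the new server).
    """
    todo = {}
    for genome, assets in checklist.items():
        left = [asset for asset in assets if (genome, asset) not in mirrored]
        if left:
            todo[genome] = left
    return todo


def ready_to_swap(checklist, mirrored):
    """True only when every listed asset of every genome is mirrored."""
    return not remaining(checklist, mirrored)
```

Keeping the check as "nothing remaining" mirrors the plan stated above: the config swap happens only once every asset is available.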

From @nsheff:

For updating: We drive the server instances off a git repository, here: https://github.com/refgenie/refgenomes.databio.org

Right now, it's semi-automated. Eventually we want it to be that you just update the repo with a PR, and when merged it will deploy automatically. For now, though, we have to build the things manually, but it's all scripted from that repository. The assets are all just annotated with a PEP. So, to add a genome you'd add it to this CSV file: https://github.com/refgenie/refgenomes.databio.org/blob/master/asset_pep/genome_descriptions.csv

Then you'd add whatever inputs are required to build the assets to this file: https://github.com/refgenie/refgenomes.databio.org/blob/master/asset_pep/recipe_inputs.csv
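
Adding a row to one of these CSV files could be scripted roughly as follows. This is a minimal sketch that makes no assumption about the real column names: it reads the header from the existing file and refuses rows that do not match it. The function name `append_row` and the column names in the usage comment are hypothetical.

```python
import csv
from pathlib import Path


def append_row(csv_path, row):
    """Append `row` (a dict) to an existing CSV, validating it against the header."""
    path = Path(csv_path)
    text = path.read_text()
    # The first line of the existing file defines the allowed columns.
    header = next(csv.reader(text.splitlines()))

    unknown = set(row) - set(header)
    missing = set(header) - set(row)
    if unknown or missing:
        raise ValueError(
            f"unknown columns {sorted(unknown)}, missing columns {sorted(missing)}"
        )

    with path.open("a", newline="") as handle:
        # Avoid gluing the new row onto an unterminated last line.
        if not text.endswith("\n"):
            handle.write("\n")
        writer = csv.DictWriter(handle, fieldnames=header, lineterminator="\n")
        writer.writerow(row)


# Usage (column names here are placeholders, not the real schema):
# append_row("genome_descriptions.csv", {"genome": "TAIR10", "description": "..."})
```

Validating against the file's own header keeps a PR from silently introducing a misnamed or missing column.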

See also:

@KevinMenden
Contributor

KevinMenden commented May 28, 2021

Checking which assets currently available in iGenomes have corresponding recipes available in refgenie.
https://github.com/refgenie/refgenie/blob/master/refgenie/asset_build_packages.py

  • fasta
  • bwa
  • bowtie2
  • star
  • bismark
  • gtf
  • bed12
  • readme
  • mito_name
  • macs_gsize
  • blacklist

readme, mito_name and macs_gsize are all genome-specific assets that would have to be added manually, provided refgenie allows that. For bed12 we should be able to write a build recipe and add it to asset_build_packages.py.
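
The comparison described here can be sketched as a simple classification. The asset type names and the "manual" group come from this thread; the `recipe_names` argument is an assumption (however the set of available recipe names is collected), and the sketch assumes iGenomes and refgenie names match one-to-one, which may not hold in practice.

```python
# Asset types from the checklist in this issue (names as used in igenomes.config).
IGENOMES_ASSET_TYPES = [
    "fasta", "bwa", "bowtie2", "star", "bismark", "gtf",
    "bed12", "readme", "mito_name", "macs_gsize", "blacklist",
]

# Per the comment above: these would have to be added by hand.
MANUAL_ASSET_TYPES = frozenset({"readme", "mito_name", "macs_gsize"})


def classify(asset_types, recipe_names, manual=MANUAL_ASSET_TYPES):
    """Map each asset type to 'add manually', 'recipe available' or 'recipe needed'.

    `recipe_names` is a set of asset types that already have a build recipe;
    it is supplied by the caller and is not looked up from refgenie here.
    """
    status = {}
    for name in asset_types:
        if name in manual:
            status[name] = "add manually"
        elif name in recipe_names:
            status[name] = "recipe available"
        else:
            status[name] = "recipe needed"
    return status


# Usage with a placeholder recipe set (illustrative only):
# classify(IGENOMES_ASSET_TYPES, {"fasta", "bwa"})["bed12"] is "recipe needed"
```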

@stolarczyk

@KevinMenden, I can help put together the refgenie asset recipes for the missing asset types. Where can I find commands used to create these?

@maxulysse
Member

@ewels
Member Author

ewels commented Jun 17, 2022

TODO:

Still to add:

These can be added as attributes to e.g. the FASTA file rather than as dedicated uploads:

Readme can probably be skipped, as RefGenie hopefully already has enough provenance for assets.

ewels assigned mirpedrol and ErikDanielsson and unassigned KevinMenden on Jun 17, 2022