all indexes and conversions should be created by default #178

Open
daler opened this issue Jul 22, 2019 · 1 comment

Comments

daler commented Jul 22, 2019

Currently, we need to manually specify in config.yaml (or the references config) which indexes we want (bowtie2, hisat, star, etc.) and which conversions we want (refflat, gffutils db, etc.).

One option is to add these all to the included reference configs; another option is to hard-code them into the references workflow so that there is always a rule available for them (even though a particular workflow may not need all of them, in which case they will not be built).

A nice side effect is that when running the references workflow, it will create all the files needed so they will always be on hand.
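
As a rough sketch of the first option (adding everything to the reference configs), the defaults could be filled in programmatically. Everything here is an assumption made for illustration: the nested `organism -> tag -> block` layout, the `indexes` and `conversions` keys holding plain strings, the function name `with_default_outputs`, and the two default lists. It is not the workflow's actual code.

```python
import copy

# Hypothetical defaults; the issue names these tools but does not fix a list.
DEFAULT_INDEXES = ["bowtie2", "hisat", "star"]
DEFAULT_CONVERSIONS = ["refflat", "gffutils"]


def with_default_outputs(references):
    """Return a copy of `references` with default indexes and conversions added.

    Assumed layout: references[organism][tag][block_type] -> dict, where
    "fasta" blocks carry an "indexes" list and "gtf" blocks a "conversions"
    list, both made of plain strings. Entries already present are kept and
    any missing defaults are appended after them.
    """
    filled = copy.deepcopy(references)  # leave the caller's config untouched
    for tags in filled.values():
        for blocks in tags.values():
            for block_type, block in blocks.items():
                if block_type == "fasta":
                    key, defaults = "indexes", DEFAULT_INDEXES
                elif block_type == "gtf":
                    key, defaults = "conversions", DEFAULT_CONVERSIONS
                else:
                    continue
                current = block.setdefault(key, [])
                current.extend([d for d in defaults if d not in current])
    return filled


# Placeholder URLs, for illustration only.
example = {
    "dmel": {
        "test": {
            "fasta": {"url": "https://example.com/genome.fa.gz", "indexes": ["star"]},
            "gtf": {"url": "https://example.com/annotation.gtf.gz"},
        }
    }
}
filled = with_default_outputs(example)
print(filled["dmel"]["test"]["fasta"]["indexes"])
print(filled["dmel"]["test"]["gtf"]["conversions"])
```

Because unused rules are never triggered, listing every output this way only costs build time when the references workflow itself is run.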

daler commented Aug 22, 2019

After working on this a bit in #209, I realized that we should keep it the way it is for now, for the following reasons:

  1. gffutils db often requires custom kwargs to handle all the idiosyncrasies of a GTF file. There's not a good "default" way of running this. And there are cases like the GENCODE human GTF where the wrong settings cause it to take forever to create a db.

  2. genome fastas and transcriptome fastas both fall under the "fasta" field in the references config. It doesn't make sense to build a salmon index for a genome fasta or a star index for a transcriptome fasta.

  3. While we want to retain the ability to pass kwargs, it's not clear that these should be overrides.

Possible solutions to these issues:

  1. Don't do the gffutils conversion. Not sure how useful it is anyway.

  2. Change the config specification to use "genome_fasta" and "transcriptome_fasta" rather than just "fasta". This has a nice side-effect in that it helps tie together gtf/genome/transcriptome in a way that they are not tied together now.

  3. Change the "conversions" key to be "conversions_overrides" or something.
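
To make solutions 2 and 3 concrete, here is a minimal sketch of how the split keys and an overrides key might be consumed. The key names `genome_fasta`, `transcriptome_fasta` and `conversions_overrides` come from the list above; the index lists, the helper names, the empty refflat defaults and the choice to let an override opt in to a non-default conversion (such as gffutils, following solution 1) are all assumptions, not existing behavior.

```python
# Hypothetical mapping from the proposed fasta keys to the indexes that make
# sense for each. Only the salmon/star distinction comes from the discussion.
INDEXES_BY_FASTA_TYPE = {
    "genome_fasta": ["bowtie2", "hisat", "star"],
    "transcriptome_fasta": ["salmon"],
}

# Hypothetical default kwargs per conversion. gffutils is left out, so it is
# only built when a config asks for it explicitly.
DEFAULT_CONVERSION_KWARGS = {"refflat": {}}


def default_indexes(block_type):
    """Return the indexes to build for a block; empty if it is not a fasta."""
    return list(INDEXES_BY_FASTA_TYPE.get(block_type, []))


def resolve_conversions(gtf_block):
    """Merge kwargs from `conversions_overrides` over the per-conversion defaults.

    Conversions that appear only in the overrides are added as well, which is
    how a config would opt in to a conversion that has no sensible default.
    The kwargs themselves are passed through untouched.
    """
    overrides = gtf_block.get("conversions_overrides", {})
    names = list(DEFAULT_CONVERSION_KWARGS) + [
        name for name in overrides if name not in DEFAULT_CONVERSION_KWARGS
    ]
    return {
        name: {**DEFAULT_CONVERSION_KWARGS.get(name, {}), **overrides.get(name, {})}
        for name in names
    }


# Example gtf block that opts in to gffutils with illustrative kwargs.
gtf_block = {
    "conversions_overrides": {
        "gffutils": {"disable_infer_genes": True, "disable_infer_transcripts": True}
    }
}
print(default_indexes("transcriptome_fasta"))
print(resolve_conversions(gtf_block))
```

With typed fasta keys, a salmon index is never requested for a genome fasta, and kwargs act as overrides only where a config supplies them.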

@daler daler mentioned this issue Sep 8, 2019