Harmonise output format to that of Salmon #13

Open
gringer opened this issue Mar 21, 2024 · 7 comments

@gringer

gringer commented Mar 21, 2024

Oarfish produces three files as output:

  • <basename>.meta_info.json - metadata / mapping summary

  • <basename>.quant - tab-separated gene / transcript counts [tname, len, num_reads]

  • <basename>.infreps.pq - bootstrap replicates file

These files are similar to the output of Salmon, but not the same (e.g. it doesn't use quant.sf), so the .quant files will need to be manually converted into a count matrix for processing using DESeq2 (or a similar program).

It would be helpful, given that it's the same lab producing these files, if the output of these programs could be harmonised, so that it can be used directly by any program that can process Salmon output.

https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#transcript-abundance-files-and-tximport-tximeta
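For readers who need a stopgap, the manual conversion mentioned above can be sketched in a few lines. This is a minimal sketch, not part of oarfish or tximport: it assumes each `.quant` file is tab-separated with a header row naming the `tname`, `len`, and `num_reads` columns, and the function names `read_quant` and `build_count_matrix` are hypothetical.

```python
import csv
import io


def read_quant(handle):
    """Return {tname: num_reads} from one .quant table.

    Assumption: the table has a header row with tname, len, num_reads.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tname"]: float(row["num_reads"]) for row in reader}


def build_count_matrix(samples):
    """Merge per-sample dicts into a transcript-by-sample matrix.

    samples maps a sample name to its {tname: num_reads} dict.
    Transcripts missing from a sample are filled with 0.0.
    """
    names = sorted(samples)
    transcripts = sorted(set().union(*(samples[s].keys() for s in names)))
    header = ["tname"] + names
    rows = [[t] + [samples[s].get(t, 0.0) for s in names] for t in transcripts]
    return header, rows


# Tiny in-memory stand-ins for two <basename>.quant files.
sample_a = io.StringIO("tname\tlen\tnum_reads\ntx1\t1000\t12.5\ntx2\t500\t0\n")
sample_b = io.StringIO("tname\tlen\tnum_reads\ntx1\t1000\t3\ntx3\t750\t7.25\n")

header, rows = build_count_matrix(
    {"A": read_quant(sample_a), "B": read_quant(sample_b)}
)
```

With real data, each `io.StringIO` would be replaced by an opened `.quant` file. The estimated counts are fractional, so a tool that expects integers (DESeq2 does when given a raw matrix) would need them rounded first.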

@rob-p
Contributor

rob-p commented Mar 21, 2024

Hi @gringer,

Thanks for the input and suggestion. I'm open to discussion, but also want to allow tools to evolve over time. For example, I think the infreps.pq is a strictly better solution than what salmon provides: it uses a standard, well-supported data format (Parquet) as opposed to a custom binary format. In fact, support for reading parquet inferential reps is already present in tximeta as part of its support for piscem-infer.

I'd already mentioned this to @mikelove, but I think we should definitely add native support for oarfish to tximeta. One other important thing about changes in piscem-infer and oarfish is that, based on previous user feedback (with respect to salmon), we've moved from having the output all live in a specific directory by necessity to simply having the output be a file stem which the multiple output files share. If the provided stem includes a new directory name, then that directory will be created. After several discussions with heavy salmon users, there was broad preference for the new idiom over the old one.

--Rob

@gringer
Author

gringer commented Mar 22, 2024

Native support for oarfish being added to tximeta would be great.

Thanks for explaining this. I understand the issues with different user bases; I just had a faint hope that things might be more malleable with two similar tools being created by the same research lab.

@mikelove

Added an issue:

thelovelab/tximeta#81

Rob, maybe you can put up 1-2 quantified samples somewhere?

@mikelove

Oh I noticed/remembered there is currently no digest of the reference sequence in these output files.

You can use this new cut of tximport and skipMeta=FALSE in tximeta to build an SE. tximeta just passes type to tximport so this all should work without any changes to tximeta. (changes will be needed in tximeta once we get to reference digests and identification)

@rob-p
Contributor

rob-p commented Mar 23, 2024

Thanks @mikelove,

So the issue here is that oarfish uses minimap2 alignments as input, so it may never even see the transcriptome. How do you suggest we handle this? We could have a signature based on transcript names and lengths (present in the BAM/SAM header), or we could add an oarfish command to allow the user to add a signature to a quantification result, but that introduces a user-dependent step and is error-prone.

--Rob
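To make the first option concrete, here is a minimal sketch of a signature computed from the transcript names and lengths in a SAM header. It is an illustration only: it is not oarfish's implementation and not the GA4GH digest; the choice of SHA-256, the `name<TAB>length<NEWLINE>` serialisation, and the function names `parse_sq_lines` and `name_length_signature` are all assumptions.

```python
import hashlib


def parse_sq_lines(header_text):
    """Extract (name, length) pairs from @SQ header lines, in header order."""
    refs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        refs.append((fields["SN"], int(fields["LN"])))
    return refs


def name_length_signature(refs):
    """Hash the ordered (name, length) pairs with SHA-256.

    Assumption: the order of references is part of the signature.
    """
    h = hashlib.sha256()
    for name, length in refs:
        h.update(f"{name}\t{length}\n".encode("utf-8"))
    return h.hexdigest()


# A small hand-written SAM header standing in for a real alignment file.
sam_header = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "@SQ\tSN:tx1\tLN:1000\n"
    "@SQ\tSN:tx2\tLN:500\n"
    "@PG\tID:minimap2\tPN:minimap2\n"
)

refs = parse_sq_lines(sam_header)
signature = name_length_signature(refs)
```

A signature of this kind only distinguishes references whose names or lengths differ; two transcriptomes with identical names and lengths but different sequences would collide, which a sequence-based digest avoids.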

@mikelove

We should definitely adopt the proposal ideas (so the former). We can even start to implement the GA4GH digest. Let's chat this week.

@mikelove

And to be clear, the latest GitHub version of tximport will work with oarfish.
