New AnnotationHubDispatchClassList? #6

Open
lgatto opened this issue Dec 11, 2022 · 6 comments

@lgatto
Member

lgatto commented Dec 11, 2022

When creating an ExperimentHub package, it is possible to define dispatch classes, so that some file types can be loaded automatically and returned as predefined objects. See AnnotationHub::DispatchClassList():

DispatchClass                   Reader
FaFile                          Rsamtools::FaFile(); requires rtracklayer
BamFile                         Rsamtools::BamFile(); requires rtracklayer
Rds                             readRDS()
RDS                             readRDS()
Rda                             get(load())
data.frame                      get(load())
GRanges                         get(load()); requires GenomicRanges
VCF                             get(load()); requires VariantAnnotation
ChainFile                       rtracklayer::import.chain(); requires rtracklayer and GenomeInfoDb; before using import.chain internally uses gzfile and writeBin to extract data from file; files saved as chain.gz
TwoBitFile                      rtracklayer::TwoBitFile(); requires rtracklayer
GFFFile                         rtracklayer::import(); require rtracklayer and GenomeInfoDB; after import converts to GRanges object
GFF3File                        rtracklayer::import(); require rtracklayer
BigWig                          rtracklayer::BigWigFile(); require rtracklayer
dbSNPVCFFile                    VariantAnnotation::VcfFile(); require VariantAnnotation; files saved as vcf.gz and vcf.gz.tbi
SQLiteFile                      AnnotationDbi::loadDb(); files saved as sqlite
GRASP                           dbFileConnect()
Zip                             unzip(); returns file path to files
ChEA                            unzip(); returns data.frame from reading chea-background.csv
BioPax                          get(load()); require rBiopaxParser
Pazar                           read.delim(); require GenomicRanges; reads specific columns from file and coverts to GRanges object
CSVtoGranges                    read.csv(); require GenomicRanges; coverts data.frame to GRanges object
ExpressionSet                   get(load()); require Biobase
GDS                             gdsfmt::openfn.gds(); require gdsfmt
H5File                          require rhdf5; resource downloaded but not loaded; returns file path
FilePath                        resource downloaded but not loaded; returns file path
BEDFile                         rtracklayer::import(rtracklayer::BEDFile()); require rtracklayer; converts to GRanges object
UCSCBroadPeak                   rtracklayer::import(rtracklayer::BEDFile()); require rtracklayer; converts to GRanges object
UCSCNarrowPeak                  rtracklayer::import(rtracklayer::BEDFile()); require rtracklayer; converts to GRanges object
UCSCBEDRnaElements              rtracklayer::import(rtracklayer::BEDFile()); require rtracklayer; converts to GRanges object
UCSCGappedPeak                  rtracklayer::import(rtracklayer::BEDFile()); require rtracklayer; converts to GRanges object
EpiMetadata                     read.delim()
EpiExpressionText               read.table(); converts to SummarizedExperiment object
EpichmmModels                   rtracklayer::import(); calls additional helper AnnotationHub:::.mapAbbr2FullName and then converts to GRange object; file assumed to be bed file format
EpigenomeRoadmapFile            rtracklayer::import(); converts to GRange object; file assumed to be bed file format
EpigenomeRoadmapNarrowAllPeaks  rtracklayer::import(rtracklayer::BEDFile()); require rtracklayer; converts to GRanges object
EpigenomeRoadmapNarrowFDR       rtracklayer::import(rtracklayer::BEDFile()); require rtracklayer; converts to GRanges object
EnsDb                           ensembldb::EnsDb(); require ensembldb
mzRpwiz                         mzR::openMSfile(); require mzR
mzRident                        mzR::openIDfile(); require mzR
MSnSet                          get(load()); require MSnbase
AAStringSet                     Biostrings::readAAStringSet(); require Biostrings

For MsDataHub, we want "FilePath", as we want to get the file path and then load the data ourselves. We could also directly get the desired object, for example a Spectra object created by Spectra() if the file is an mzML.
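
To make the difference between "FilePath" and a loading dispatch class concrete, here is a toy sketch in base R. It is not how AnnotationHub implements dispatch; the names `readers` and `load_resource()` are made up for illustration, and only two of the classes from the table above are mimicked.

```r
# Toy lookup table: dispatch class name -> reader function taking a file path.
# Hypothetical illustration only, not AnnotationHub internals.
readers <- list(
    FilePath = function(path) path,          # resource not loaded, path returned
    Rds      = function(path) readRDS(path)  # resource loaded as an R object
)

# Pick the reader registered for a dispatch class and apply it to the file.
load_resource <- function(path, dispatch_class) {
    reader <- readers[[dispatch_class]]
    if (is.null(reader))
        stop("No reader registered for dispatch class '", dispatch_class, "'")
    reader(path)
}

# Small example file standing in for a downloaded hub resource.
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), tmp)

res_path <- load_resource(tmp, "FilePath")  # character path, caller loads it
res_obj  <- load_resource(tmp, "Rds")       # the data.frame itself
```

A "Spectra" dispatch class would amount to one more entry in such a table, with a reader that calls Spectra() on the file.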

Should we ask to add Spectra (and possibly others such as PSM for mzid files) to the default dispatch classes?

Ping @jorainer

@jorainer
Member

Or should we go directly for MsExperiment instead? IMO a Spectra object without sample information might not be too useful.

@lgatto
Member Author

lgatto commented Dec 12, 2022

I suppose you refer to this issue.

But we can't necessarily anticipate what the developer is sharing their data for. And your suggestion requires two inputs (an mzML file and the sample annotation), and I'm not sure this fits the bill here, as the hub infrastructure is meant to share (individual) files. To fit your suggestion, we should share two files, one that could be loaded as a Spectra object directly (as per my message above) and a second one loaded as a data.frame, and both can be used to construct an MsExperiment.

@jorainer
Member

Hm, agree - and needing two separate files would not be ideal. So, we might go for Spectra and have one Spectra object for each mzML file then?

@lgatto
Member Author

lgatto commented Dec 12, 2022

Yes, I think that's the basic idea - I share a file and it gets loaded automatically as the best object. If, as a developer, I want a Spectra object containing data from multiple files, it would be my job to create that file beforehand.

@jorainer
Member

yes. makes sense.

@jorainer
Member

> I want a Spectra object containing data from multiple files, it would be my job to create that file beforehand.

Or simply join the Spectra objects from the individual files using c().
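
A minimal sketch of that, assuming the Spectra package is installed. The two small objects are built in memory from made-up values and only stand in for Spectra objects that would each come from one mzML file:

```r
library(Spectra)

# Made-up spectra standing in for the content of a first file.
spd1 <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd1$mz <- list(c(100, 103.2), c(45.6, 120.4))
spd1$intensity <- list(c(200, 400), c(12, 8))
sp1 <- Spectra(spd1)

# Made-up spectra standing in for the content of a second file.
spd2 <- DataFrame(msLevel = c(1L, 2L), rtime = c(2.1, 2.2))
spd2$mz <- list(c(101, 104.2), c(46.6, 121.4))
spd2$intensity <- list(c(150, 300), c(9, 5))
sp2 <- Spectra(spd2)

# Join the per-file objects into a single Spectra object.
sp_all <- c(sp1, sp2)
```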
