Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SpectrumParser virtual class #5

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jorainer
Copy link
Collaborator

This is a small proposal to help unifying the API for the spectrum parsers. The idea is to have a virtual class SpectrumParser with a defined set of methods. Parser implementations should extend this class and implement all of its methods.

We would have to agree also on a common set of methods and its input and result arguments. I've added one method so far, but more (or different ones) might be helpful.

  • importSpectrum takes a SpectrumParser object and a file, reads its content, maps its properties to the standard fields (using the schema) and returns a Spectra object.

If this method or the return type makes sense needs to be discussed further. I just wanted to propose one way how a single API for the user could be defined that uses different parsers to read data from different sources.

An example workflow to read data from a single file would be the following (based on the toy MassbankParser object extending SpectrumParser I've implemented):

library(MSnio)
## Create and intialize a parser; this should read/import the corresponding schema
prs <- MassbankParser()
## Spectrum file
fl <- system.file("spectra/massbank/EA016614.txt", package="MSnio")
res <- importSpectrum(prs, fl)

## res should then ideally be a `Spectra` object - or the new `Spectrum` object.

Let's discuss now.

- Add a first draft of a SpectrumParser class to define an initial API for
  spectrum data import (issue meowcat#4).
@jorainer
Copy link
Collaborator Author

Just realized I can't request any reviewers - so please @meowcat , @michaelwitting @Treutler , have a look at it and let's discuss.

@meowcat
Copy link
Owner

meowcat commented Apr 23, 2019

Thanks for the input. It looks good already, but do we want a SpectrumParser or do we want a more general FormatHandler (or whatever) which has both import (parser) and export (writer)?

@meowcat
Copy link
Owner

meowcat commented Apr 23, 2019

We can of course have them separate, but ideally for any format the Parser and the Writer implementation would successfully round-trip (and we could use this as a unit test) and ideally also use the same schema definitions...

@jorainer
Copy link
Collaborator Author

Good point! I just don't like the name yet. What about SpectrumRecord and e.g. MassbankRecord?

@meowcat
Copy link
Owner

meowcat commented Apr 23, 2019

I also don't like the name :) But Record is also so-so because it describes the entity of a record more than the facility to handle it.

By the way, for this class I suggest we also specify something like the "subset of fields it can successfully round-trip", because not all formats will be able to handle everything lossless for import and export (even though we would like it). The validation / unittest procedure would then possibly be:

  • Take a sample record
  • crop it to supported fields
  • export it
  • (check the exported file)
  • import it back
  • check the identity of the re-imported record to the cropped original record.

@jorainer
Copy link
Collaborator Author

An alternative would be to set all the not-supported fields to NA. That's at least what we have e.g. in mzR as not all input files provide all spectrum metadata.

@meowcat
Copy link
Owner

meowcat commented Apr 23, 2019

A field can be NA if there is no data in the file, regardless of whether the file format actually supports the field or not. E.g. for MassBank records there is a vast amount of optional fields. I think specifying the supported fields per file format will be important when it comes to reproducibility/testing.

@jorainer
Copy link
Collaborator Author

Agree. And that's exactly the point. I will try to define a schema for hmdb an importer, unit tests etc and then make a pull request (or actually change this one).

@jorainer
Copy link
Collaborator Author

Name: SpectrumAccessor, MassbankAccessor???

@Treutler
Copy link
Collaborator

Treutler commented Apr 23, 2019

Our FormatAdapter (or whatever) is similar to a BridgeDb IDMapper. A FormatAdapter could have

  • a supported record format
  • a set of supported input fields (the fields from the general format which are readable from file)
  • a set of supported output fields (the fields from the general format which are writeable to file)
  • any other format-specific information we want

The FormatAdapter would than do the import and export and map the fields from file to the fields from the general format and vice versa.

@meowcat
Copy link
Owner

meowcat commented Apr 23, 2019

Other name ideas:
SpectrumTranslator
SpectrumConverter
But @jorainer for now you can use what you like best, changing the name will be the smallest problem if we don't like it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants