Add SpectrumParser virtual class #5

jorainer · 2019-04-23T08:37:51Z

This is a small proposal to help unifying the API for the spectrum parsers. The idea is to have a virtual class SpectrumParser with a defined set of methods. Parser implementations should extend this class and implement all of its methods.

We would have to agree also on a common set of methods and its input and result arguments. I've added one method so far, but more (or different ones) might be helpful.

importSpectrum takes a SpectrumParser object and a file, reads its content, maps its properties to the standard fields (using the schema) and returns a Spectra object.

If this method or the return type makes sense needs to be discussed further. I just wanted to propose one way how a single API for the user could be defined that uses different parsers to read data from different sources.

An example workflow to read data from a single file would be the following (based on the toy MassbankParser object extending SpectrumParser I've implemented):

library(MSnio)
## Create and intialize a parser; this should read/import the corresponding schema
prs <- MassbankParser()
## Spectrum file
fl <- system.file("spectra/massbank/EA016614.txt", package="MSnio")
res <- importSpectrum(prs, fl)

## res should then ideally be a `Spectra` object - or the new `Spectrum` object.

Let's discuss now.

- Add a first draft of a SpectrumParser class to define an initial API for spectrum data import (issue meowcat#4).

jorainer · 2019-04-23T08:38:36Z

Just realized I can't request any reviewers - so please @meowcat , @michaelwitting @Treutler , have a look at it and let's discuss.

meowcat · 2019-04-23T08:54:29Z

Thanks for the input. It looks good already, but do we want a SpectrumParser or do we want a more general FormatHandler (or whatever) which has both import (parser) and export (writer)?

meowcat · 2019-04-23T09:14:45Z

We can of course have them separate, but ideally for any format the Parser and the Writer implementation would successfully round-trip (and we could use this as a unit test) and ideally also use the same schema definitions...

jorainer · 2019-04-23T09:23:25Z

Good point! I just don't like the name yet. What about SpectrumRecord and e.g. MassbankRecord?

meowcat · 2019-04-23T12:26:32Z

I also don't like the name :) But Record is also so-so because it describes the entity of a record more than the facility to handle it.

By the way, for this class I suggest we also specify something like the "subset of fields it can successfully round-trip", because not all formats will be able to handle everything lossless for import and export (even though we would like it). The validation / unittest procedure would then possibly be:

Take a sample record
crop it to supported fields
export it
(check the exported file)
import it back
check the identity of the re-imported record to the cropped original record.

jorainer · 2019-04-23T12:37:06Z

An alternative would be to set all the not-supported fields to NA. That's at least what we have e.g. in mzR as not all input files provide all spectrum metadata.

meowcat · 2019-04-23T12:51:01Z

A field can be NA if there is no data in the file, regardless of whether the file format actually supports the field or not. E.g. for MassBank records there is a vast amount of optional fields. I think specifying the supported fields per file format will be important when it comes to reproducibility/testing.

jorainer · 2019-04-23T12:53:18Z

Agree. And that's exactly the point. I will try to define a schema for hmdb an importer, unit tests etc and then make a pull request (or actually change this one).

jorainer · 2019-04-23T13:05:40Z

Name: SpectrumAccessor, MassbankAccessor???

Treutler · 2019-04-23T13:49:47Z

Our FormatAdapter (or whatever) is similar to a BridgeDb IDMapper. A FormatAdapter could have

a supported record format
a set of supported input fields (the fields from the general format which are readable from file)
a set of supported output fields (the fields from the general format which are writeable to file)
any other format-specific information we want

The FormatAdapter would than do the import and export and map the fields from file to the fields from the general format and vice versa.

meowcat · 2019-04-23T14:11:27Z

Other name ideas:
SpectrumTranslator
SpectrumConverter
But @jorainer for now you can use what you like best, changing the name will be the smallest problem if we don't like it.

Add SpectrumParser virtual class

7aa0326

- Add a first draft of a SpectrumParser class to define an initial API for spectrum data import (issue meowcat#4).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SpectrumParser virtual class #5

Add SpectrumParser virtual class #5

jorainer commented Apr 23, 2019

jorainer commented Apr 23, 2019

meowcat commented Apr 23, 2019

meowcat commented Apr 23, 2019

jorainer commented Apr 23, 2019

meowcat commented Apr 23, 2019

jorainer commented Apr 23, 2019

meowcat commented Apr 23, 2019

jorainer commented Apr 23, 2019

jorainer commented Apr 23, 2019

Treutler commented Apr 23, 2019 •

edited

Loading

meowcat commented Apr 23, 2019

Add SpectrumParser virtual class #5

Are you sure you want to change the base?

Add SpectrumParser virtual class #5

Conversation

jorainer commented Apr 23, 2019

jorainer commented Apr 23, 2019

meowcat commented Apr 23, 2019

meowcat commented Apr 23, 2019

jorainer commented Apr 23, 2019

meowcat commented Apr 23, 2019

jorainer commented Apr 23, 2019

meowcat commented Apr 23, 2019

jorainer commented Apr 23, 2019

jorainer commented Apr 23, 2019

Treutler commented Apr 23, 2019 • edited Loading

meowcat commented Apr 23, 2019

Treutler commented Apr 23, 2019 •

edited

Loading