
Saving data #19

Open
mpmdean opened this issue Jul 30, 2018 · 13 comments

@mpmdean
Contributor

mpmdean commented Jul 30, 2018

We should return to the question of how we save data.

If the preference is that we use another database for derived data, it would be useful to set this up so we can start thinking about how everything can fit together.

@danielballan
Contributor

Broadly, the DAMA vision is derived from what the climate science community has been doing for years: register "levels" of data starting from raw. There will always be exploratory "scratch" data, and users should manage that however they like, but any derived data that is reasonably routine --- starting with basic "corrections" and common reductions --- should be re-captured in a database along with the metadata that describes how it was created. For example, a Header from the "raw" databroker (the one SIX has now) might flow into a pipeline that creates "corrected" images and inserts them, with a new Header, into a separate databroker instance for corrected data. This new Header would include a pointer to the raw Header and all metadata necessary for recreating how we got from raw to corrected ("Python function X from version Y of library Z was applied with parameters {...}").

This has been tested at one or two beamlines but not widely deployed. Perhaps we could start by creating a "corrected data" databroker at SIX.
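The provenance-carrying Header described above could be sketched roughly as follows. This is an illustrative sketch using plain dicts, not the actual databroker API; the function name, uid scheme, and field names (`raw_uid`, `provenance`) are assumptions for the example.

```python
def make_corrected_header(raw_header, function_name, library, version, parameters):
    """Build a processed-data header that points back to the raw Header
    and records exactly how the corrected data was produced.
    (Hypothetical sketch; field names are not the real document schema.)"""
    return {
        "uid": raw_header["uid"] + "-corrected",   # placeholder uid scheme
        "raw_uid": raw_header["uid"],              # pointer to the raw Header
        "provenance": {
            "function": function_name,   # "Python function X ..."
            "library": library,          # "... of library Z ..."
            "version": version,          # "... from version Y ..."
            "parameters": parameters,    # "... with parameters {...}"
        },
    }

raw_header = {"uid": "abc123", "plan_name": "count"}
corrected = make_corrected_header(
    raw_header,
    function_name="image_to_spectrum",
    library="sixtools",
    version="0.1.0",
    parameters={"curvature": 0.02},
)
```

The key design point is that the corrected Header is self-describing: given only this record, one can find the raw data and re-run the exact transformation that produced it.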

@mpmdean
Contributor Author

mpmdean commented Aug 2, 2018

Sounds like a very nice and powerful implementation. Which beamlines are using this model at the moment?

Is there someone who can help/advise getting this off the ground?

@danielballan
Contributor

I believe it has been tried at CHX and maybe also at LIX. But Julien, who suddenly departed for new pastures, was running point on this, and we're still catching up on the details of what has been done. In any case, I can take point on this to start.

As may have been clear from my example above, the best kind of derived data to capture is derived data that is obtained via a well-defined, semi-automated process. Maybe humans are tweaking some parameters here and there, but the overall process should be well-defined and reasonably stable. Is there a good candidate to start with?

@mpmdean
Contributor Author

mpmdean commented Aug 2, 2018

Roughly speaking, there will be two steps:

  1. Each image will go into sixtools.rixswrapper.image_to_spectrum, which will convert it to a spectrum (i.e. two-column pixel/energy versus intensity). This is a reasonably well-defined process, although we need to be able to re-run it if needed.

  2. After that, individual spectra will need to be combined. This will often require more manual intervention, e.g. plotting the results, choosing which spectra to include, picking spectra from different scan_ids, etc.
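Step (1) could be sketched with a toy stand-in. The real sixtools.rixswrapper.image_to_spectrum signature is not shown in this thread, so the function below is purely illustrative: it collapses a 2-D detector image into a two-column (pixel, intensity) spectrum by summing each row, whereas the real reduction would also handle curvature correction and energy calibration.

```python
def image_to_spectrum(image):
    """Toy stand-in for the RIXS reduction: collapse a 2-D detector image
    (list of rows) into a two-column (pixel, intensity) spectrum by summing
    each row. Not the real sixtools implementation."""
    return [(row_index, sum(row)) for row_index, row in enumerate(image)]

image = [
    [0, 1, 0],
    [2, 5, 2],   # a bright feature (e.g. the elastic line) sits in this row
    [0, 1, 0],
]
spectrum = image_to_spectrum(image)
# spectrum == [(0, 1), (1, 9), (2, 1)]
```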

@danielballan
Contributor

Let's start with (1). Can you provide a simple-as-possible script that looks up a header and processes the data? Then we will add code that packs the results back into our "document model" and inserts it into a second databroker.
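The "pack the results back into our document model" step could look roughly like the sketch below. This is a simplified assumption of the bluesky document model (start/event/stop documents as plain dicts); real events reference a separate descriptor document, and the field choices here are illustrative only.

```python
import time
import uuid

def pack_spectrum(spectrum, raw_start_uid):
    """Illustrative sketch: wrap a derived spectrum in start / event / stop
    documents so it could be inserted into a second databroker.
    Simplified relative to the real document model."""
    start = {
        "uid": str(uuid.uuid4()),
        "time": time.time(),
        "raw_uid": raw_start_uid,   # provenance pointer back to the raw run
        "purpose": "processed RIXS spectrum",
    }
    event = {
        "uid": str(uuid.uuid4()),
        "time": time.time(),
        "data": {
            "pixel": [p for p, _ in spectrum],
            "intensity": [i for _, i in spectrum],
        },
        # Simplified: real events reference a descriptor document, not the start.
        "descriptor": start["uid"],
    }
    stop = {
        "uid": str(uuid.uuid4()),
        "run_start": start["uid"],
        "exit_status": "success",
    }
    return start, event, stop

start, event, stop = pack_spectrum([(0, 1.0), (1, 9.0)], raw_start_uid="abc123")
```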

@mpmdean
Contributor Author

mpmdean commented Aug 2, 2018

This is the simplest meaningful operation possible:
https://nbviewer.jupyter.org/gist/BNL-XRG/1476e8e3f3a0ee44be24c8ab8533aa79

Here is one that is a bit more representative. At SIX we make two spectra per frame (they exist in different ROIs within the frame).
https://nbviewer.jupyter.org/gist/BNL-XRG/6462fec65e3492c0b2b4643a1175c5b8
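The two-spectra-per-frame case could be sketched as below. This is a hypothetical illustration (the linked notebooks show the real code): each ROI is taken as a (row_start, row_stop) slice of the frame, and one spectrum is produced per ROI by summing rows, as in the toy reduction above.

```python
def spectra_from_frame(frame, rois):
    """Hypothetical sketch: extract one spectrum per region of interest (ROI)
    from a single detector frame, as described for SIX (two spectra per frame).
    Each ROI is a (row_start, row_stop) row slice; each spectrum is a list of
    (pixel, intensity) pairs formed from per-row sums."""
    spectra = []
    for row_start, row_stop in rois:
        region = frame[row_start:row_stop]
        spectra.append(
            [(r + row_start, sum(row)) for r, row in enumerate(region)]
        )
    return spectra

frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
two_spectra = spectra_from_frame(frame, rois=[(0, 2), (2, 4)])
# two_spectra == [[(0, 2), (1, 4)], [(2, 6), (3, 8)]]
```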

@mpmdean
Contributor Author

mpmdean commented Aug 13, 2018

Hi @danielballan, was what I provided what you needed?

@danielballan
Contributor

I think so, yes. I will dig in next week; this week I am occupied by the "hackathon" across the street.

@danielballan
Contributor

OK, I think this should be our plan:

  1. I wrap this code in from-and-to document model code.
  2. During the upcoming downtime, DAMA deploys a second databroker instance for processed data at SIX.

Will try to get to (1) next week. This is my only normal workday between last week and this one. Lots of conferences.

@mpmdean
Contributor Author

mpmdean commented Aug 22, 2018

Sounds good.

@danielballan
Contributor

Update: (2) is done. Still haven't gotten to (1).

@mpmdean
Contributor Author

mpmdean commented Sep 19, 2018

Thanks

@stuwilkins

@danielballan @awalter-bnl I am looking into this myself now. Did we manage to get an analysis databroker up and running?

@stuwilkins stuwilkins self-assigned this Jan 10, 2019