
Saving data #19

Open
mpmdean opened this issue Jul 30, 2018 · 13 comments

@mpmdean
Contributor

mpmdean commented Jul 30, 2018

We should return to the question of how we save data.

If the preference is that we use another database for derived data, it would be useful to set this up so we can start thinking about how everything can fit together.

@danielballan
Contributor

Broadly, the DAMA vision is derived from what the climate science community has been doing for years: register "levels" of data starting from raw. There will always be exploratory "scratch" data, and users should manage that however they like, but any derived data that is reasonably routine --- starting with basic "corrections" and common reductions --- should be re-captured in a database along with the metadata that describes how it was created. For example, a Header from the "raw" databroker (the one SIX has now) might flow into a pipeline that creates "corrected" images and inserts them, with a new Header, into a separate databroker instance for corrected data. This new Header would include a pointer to the raw Header and all metadata necessary for recreating how we got from raw to corrected ("Python function X from version Y of library Z was applied with parameters {...}").

This has been tested at one or two beamlines but not widely deployed. Perhaps we could start by creating a "corrected data" databroker at SIX.
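The provenance-carrying Header described above could be sketched roughly as follows. This is an illustrative sketch using plain dicts, not the actual databroker API; the function name, uid scheme, and field names (`raw_uid`, `provenance`) are assumptions for the example.

```python
def make_corrected_header(raw_header, function_name, library, version, parameters):
    """Build a processed-data header that points back to the raw Header
    and records exactly how the corrected data was produced.
    (Hypothetical sketch; field names are not the real document schema.)"""
    return {
        "uid": raw_header["uid"] + "-corrected",   # placeholder uid scheme
        "raw_uid": raw_header["uid"],              # pointer to the raw Header
        "provenance": {
            "function": function_name,   # "Python function X ..."
            "library": library,          # "... of library Z ..."
            "version": version,          # "... from version Y ..."
            "parameters": parameters,    # "... with parameters {...}"
        },
    }

raw_header = {"uid": "abc123", "plan_name": "count"}
corrected = make_corrected_header(
    raw_header,
    function_name="image_to_spectrum",
    library="sixtools",
    version="0.1.0",
    parameters={"curvature": 0.02},
)
```

The key design point is that the corrected Header is self-describing: given only this record, one can find the raw data and re-run the exact transformation that produced it.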

@mpmdean
Contributor Author

mpmdean commented Aug 2, 2018

Sounds like a very nice and powerful implementation. Which beamlines are using this model at the moment?

Is there someone who can help/advise getting this off the ground?

@danielballan
Contributor

I believe it has been tried at CHX and maybe also at LIX. But Julien, who suddenly departed for new pastures, was running point on this, and we're still catching up on the details of what has been done. In any case, I can take point on this to start.

As may have been clear from my example above, the best kind of derived data to capture is derived data that is obtained via a well-defined, semi-automated process. Maybe humans are tweaking some parameters here and there, but the overall process should be well-defined and reasonably stable. Is there a good candidate to start with?

@mpmdean
Contributor Author

mpmdean commented Aug 2, 2018

Roughly speaking, there will be two steps:

  1. Each image will go into sixtools.rixswrapper.image_to_spectrum, which will convert it to a spectrum (i.e. two-column pixel/energy versus intensity). This is a reasonably well-defined process, although we need to be able to re-run it if needed.

  2. After that, individual spectra will need to be combined. This will often require more manual intervention, e.g. plotting the results, choosing which spectra to include, picking spectra from different scan_ids, etc.
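Step (1) could be sketched with a toy stand-in. The real sixtools.rixswrapper.image_to_spectrum signature is not shown in this thread, so the function below is purely illustrative: it collapses a 2-D detector image into a two-column (pixel, intensity) spectrum by summing each row, whereas the real reduction would also handle curvature correction and energy calibration.

```python
def image_to_spectrum(image):
    """Toy stand-in for the RIXS reduction: collapse a 2-D detector image
    (list of rows) into a two-column (pixel, intensity) spectrum by summing
    each row. Not the real sixtools implementation."""
    return [(row_index, sum(row)) for row_index, row in enumerate(image)]

image = [
    [0, 1, 0],
    [2, 5, 2],   # a bright feature (e.g. the elastic line) sits in this row
    [0, 1, 0],
]
spectrum = image_to_spectrum(image)
# spectrum == [(0, 1), (1, 9), (2, 1)]
```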

@danielballan
Contributor

Let's start with (1). Can you provide a simple-as-possible script that looks up a header and processes the data? Then we will add code that packs the results back into our "document model" and inserts it into a second databroker.
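The "pack the results back into our document model" step could look roughly like the sketch below. This is a simplified assumption of the bluesky document model (start/event/stop documents as plain dicts); real events reference a separate descriptor document, and the field choices here are illustrative only.

```python
import time
import uuid

def pack_spectrum(spectrum, raw_start_uid):
    """Illustrative sketch: wrap a derived spectrum in start / event / stop
    documents so it could be inserted into a second databroker.
    Simplified relative to the real document model."""
    start = {
        "uid": str(uuid.uuid4()),
        "time": time.time(),
        "raw_uid": raw_start_uid,   # provenance pointer back to the raw run
        "purpose": "processed RIXS spectrum",
    }
    event = {
        "uid": str(uuid.uuid4()),
        "time": time.time(),
        "data": {
            "pixel": [p for p, _ in spectrum],
            "intensity": [i for _, i in spectrum],
        },
        # Simplified: real events reference a descriptor document, not the start.
        "descriptor": start["uid"],
    }
    stop = {
        "uid": str(uuid.uuid4()),
        "run_start": start["uid"],
        "exit_status": "success",
    }
    return start, event, stop

start, event, stop = pack_spectrum([(0, 1.0), (1, 9.0)], raw_start_uid="abc123")
```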

@mpmdean
Contributor Author

mpmdean commented Aug 2, 2018

This is the simplest meaningful operation possible:
https://nbviewer.jupyter.org/gist/BNL-XRG/1476e8e3f3a0ee44be24c8ab8533aa79

Here is one that is a bit more representative. At SIX we make two spectra per frame (they exist in different ROIs within the frame).
https://nbviewer.jupyter.org/gist/BNL-XRG/6462fec65e3492c0b2b4643a1175c5b8
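The two-spectra-per-frame case could be sketched as below. This is a hypothetical illustration (the linked notebooks show the real code): each ROI is taken as a (row_start, row_stop) slice of the frame, and one spectrum is produced per ROI by summing rows, as in the toy reduction above.

```python
def spectra_from_frame(frame, rois):
    """Hypothetical sketch: extract one spectrum per region of interest (ROI)
    from a single detector frame, as described for SIX (two spectra per frame).
    Each ROI is a (row_start, row_stop) row slice; each spectrum is a list of
    (pixel, intensity) pairs formed from per-row sums."""
    spectra = []
    for row_start, row_stop in rois:
        region = frame[row_start:row_stop]
        spectra.append(
            [(r + row_start, sum(row)) for r, row in enumerate(region)]
        )
    return spectra

frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
two_spectra = spectra_from_frame(frame, rois=[(0, 2), (2, 4)])
# two_spectra == [[(0, 2), (1, 4)], [(2, 6), (3, 8)]]
```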

@mpmdean
Contributor Author

mpmdean commented Aug 13, 2018

Hi @danielballan, was what I provided what you needed?

@danielballan
Contributor

I think so, yes. I will dig in next week; this week I am occupied by the "hackathon" across the street.

@danielballan
Contributor

OK, I think this should be our plan:

  1. I wrap this code in from-and-to document model code.
  2. During the upcoming downtime, DAMA deploys a second databroker instance for processed data at SIX.

Will try to get to (1) next week. This is my only normal workday between last week and this one. Lots of conferences.

@mpmdean
Contributor Author

mpmdean commented Aug 22, 2018

Sounds good.

@danielballan
Contributor

Update: (2) is done. Still haven't gotten to (1).

@mpmdean
Contributor Author

mpmdean commented Sep 19, 2018

Thanks

@stuwilkins

@danielballan @awalter-bnl I am looking into this myself now. Did we manage to get an analysis databroker up and running?

@stuwilkins stuwilkins self-assigned this Jan 10, 2019