Saving data #19
Broadly, the DAMA vision is derived from what the climate science community already does and has been doing for years: register "levels" of data starting from raw. There will always be exploratory "scratch" data, and users should manage that however they like, but for any derived data that is reasonably routine --- starting with basic "corrections" and common reductions --- the derived data should be re-captured in a database along with the metadata that describes how it was created. For example, a Header from the "raw" databroker (the one SIX has now) might flow into a pipeline that creates "corrected" images and inserts them, with a new Header, into a separate databroker instance for corrected data. This new Header would include a pointer to the raw Header and all metadata necessary for recreating how we got from raw to corrected ("Python function X from version Y of library Z was applied with parameters {...}"). This has been tested at one or two beamlines but not widely deployed. Perhaps we could start by creating a "corrected data" databroker at SIX.
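To make the provenance idea concrete, here is a minimal sketch of the metadata such a derived Header could carry. Every field name below (`parent_uid`, `processing`, the library name and version) is illustrative, not an agreed schema:

```python
from databroker import Broker

db = Broker.named("raw")  # hypothetical config name for the existing raw databroker
raw_header = db[-1]       # the raw run we are deriving from

# Provenance metadata for the "corrected" run's start document.
# All field names here are illustrative, not a fixed schema.
derived_metadata = {
    "parent_uid": raw_header.start["uid"],     # pointer back to the raw Header
    "processing": {
        "function": "correct_images",           # "Python function X"
        "library": "six_analysis",              # "library Z" (hypothetical)
        "version": "0.3.1",                     # "version Y"
        "parameters": {"dark_subtraction": True},
    },
}
```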
Sounds like a very nice and powerful implementation. Which beamlines are using this model at the moment? Is there someone who can help/advise on getting this off the ground?
I believe it has been tried at CHX and maybe also at LIX. But Julien, who suddenly departed to new pastures, was running point on this, and we're still catching up on the details of what has been done. In any case, I can take point on this to start. As may have been clear from my example above, the best kind of derived data to capture is derived data that is obtained via a well-defined, semi-automated process. Maybe humans are tweaking some parameters here and there, but the overall process should be well-defined and reasonably stable. Is there a good candidate to start with?
Roughly speaking there will be two steps:

1. A script that looks up a Header and processes the data (beamline-specific).
2. Code that packs the results back into our "document model" and inserts them into a second databroker (generic).
Let's start with (1). Can you provide a simple-as-possible script that looks up a header and processes the data? Then we will add code that packs the results back into our "document model" and inserts it into a second databroker.
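For concreteness, a minimal sketch of such a script, assuming the databroker API and a hypothetical detector field named `image`:

```python
import numpy as np
from databroker import Broker

db = Broker.named("raw")   # hypothetical name for SIX's existing databroker
header = db[-1]            # look up the most recent header

# Load the detector frames; "image" is a hypothetical field name.
images = np.stack(list(header.data("image")))

# A stand-in processing step: subtract the mean frame as a crude background.
corrected = images - images.mean(axis=0)
```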
This is the simplest meaningful operation possible. Here is one that is a bit more representative: at SIX we make two spectra per frame (which live in different ROIs of the frame).
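A rough sketch of that two-ROI reduction (the ROI bounds, frame shape, and dispersive-axis convention are all invented for illustration):

```python
import numpy as np

# Hypothetical ROI bounds: (row_start, row_stop, col_start, col_stop).
ROI_A = (0, 200, 100, 900)
ROI_B = (220, 420, 100, 900)

def extract_spectra(frame):
    """Collapse each ROI of a 2-D detector frame into a 1-D spectrum."""
    spectra = []
    for r0, r1, c0, c1 in (ROI_A, ROI_B):
        # Sum across the (assumed) non-dispersive axis of the ROI.
        spectra.append(frame[r0:r1, c0:c1].sum(axis=1))
    return spectra

# Example with a synthetic frame:
spectrum_a, spectrum_b = extract_spectra(np.random.poisson(5.0, size=(500, 1000)))
```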
Hi @danielballan, was what I provided what you needed?
I think so, yes. I will dig in next week; this week I am occupied by the "hackathon" across the street. |
OK, I think this should be our plan:
Will try to get to (1) next week. This is my only normal workday between last week and this one. Lots of conferences. |
Sounds good. |
Update: (2) is done. Still haven't gotten to (1). |
Thanks |
@danielballan @awalter-bnl I am looking into this myself now. Did we manage to get an analysis databroker up and running?
We should return to the question of how we save data.
If the preference is that we use another database for derived data, it would be useful to set this up so we can start thinking about how everything can fit together.
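If we do set up a second databroker, a minimal sketch of the insertion step, assuming the event_model document helpers and a Broker configured under the hypothetical name "analysis", might look like:

```python
import time
import numpy as np
from databroker import Broker
from event_model import compose_run

an_db = Broker.named("analysis")  # hypothetical second databroker for derived data

# Compose a new run whose start document points back to the raw run.
run = compose_run(metadata={"parent_uid": "<raw-header-uid>",
                            "processing": {"function": "extract_spectra"}})
an_db.insert("start", run.start_doc)

spectrum = np.zeros(1000)  # stand-in for one processed result

# Describe the derived data, then emit one event per processed frame.
desc = run.compose_descriptor(
    name="primary",
    data_keys={"spectrum": {"dtype": "array",
                            "shape": list(spectrum.shape),
                            "source": "analysis"}},
)
an_db.insert("descriptor", desc.descriptor_doc)
an_db.insert("event", desc.compose_event(
    data={"spectrum": spectrum.tolist()},        # lists keep the document JSON-friendly
    timestamps={"spectrum": time.time()}))
an_db.insert("stop", run.compose_stop())
```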