implement SSX fundamentals #192
Comments
I suppose that where it makes sense, we would like to map these values onto existing tables and columns, i.e. re-use what we already have in the schema. But we shouldn't shy away from creating new tables and columns either, if that makes more sense. The "Experiment/BL parameters" values could possibly go into existing and new columns in Many of the "Data collection" values map onto existing columns in the Sample/crystal slurry: I wonder if maybe a new table is needed for this? I suppose the Protein value should just be a proteinId FK pointing to a row in the "Crystfel_processing", "PyFAI_parameters" and "Crystfel_merging" all sound like different steps in a processing pipeline. At Diamond, we would perhaps consider putting those values into the tables (My understanding of SSX is not great so far, so I'll have to come back to this later when I know more ...) |
The idea we had was to use existing tables as much as possible when they match and then extend them with new tables dedicated to SSX when extra columns are required. The goal is to avoid adding even more columns to some already big and not very well documented tables, which would make them harder to understand and maintain.
I think we will indeed need a new At first, we will focus on the experiment, sample and data collection part of the API, and in a second step come back to the processing API. Anyways, this is a first and very early stage version of the requirements. A lot of elements still need to be cleared up with the scientists, and we also need to get a better understanding of SSX with their help! |
I think we should probably try to discuss this in a dev meeting before development starts. Did the document come from ESRF internal meetings? In previous ISPyB meetings we had discussed using There are certainly questions about sample definition and linkages there. |
I remember quite a well developed prototype data model for ISPyB SSX tables, has that work now been abandoned? |
We have Looks like experiment / bl parameters should be merged into Looking more closely, almost everything already exists... Experiment_type (chip/injector) -> Exp.time -> When it comes to sample definition, I think we should probably avoid linkages to SAXS tables (as this could be refactored, long in the future) Protein Then potentially What this ultimately means is that you don't need to create an SSX datacollection end point, you can use the existing multi-discipline |
I'll come back to processing. CrystFEL processing is basically integration, so it should map onto Not sure where to store the |
Looks good, I think you can probably merge
|
Yes, sounds like a good idea to simplify things! |
I think we can probably generalise |
Here's how we can handle creating complex samples with the core tables (without needing to link to the BioSAXS tables) Some other thoughts: Historically some developments have created new tables / resources per discipline. This makes it hard to share features. Instead we should try to create tables that support We should avoid creating yet another datacollections resource. This should be handled by the Try to keep to naming standards. We don't need all the sample information with the dc, this information can be queried separately. Split queries and add a link in the ui from the dc page to a sample page. We traditionally just return the sample name, and protein acronym with the dc. Now we only have 3 missing dc and 1 missing sample column, do we need new tables? SSX/DataCollection: SSX/BLSample: |
Outdated - jump to #192 (comment) Here is the new iteration of the schema we intend to use for handling SSX data collections and samples. Existing tables are on the left side and new tables on the right side. The idea here is to have something generic, without trying to force everything into the existing and not always appropriate tables (i.e. the New features include:
Here is how an SSX data collection would fit in this schema: Each
Then we need to know when that new ligand was mixed with the sample so that we can figure out how long it has been in contact with the protein when each image was taken. For that we use the This sequence could also be used for laser excitation before image acquisition but this use case is not well defined yet so should be discussed later. |
SampleEvent seems to be per-image; does that mean (potentially) millions of rows per data session? I was surprised for similar reasons that |
Is |
Are all of these 1-to-many relationships actually 1-to-many, or are some of them intended to be 1-to-1 in practice? E.g. if |
Both composition tables are 1-to-many Crystal -> CrystalComposition -> Component allows: Then during an experiment you maybe add one or more ligand, additive, etc: You're right though, maybe a link here to |
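As an illustration only, the two 1-to-many composition relations described above could be sketched with plain Python dataclasses. The class names `Crystal`, `CrystalComposition`, `Component` and `SampleComposition` come from this thread; every field name (`component_type`, `concentration`, `protein_acronym`) and the `Sample` wrapper are hypothetical and not taken from the actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    component_type: str  # hypothetical field, e.g. "buffer" or "ligand"


@dataclass
class CrystalComposition:
    component: Component
    concentration: float  # hypothetical field; unit left unspecified


@dataclass
class SampleComposition:
    component: Component
    concentration: float  # hypothetical field; unit left unspecified


@dataclass
class Crystal:
    protein_acronym: str  # hypothetical field
    # 1-to-many: what the crystal slurry was prepared with
    compositions: List[CrystalComposition] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    crystal: Crystal
    # 1-to-many: what was added during the experiment (ligand, additive, ...)
    compositions: List[SampleComposition] = field(default_factory=list)


def component_names(sample: Sample) -> List[str]:
    """List crystal-level components first, then sample-level additions."""
    crystal_part = [c.component.name for c in sample.crystal.compositions]
    sample_part = [c.component.name for c in sample.compositions]
    return crystal_part + sample_part
```

Keeping the two composition lists separate is what lets a reader tell preparation-time components apart from components added at the beamline, which seems to be the point of having both tables.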
Yes, millions of rows for each session would be problematic. It is still a bit unclear what will be required on the processing side, and whether we need to have a record for each individual image or not. This schema also allows saving a |
Thanks, I see - so would a single |
From my vague understanding (@mgaonach knows more) this is more specifically for jet based mixing, rather than chips |
I think the comment was that if there isn't a link to/from SampleComposition then it doesn't make sense - the only way to walk to the SampleComposition would be the one through DataCollectionGroup/Sample (in which case no metainformation to tell which one you used). So, a link from SSXCollection to sample composition would seem to make sense.
Additional queries:
|
I'm not sure this works with the table as-is - it has a timestamp field, which implies a global fixed point in time (to match an image). I could imagine being more useful as e.g. a "Plan" - I'm not sure how often it's a simple single pulse with delay and not a structure of multiple pulses (which could be a reason for multiple SequenceEvents per collection).
I'm not sure how much it's worth planning fixed-in-stone data structures if we don't know what they are supposed to be used for or what the use cases are. That seems like it will inevitably end up with something complicated to implement because it's trying to be generic, but not generic enough to be useful. |
I am not entirely sure I understand, but I think it comes down to something unclear in the schema: the relation between
In case of a single event (for instance ligand mixed) then the timestamp is the time at which this event happened. In the case of a repeated event then the timestamp is the first iteration and we can get the others with
This is a prototype and in no way fixed in stone. As of now we have a good idea of what needs to be saved and we're iterating solutions as it gets refined. This can - and definitely will - be modified before ending up in production. |
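To make the single versus repeated event timing described above concrete, here is a minimal sketch. It assumes a repeated event is stored as the timestamp of its first iteration plus a period and a repetition count; the parameter names `period_s` and `repetitions` are hypothetical, since the actual column names are not given in this thread.

```python
from datetime import datetime, timedelta
from typing import List


def event_times(first: datetime, period_s: float = 0.0, repetitions: int = 1) -> List[datetime]:
    """Expand a stored event into the times of all its iterations.

    A single event has repetitions == 1 and yields only `first`.
    A repeated event yields `first`, then one entry every `period_s` seconds.
    """
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    if repetitions > 1 and period_s <= 0:
        raise ValueError("a repeated event needs a positive period")
    return [first + timedelta(seconds=period_s * i) for i in range(repetitions)]


def contact_time(event_time: datetime, image_time: datetime) -> timedelta:
    """Time elapsed between an event (e.g. ligand mixed) and an image."""
    return image_time - event_time
```

With this shape, a repeated event needs one stored row rather than one row per iteration, and the per-image contact time is derived on demand instead of being stored.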
The relationship from
Now, here, a If Is the idea that, a "
"first iteration" is still per-frame, right? A single image might have several pulses of the laser, or a light pulse and then a defined waiting time and then a substrate injection. Or the idea was that the timestamp is the start of the collection delay is set the same as
Okay - so, what led it to be in this shape; what problem is it solving, or what experiment has it been added for? What is it trying to represent? Maybe this is clear to everyone else, but for me to understand, It might help to have some specific "story" examples of "A serial experiment" and specifically what constraints those have imposed on the complexity here. That might invert the problem from "What is the purpose of this table and its constraints?" to "This complexity is required to solve problem X" |
Outdated - jump to #192 (comment) Here is a corrected version of the schema. I fixed the 1-to-many relations that were actually 1-to-1 and removed the misleading link to Image (this part was a very rough idea waiting for further requirements, the idea behind this prototype was focused on the sample part). Here is how an SSX data collection would fit in this schema: Each
Then we need to know when that new ligand was mixed with the sample so that we can figure out how long it has been in contact with the protein when each image was taken. For that we use the This sequence is also used to describe data acquisition steps with a set of |
Updated previous comment with latest data model. For UI, please refer to ispyb/py-ispyb-ui#15. |
If the relationship between The same goes for the relationship between |
And yes, please consider using I'm a little bit concerned about the potential size of the Re: the naming of Re: the name of |
That's a very good point! I will update the model with that.
I think this table is not in the schema we have. But I agree this enum is not very convenient. We should look into it.
Yes, I was hesitating between both of these options. It is currently used for the unit cell parameter frequency graphs (each divided into 100 bins = 100 rows per graph - I think we could reduce that number of bins) and we have 6 parameters so 6 graphs = 600 rows per data collection. I went with database storage to see how it works but if you have any experience/input on which would be best it would be very welcome.
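For comparison, the row count above and one possible alternative (a single serialized document per data collection instead of one row per bin) can be sketched as follows. This is only an illustration of the trade-off, not the storage chosen in the prototype; the function names and the JSON layout are assumptions.

```python
import json
from typing import Dict, List

# The six unit cell parameters, one frequency graph each.
UNIT_CELL_PARAMETERS = ("a", "b", "c", "alpha", "beta", "gamma")


def rows_per_collection(n_bins: int, n_parameters: int = len(UNIT_CELL_PARAMETERS)) -> int:
    """Rows needed when every histogram bin is stored as its own row."""
    return n_bins * n_parameters


def histogram(values: List[float], n_bins: int) -> List[int]:
    """Count values into n_bins equal-width bins spanning [min, max]."""
    counts = [0] * n_bins
    lo, hi = min(values), max(values)
    if hi == lo:
        counts[0] = len(values)
        return counts
    width = (hi - lo) / n_bins
    for v in values:
        # The maximum value falls on the upper edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def as_json_document(samples: Dict[str, List[float]], n_bins: int) -> str:
    """Serialize all graphs of one data collection into a single JSON string."""
    return json.dumps({name: histogram(vals, n_bins) for name, vals in samples.items()})
```

Row-per-bin storage keeps the bins queryable in SQL, while a single document trades that for one row per data collection; which matters more depends on whether anything other than the UI graph needs to read the bins.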
Sounds good to me. Getting rid of
Yes, I was thinking of renaming it to Thank you very much for your feedback! |
You should also link |
Thanks for the comprehensive description! One thing that is still not entirely clear from your description is whether a single What does |
The way I thought about it is that a As for absolute |
SSX_ispyb.docx provides a list of the fundamental values that we would need to store for SSX experiments. This list can be used as a starting point to build a first SSX prototype in py-ISPyB.