As we discussed extensively on the DIALS Slack channel, it is now relatively easy to parse DIALS .refl files without cctbx/DIALS. Newer versions of DIALS encode reflection tables using msgpack, which seems a relatively innocuous dependency to add.
To this end @ndevenish has built a parser that decodes refl tables using numpy. It's nearly complete but may be missing column types. We can find a full list of types in this block. It should be easy to build this into the rs.io submodule as I've done here for example.
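For anyone skimming, the core of a numpy-based decoder is reinterpreting a packed byte buffer as a typed array. Below is a minimal sketch of that step only. The `decode_column` helper, the little-endian `<f8` layout, and the sample columns are all assumptions for illustration; they are not the actual DIALS encoding or @ndevenish's parser.

```python
import numpy as np


def decode_column(raw: bytes, dtype: str, width: int = 1) -> np.ndarray:
    """Interpret a packed byte buffer as a typed column (hypothetical helper).

    width > 1 reshapes the flat buffer into one row per reflection,
    e.g. width=3 for a 3-vector column.
    """
    flat = np.frombuffer(raw, dtype=dtype)  # read-only view, no copy of the data
    if width == 1:
        return flat
    return flat.reshape(-1, width)


# Stand-in buffers; a real file would supply these bytes after msgpack decoding.
raw_intensity = np.array([1.5, 2.5, 3.5], dtype="<f8").tobytes()
raw_xyz = np.arange(6, dtype="<f8").tobytes()

intensity = decode_column(raw_intensity, "<f8")
xyz = decode_column(raw_xyz, "<f8", width=3)
```

Scalar columns come out as 1D arrays that drop straight into a DataFrame, while vector-like columns come out as 2D arrays that need a separate storage decision.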
There remains the issue of DIALS reflection tables potentially containing some fairly exotic objects (shoeboxes, vectors, matrices). The safest (sadly slowest) thing to do for a first pass is to just default them to objects. We can think about clever solutions later.
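To make the "default them to objects" idea concrete, here is a rough sketch of storing a vector column as an object column, one array per row. The column names and values are made up for illustration, and this uses plain pandas rather than anything specific to rs.

```python
import numpy as np
import pandas as pd

# Hypothetical decoded data: two reflections, one 3-vector each.
xyz = np.arange(6, dtype=float).reshape(2, 3)
df = pd.DataFrame({"intensity": [10.0, 20.0]})

# Build a 1D object array explicitly so each cell holds one length-3 array.
cells = np.empty(len(xyz), dtype=object)
for i, row in enumerate(xyz):
    cells[i] = row
df["xyz"] = cells

# Getting the dense array back means restacking every cell.
restored = np.stack(df["xyz"].to_numpy())
```

The column survives the round trip, but every operation on it goes through Python objects, which is where the slowness comes from.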
Parsing legacy pickle-based reflection tables is an open question. For the time being, I think we just can't support them. @ndevenish suggests looking here for clues though.
@JBGreisman, let's chat about this early next week and get it up and running. I think this is already mostly there!
I've updated it to handle the missing types and added a basic pytest test (it can be run as `refl_loader.py --write-test` inside a cctbx environment to write its test file). std::string is also supposed to be handled by the msgpack writer but is rather broken (dials/dials#1858), so I'm pretty sure that's not "in the wild" anywhere.
This looks like a pretty solid start. There are a few columns that can be stored in a DataFrame/DataSet, but cannot be written to an MTZ file. I think a decision we will have to make is whether those columns should be skipped, or whether they should be parsed and included in the DataSet.
It is always possible to add new dtypes if that would improve the behavior of anything (for strings, pandas already has us covered: StringDtype). It is also possible to add multidimensional numpy arrays as columns, but in practice they end up behaving more like "lists of arrays" than true array columns.
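As a quick illustration of the string case, this is roughly what a `StringDtype` column looks like in plain pandas. The identifier values are invented for the example.

```python
import pandas as pd

# Hypothetical string column with one missing entry.
ids = pd.Series(["exp-0", "exp-1", None], dtype="string")

# StringDtype keeps missing values as pd.NA rather than falling back to object.
missing = ids.isna().tolist()
upper = ids.str.upper()
```

String methods stay available through `.str`, and the result keeps the same dtype, so no custom dtype should be needed for this case.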