-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds script to import cctbx data to rs #264
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #264 +/- ##
==========================================
- Coverage 91.22% 88.95% -2.27%
==========================================
Files 37 40 +3
Lines 2531 2835 +304
==========================================
+ Hits 2309 2522 +213
- Misses 222 313 +91
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Test this:
Produced:
|
After thinking about it, the cctbx dependencies could be ignored, I think the refl tables can completely be accessed using msgpack
So a lightweight version of this code could be written and not need any fancy dials conda env |
Removed the CCTBX deps , so the code has slightly less functionality, but better performance. To test: conda create -n cctbx2rs python=3.12
conda activate cctbx2rs
git clone https://github.com/dermen/reciprocalspaceship.git
cd reciprocalspaceship
pip install -e .
# new deps:
pip install ray msgpack
# test
wget https://smb.slac.stanford.edu/~dermen/rs_ints.tar
tar -xvf rs_ints.tar
rs.cctbx2rs lys_si_8/ lys_si_8.mtz --ucell 78.3 78.3 36.9 90 90 90 --symbol P4 --nj 10
rs.mtzdump lys_si_8.mtz
|
this is some inspiring stuff, @dermen. as we discussed on slack, i think rs-booster is the place for command line apps like this. however, i'd like to see this refactored into an data_set = rs.read_cctbx( ... ) |
Thanks for putting this together! I agree with Kevin here -- I think it would make sense to include an |
Sounds good, ill rejig it. And yea, |
yes -- but we probably want different methods for stills and rotation method |
I just want to point out that we are using this ContextManager for Ray in the CrystFEL parser. We're using this pattern. It's something that came up with @marinegor during review of that PR. It hopefully makes it more likely that the processes get cleaned up if the main thread crashes (for instance if the user kills it with ctrl-c). It's not necessarily a dealbreaker for this PR, but I'm making a note here that we should refactor that ContextManager into a shared submodule. I made an issue for this (#267). |
@kmdalton , just saw your message.. I think I committed too quickly :) |
no problem, we can always refactor the ray stuff after this pr. i think we still haven't converged on a general policy for methods that rely on / benefit from optional dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dermen , this is really nice, readable code. Thank you! I have a few requests before we merge it though. Most importantly,
- the
rs
maintainers try to adhere to the classic unix philosophy of not printing extraneous info to console unless the user opts in. So, please put all yourprint
statements behind some kind ofverbose=False
toggle. - Your methods would be more flexible if they took filenames rather than directories. As written, it seems it would be impossible to only load a subset of the
.refl
files in a directory. This is an important feature to support. So, I'd request that you refactor all the "globbing" outside of theread_...
methods. - We have decorators (
from reciprocalspaceship.decorators import cellify, spacegroupify
) which take a range of possible input types and convert them to gemmi objects. Using these might simplify your code somewhat.
I hope these comments amount to some light refactoring. I'm happy to help if needed!
After we merge this PR, I can refactor it to use contextmanagers as we discussed.
@dermen, I like the logging changes you made. I think the code doesn't suffer at all in readability, and it might be a good practice to deploy more widely in the there are a few more things i'd like to request with this pr.
let me know if you need a hand with any of this. i can probably help out this week. thanks! |
Cool - think its mostly there ! feel free to change things as you see fit! There is no real urgency for me with this.. I am actually anticipating we need to add more options for msgpack data columns, and try/excepts where a column name doesnt exist ... |
i did some refactoring to de-emphasize the mpi idioms and make it possible to test the mpi code serially. the branch lives here. |
refactor from kmdalton
for more information, see https://pre-commit.ci
NERSC test instructions..
|
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dermen please review my changes and if you find them acceptable go ahead and merge this into main.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we still infer the stddev dtype ? Is that ok , and should we mention in docstring ?
Ultimately that one is up to you and what you want the API to do. I would probably defer that, because it will downcast that column to float32. If we are going to infer the stddev dtype, we might as well infer things like HKL dtypes and BATCH dtype. Let me know what you think. |
I added an option to specify whether to use mtz dtype inference. but i forgot to push it 😅 |
for more information, see https://pre-commit.ci
Ok, fixed one syntax error in rs-booster code, and added a unit cell / space group estimator to rs-booster. And tested, seems to work ok:
|
Co-Authored-By: kmdalton <[email protected]>
Co-authored-by: kmdalton <[email protected]>
No description provided.