-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Passing DataFrames as args, no architecture format (i.e. non-platform specific), black code style, removal of large output data, backwards compatible #16
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hey Anthony,
I'm opening this huge pull request because I've been sitting on a bunch of changes to CrabNet and figured it would be better to get some discussion started sooner than later. To give some background, the fork exists as a submodule for mat_discover. As the title describes, the core functionality no longer depends on relative or absolute file paths from the user's perspective (#11) and instead follows an
automatminer
-style format that I think is more portable (while still maintaining backwards compatibility). I replaced a number of things that were platform-specific (usually related to file paths), reformatted the repo in black code style, and removed a large portion of output data that pushed the combined, compressed file size ofmat_discover
,CrabNet
, andElM2D
over the Test PyPI upload limit (100 MB). I refactoredmat_discover
to the point of being pip-installable and am almost done with the CI workflow, so CrabNet should also be ready to deploy standalone after some minimal changes related to import statements (i.e. replacemat_discover.CrabNet
withCrabNet
). I've been usingflit
to deal with PyPI and it's been fairly user-friendly.As I mentioned, I made a large effort to keep it backwards compatible with the original CrabNet instructions. In other words, you should still be able to use a
materials_data/data/<property>
folder withtrain.csv
,val.csv
, andtest.csv
per the README instructions. Right now, thefit()
andpredict()
style methods (#14) look like:I think it would still be nice to get these into a more standard
sklearn
-esque format (e.g./i.e. wrapget_model
with amdl = Model.fit(train_df)
method andval_pred = mdl.predict(val_df)
with an optional kwarg for outputting the uncertainty).Replacing the output
csv
files might bring it below the 100 MB compressed limit for Test PyPI, but the CrabNet repo is pretty large as is, so I think it might be better to put this output data on figshare and give instructions or a script for grabbing the data. I also implemented a function inmat_discover
such that you can grab CrabNet data from wherever usingopen_text
even after it's been installed viapip
with some optional sugar for dividing into train/val or train/val/test splits.Feel free to push back on any of these changes. I mainly made these changes for
mat_discover
and the things that come next. Interested to hear your thoughts and let me know via Slack if you want to meet over Zoom to discuss or see some of the functionality in action.Sterling