Commit 0453099

blue442, kjschmidt913, ascourtas, repo-visualizer, and blaiszik authored
Download dataset (#413)
* fixed atom finding notebook * fixed atom position finding notebook * Repo visualizer: updated diagram * Adding Zeolite OSDB example * Adding solvation energy example * updated zeolite notebook * Repo visualizer: updated diagram * found the .list() return in the code. just need to use a debugger to see actual data object * added DOI to list() method * added DOI to the list() description * added DOI to _repr_html_ need to preview it but pushing the addition for now * changed formatting a bit by adding a label for DOI * made DOI a <p> element for better spacing * replace spaces with tabs in notebooks * Repo visualizer: updated diagram * establish run() function again * add dlhub_sdk==0.10.0 to reqs, and update foundry version to 0.1.2 * delete extraneous old requirements.txt * bound globus-sdk to <=2.0.3, since Foundry isn't compatible with Globus SDK 3 yet * add dlhub to list of service used in testing * fix syntax typo in service listing * add funcx to list of services * Updating auth for tests and client creation * Repo visualizer: updated diagram * deleting dupe of python-publish.yml * add auth handling to init if authorizers not provided * update local tests to work with auth for search * remove passing decorators for tests that are local * add 'mdf' as the default index * update tests to use prod dataset * update reqs for Globus SDK 3, and version to 0.2.0 * Repo visualizer: updated diagram * add option to pass in funcx_endpoint in run() * add pip dependecy caching * test caching functionality * Add keep_hdf5 functionality to Foundry.load_data * Add Convert to Pytorch Dataset Functionality * Add Testing for toTorch function * Remove unnecessary imports * Replace keep_hdf5 with as_hdf5 * Update foundry/foundry.py * Update foundry/foundry.py * Update foundry/foundry.py * Replace keep_hdf5 with as_hdf5 * Repo visualizer: updated diagram * Update Testing and Rename Files * Fix Testing Errors * Fix a singular typo * Replace PNG with SVG * Rebase with dev * Rebase 
with dev * Fix first set of TODOs * Finish TODO's for Load * Fix Dataset Loading * Apply Logan's Changes * Fix Logging and Remove unnecessary code * Fix Logging and Remove unnecessary code * Add Ari's Requests. * fix unused imports, code style, and update syntax (#229) * Replace keep_hdf5 with as_hdf5 * Update foundry/foundry.py * Update foundry/foundry.py * Update foundry/foundry.py * Repo visualizer: updated diagram * fix unused imports, code style, and update syntax Co-authored-by: Aadit Ambadkar <[email protected]> Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: repo-visualizer <[email protected]> * re-add improperly removed warnings import * add Python 3.10 tests and flake8 error checking * add setup.cfg to change flake8 parameters * add comments so flake8 ignores 'unused import' err * merge tests into single file (#233) * merge tests into single file * change name of test file to match new commit * convert is_gha to a boolean for pytest skip * remove premature optimization * Reflect Changes * Fix Try Catch and Logging.log * Fix Logging * Make Logger Reflect Module Name * add dl.easy_publish wrapper function * Imports * Revert "add dl.easy_publish wrapper function" This reverts commit 7f83611. 
* add dl.easy_publish wrapper as f.publish_model * remove commented code and link to dlhub docs in docstring * Update testing-work.yml * Repo visualizer: updated diagram * Update README.md * Repo visualizer: updated diagram * Fix Logging * update test name * Rename testing-work.yml to tests.yml * Rapid removing of XTract (#242) * Rapid removing of XTract * Fixing as_object * Repo visualizer: updated diagram * Update setup.py * Repo visualizer: updated diagram * Update setup.py * Repo visualizer: updated diagram * end the file with a newline * remove redundant flake8 checking * fix code style (without changing functionality) * add more style fixes * fix last style error, others are covered in #231 * To tf dataset (#201) with rebase * Add Custom Dataset and Implement * Clean up Branch * Clean up Branch * Resolve Some of Logan's Changes * Resolve Testing Issues? * Resolve Testing Issues? * Resolve Testing Issues? * Resolve Testing Issues? * Resolve Testing Issues? * Reflect Logan's Requests * Fix Import Issues * Simplify Imports * Fix Imports * Apply Logan's Changes * Comments * Refactor Common Logic Into New Function * Add Documentation * Add Documentation * Add Custom Dataset and Implement * Clean up Branch * Replace keep_hdf5 with as_hdf5 * Resolve Some of Logan's Changes * Resolve Testing Issues? * Resolve Testing Issues? * Resolve Testing Issues? * Resolve Testing Issues? * Resolve Testing Issues? 
* Reflect Logan's Requests * Fix Import Issues * Simplify Imports * Fix Imports * Apply Logan's Changes * Comments * Refactor Common Logic Into New Function * Add Documentation * Add Documentation * fix reference to _get_inputs_to_targets(); also, whitespace * remove unused * import * fix test_foundry.py to have the proper tests from the dev branch * remove outdated test_to_pytorch() test * fix passing of self for _get_inputs_targets() Co-authored-by: Aristana Scourtas <[email protected]> * delete deprecated build() function and remove 'fail' language from tabular dataset reading * remove redundant path checking code for loading datasets * fix logic error in data path verification * break out path joining logic to be in scope for all dataset types * add new easy_publish parameters * set defaults for new parameters * update to version 0.3.0, add reqs for dlhub 1.0.0, update PyPI info * Repo visualizer: updated diagram * Delete bubble-vis.yml * Update README.md * Update README.md * Add files via upload * Update README.md * Add files via upload * Update README.md * address flake8 concerns in foundry.py * fix flake8 concerns in torch_wrapper * address flake8 concers for tf_wrapper * replace xtract module name with https one in __init__.py * transfer https methods to https_download.py * reorder private method * final flake8 changes * fix import of https_download * Add search functionality to make it easier to find datasets * Flake8 fixes * Data packages --> datasets * Dev (#255) * Fix Logging and Remove unnecessary code * Fix Logging and Remove unnecessary code * add Python 3.10 tests and flake8 error checking * add setup.cfg to change flake8 parameters * add comments so flake8 ignores 'unused import' err * Reflect Changes * Fix Try Catch and Logging.log * Fix Logging * Make Logger Reflect Module Name * Imports * Fix Logging * update test name * Rename testing-work.yml to tests.yml * end the file with a newline * remove redundant flake8 checking * fix code style 
(without changing functionality) * add more style fixes * fix last style error, others are covered in #231 * delete deprecated build() function and remove 'fail' language from tabular dataset reading * remove redundant path checking code for loading datasets * fix logic error in data path verification * break out path joining logic to be in scope for all dataset types * address flake8 concerns in foundry.py * fix flake8 concerns in torch_wrapper * address flake8 concers for tf_wrapper * replace xtract module name with https one in __init__.py * transfer https methods to https_download.py * reorder private method * final flake8 changes * fix import of https_download * Add search functionality to make it easier to find datasets * Flake8 fixes * Data packages --> datasets Co-authored-by: Aadit-Ambadkar <[email protected]> Co-authored-by: Isaac Darling <[email protected]> Co-authored-by: Aadit Ambadkar <[email protected]> Co-authored-by: Ben Blaiszik <[email protected]> * update version to 0.4.0 also, add Braeden as contributor * update version * Fix test badge * Create README.md * Open in Colab Buttons (#253) * Repo visualizer: updated diagram * Add Badges * Update Positioning Co-authored-by: ascourtas <[email protected]> Co-authored-by: repo-visualizer <[email protected]> * Moving some functions to utils (#262) * Moving some functions to utils * flake8 fixes * Update README.md added more to the examples readme to make it more inviting/give better context as to what this page is * Update README.md (#270) added screenshots to readme with text explainations * Update read logic (#271) * initial example for QMC ML * simplifying read logic * swap testing dataset for a smaller dataset * Added new search test. Removed some stray commented-out code. (#272) * Paralellize HTTPS downloads. Remove joblib and six requirements (#273) * Get citation function (#274) * Paralellize HTTPS downloads. 
Remove joblib and six requirements * Add initial bibtex citation output function * update version to 0.5.0 * Improve HTTPS downloads (#277) * Cleaned up the keyword arguments, docstring - Only one keyword argument is used, so having a large (and undocumented) flexibility with **kwargs is unneeded - Docstring style mixed NumPy and Google * Add a test requirements file, simplify test YAML Sorry, a little house maintenance while I'm at it * Fix how parallel downloads are implemented Previous version was using an executor in a way that would produce an unlimited number of threads, which can cause problems for large datasets * Make a progress bar, error checking * Removed deprecated code It will live in git and our hearts forever * Flake8 fixes * Update README.md (#275) * Add NSF badge to Foundry * updated model pub notebook (#284) * Set Header Images to an absolute URL via raw.githubusercontent (#283) * Update links to absolute URL for pypi visibility * Redirect URL to MLMI2-CSSI * Update README.md (#299) * Add new logo * Add updated logos * Update README [no ci] * Updating example logos * Remove stray print _read_json was printing a debug data frame. Removed. 
* Add web (#301) * Increment version for PyPI deploy * Update issue templates * Add https upload (#281) * add initial directory-making functionality * add acl permission setting * add PUT request logiv and ACL setting, plus TODOs * add logic to delete acl rule after creation * add try/except handling to acl creation * add prepare query param so we don't need to make dirs; fix bug when rule_id is not set * clean up path joining logic, as well as comments * add capability to upload all files in a folder, instead of one individual file * update endpoint destination to use a UUID as the folder name * break out acl rule adding to its own function, tidy up * break out PUT request functionality * break out upload_folder() into upload_file() and integrate https functions into publish(), with proper params * change endpoint to NCSA, make usage more modular; small os.path bug fixes * reorder functions to be easier to read * add upload capability for single file, with error handling * fix logic bugs with destination path setting s.t. 
all subfolders are written to destination * cleanup var names in upload_folder() logic; making endpoint_dest path more robust * code cleanup and breakout helper functions to reduce size of publish() * add parameter checks to publish() and reduce param complexity * add docstrings, plus add test param to publish() * appease flake8 * add one more flake8 fix * fix auths in tests, add system test for HTTPS publication, small comments * add system test for HTTPS upload * break out https publishing into more unit-testable method * refactor function defs to work better for testing; add https upload unit test * fix bug where artifact was written to uploaded dataset * update os.walk block comparison to be more robust * update publish() docstring and add type hints * clean up imports, fix type hint for Response, add some context for Xtract file * WIP to separate helpers into submodule -- need to fix test and method design * fix typing discrepancy for requests.Response * update modification date * Temporarily remove ACL rule creation for https upload * Fix flake8 comment error * Fix flake8 once more * Fixing local tests, flake8, kwargs * Adding test data * Debug result on GHA * Debug result on GHA * Debug result on GHA * Debug result on GHA * add Ben's patch to submodule * generalize the included functions and move make_globus_link here from foundry object * move make_globus_link function to submodule * update tests to generalized input format * properly pass 'auths' object between functions * update modification date * prepend underscore to private function * correct call to upload_to_endpoint() in foundry.py * re-add ACL rule logic * update auth passing to be more user-friendly; includes test changes * Introduce a collection to hold authorizers It uses a dataclass so that we can annotate the type of authorizers that the tuple, then document them I put it in a new module, `foundry.auth` so that it can be used by both the foundry module and the https_upload module (avoiding 
circular dependencies) * alter args such that it's not possible for the user to have endpoint_id and gcs_auth_client misalign * change language to endpoint_auth_clients for clarity of purpose * docstring updates --------- Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: isaac-darling <[email protected]> Co-authored-by: Logan Ward <[email protected]> * https upload bugfix (#322) * add initial directory-making functionality * add acl permission setting * add PUT request logiv and ACL setting, plus TODOs * add logic to delete acl rule after creation * add try/except handling to acl creation * add prepare query param so we don't need to make dirs; fix bug when rule_id is not set * clean up path joining logic, as well as comments * add capability to upload all files in a folder, instead of one individual file * update endpoint destination to use a UUID as the folder name * break out acl rule adding to its own function, tidy up * break out PUT request functionality * break out upload_folder() into upload_file() and integrate https functions into publish(), with proper params * change endpoint to NCSA, make usage more modular; small os.path bug fixes * reorder functions to be easier to read * add upload capability for single file, with error handling * fix logic bugs with destination path setting s.t. 
all subfolders are written to destination * cleanup var names in upload_folder() logic; making endpoint_dest path more robust * code cleanup and breakout helper functions to reduce size of publish() * add parameter checks to publish() and reduce param complexity * add docstrings, plus add test param to publish() * appease flake8 * add one more flake8 fix * fix auths in tests, add system test for HTTPS publication, small comments * add system test for HTTPS upload * break out https publishing into more unit-testable method * refactor function defs to work better for testing; add https upload unit test * fix bug where artifact was written to uploaded dataset * update os.walk block comparison to be more robust * update publish() docstring and add type hints * clean up imports, fix type hint for Response, add some context for Xtract file * WIP to separate helpers into submodule -- need to fix test and method design * fix typing discrepancy for requests.Response * update modification date * Temporarily remove ACL rule creation for https upload * Fix flake8 comment error * Fix flake8 once more * Fixing local tests, flake8, kwargs * Adding test data * Debug result on GHA * Debug result on GHA * Debug result on GHA * Debug result on GHA * add Ben's patch to submodule * generalize the included functions and move make_globus_link here from foundry object * move make_globus_link function to submodule * update tests to generalized input format * properly pass 'auths' object between functions * update modification date * prepend underscore to private function * correct call to upload_to_endpoint() in foundry.py * re-add ACL rule logic * update auth passing to be more user-friendly; includes test changes * Introduce a collection to hold authorizers It uses a dataclass so that we can annotate the type of authorizers that the tuple, then document them I put it in a new module, `foundry.auth` so that it can be used by both the foundry module and the https_upload module (avoiding 
circular dependencies) * alter args such that it's not possible for the user to have endpoint_id and gcs_auth_client misalign * change language to endpoint_auth_clients for clarity of purpose * docstring updates * fix bug from last round of review edits --------- Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: isaac-darling <[email protected]> Co-authored-by: Logan Ward <[email protected]> * add static badge with link to gitbook (#333) * Update publishing notebook and minor bugfixes (#336) * update publishing notebook example to use HTTPS upload primarily, along with minor fixes * add https upload methods and data * fix function call to publish_dataset * remove ACL rule code to fix error issue * update globus images in notebook * remove commented code * add missing scopes * appease flake overlords * Add search lambda authorizer (sl_authorizer) to dlhub_client instantiation * removed unnecessary scopes * update curation info in notebook --------- Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: isaac-darling <[email protected]> Co-authored-by: Logan Ward <[email protected]> Co-authored-by: Eric Blau <[email protected]> * delete due to unresolvable merge conflict -- adding back in dev * add back publishing notebook * Split specification (#344) * adding ability to specify splits for loading * refining test * Update splits_to_load --> splits --------- Co-authored-by: blaiszik <[email protected]> * Implementing new search() function (and refactor) (#408) * update to 0.6.0 for HTTPS pub * Upload Foundry class load() function to default download using https (#340) * Update setup.py Fix version number for pyPI deploy * Update setup.py version for pyPI * Update requirements.txt to latest DLHub SDK This is needed to require upgrade of DLHub SDK for Foundry users when they upgrade Foundry. 
* Update version to 0.6.3 * incorporate load() to foundry.__init__() * automating api documentation using github action (#342) * CI: Automated documentation build * Removing remnants of XTract * CI: Automated documentation build * CI: Automated documentation build * Update README.md with contributing instructions (#357) * Update README.md with contributing instructions * Update PR language * merging in split specification * flake fixes * add jingrui examples (#363) * removed blank line * Load on init (#358) * incorporate load() to foundry.__init__() * merging in split specification * flake fixes * removed blank line * Adds note for quickstart globus set to false * Validating metadata before publishing * remove arguments from Foundry object that are duplicated with base class * Update setup.py version to 0.7.0 * refactor foundry to separate foundry instance from dataset objects * fine tuning search functionality * removing redefinition of FoundryDataset * address comments in PR review * remove unused import * updgrade setup-python from v2 to v4 * upgrade other setup-python from v2 to v4 * modify limit test --------- Co-authored-by: ascourtas <[email protected]> Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Marshall McDonnell <[email protected]> * CI: Automated documentation build * getting cache sorted * getting cache sorted * updated docstrings and examples, added extended pandas dataframe class for display and access of datasets * updated docstrings and examples, added extended pandas dataframe class for display and access of datasets * cleaning up flake8 issues * cleaning up flake8 issues * cleaning up a few loose ends * cleaning up a few loose ends * check if download is causing test in GHA to fail * check if download is causing test in GHA to fail * disabling all individual tests * disabling all individual tests * testing a few tests * testing a few 
tests * testing a few more tests * testing a few more tests * testing w/ globus=False * testing w/ globus=False * WIP * WIP * refactoring tests * refactoring tests * wrapping up testing of foundry_cache.py * wrapping up testing of foundry_cache.py * refining tests * refining tests * tighten up testing, remove downloads from GHA, hold off on https download until next issue * updating workflow * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * removing test for GHA * skipping problematic test in GHA * skipping problematic test in GHA * skipping problematic test in GHA * skipping problematic test in GHA * skipping problematic test in GHA * skipping problematic test in GHA * skipping problematic test in GHA * skipping problematic test in GHA * skipping problematic test in GHA * testing bump of pytest version * reverting to skip test that uses pytest.raises in GHA * addressing feedback from PR --------- Co-authored-by: KJ <[email protected]> Co-authored-by: ascourtas <[email protected]> Co-authored-by: repo-visualizer <[email protected]> Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: BraedenCu <[email protected]> Co-authored-by: Aadit-Ambadkar <[email protected]> Co-authored-by: Aadit Ambadkar <[email protected]> Co-authored-by: Isaac Darling <[email protected]> Co-authored-by: Logan Ward <[email protected]> Co-authored-by: C. Y. Schneck <[email protected]> Co-authored-by: Sterling G. Baird <[email protected]> Co-authored-by: Logan Ward <[email protected]> Co-authored-by: Eric Blau <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Marshall McDonnell <[email protected]>
1 parent 8065b6f commit 0453099
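
The commit message above notes that an earlier parallel-download implementation was "using an executor in a way that would produce an unlimited number of threads". The following is a minimal sketch of the bounded alternative, not the code from this commit: `download_all`, `fetch`, and `fake_fetch` are hypothetical names introduced for illustration, and the URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> Dict[str, bytes]:
    """Fetch every URL using at most `max_workers` threads.

    `fetch` is a hypothetical callable (URL -> bytes); it is not Foundry's API.
    """
    results: Dict[str, bytes] = {}
    # A single shared executor caps concurrency at max_workers. Starting a
    # new executor or thread per file lets the thread count grow with the
    # number of files in the dataset.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            # result() re-raises any exception raised in the worker thread.
            results[futures[future]] = future.result()
    return results


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTPS request so the sketch runs offline."""
    return url.encode("utf-8")


if __name__ == "__main__":
    files = [f"https://example.org/data/{i}.json" for i in range(10)]
    downloaded = download_all(files, fake_fetch, max_workers=3)
    print(len(downloaded))  # 10
```

Because all work is submitted to one pool, the number of live worker threads never exceeds `max_workers`, however many files are queued.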

34 files changed, +4471 -3154 lines changed

.github/workflows/tests.yml

Lines changed: 3 additions & 3 deletions
@@ -20,9 +20,9 @@ jobs:
       CLIENT_SECRET: ${{ secrets.CLIENT_SECRET }}
     name: build
     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ matrix.python-version }}
         cache : 'pip'
@@ -46,7 +46,7 @@ jobs:
 
     - name: Test with pytest
       run: |
-        pytest -s tests/test_foundry.py --cov=./foundry --cov-report=xml
+        pytest -s -v tests/ --cov=./foundry --cov-report=xml
     - name: Upload coverage to Codecov
       run: |
         curl -Os https://uploader.codecov.io/v0.1.0_4653/linux/codecov
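
The workflow now runs every test under `tests/` rather than a single file, and the commit message mentions converting `is_gha` to a boolean and skipping problematic tests in GHA. As a hedged sketch of how such a flag could be derived (this is not the repository's code; `running_on_github_actions` and `IS_GHA` are hypothetical names), the check can rely on the `GITHUB_ACTIONS` environment variable that GitHub sets to `true` on its runners:

```python
import os
from typing import Mapping


def running_on_github_actions(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when the process appears to run inside GitHub Actions.

    Comparing against the string "true" yields a real boolean instead of
    a str or None, which is what a skip condition expects.
    """
    return environ.get("GITHUB_ACTIONS", "").strip().lower() == "true"


# Hypothetical module-level flag that a test file could pass to a skip marker.
IS_GHA = running_on_github_actions()

if __name__ == "__main__":
    print(running_on_github_actions({"GITHUB_ACTIONS": "true"}))  # True
    print(running_on_github_actions({}))  # False
```

A test module could then decorate a network-dependent test with `@pytest.mark.skipif(IS_GHA, reason="...")`, so it still runs locally but is skipped in CI.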

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@
 *.DS_STORE
 *.pyc
 *.idea
-*/foundry_ml.egg-info/*
+*/foundry_ml.egg-info/*
+globus_creds

data/https_test/test_data.json

Lines changed: 0 additions & 1 deletion
This file was deleted.

examples/DefectTrack/000001.png

-451 KB
Binary file not shown.

examples/DefectTrack/000002.png

-451 KB
Binary file not shown.

examples/DefectTrack/000003.png

-451 KB
Binary file not shown.

examples/DefectTrack/000004.png

-451 KB
Binary file not shown.
