-
Notifications
You must be signed in to change notification settings - Fork 17
Reviving load #384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main-bak
Are you sure you want to change the base?
Reviving load #384
Conversation
PR Summary
Outcome: This refactoring significantly streamlines the code and improves both the utility and readability of the program. The use of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Steve! I was discussing this solution with @blaiszik, and he pointed out that there wouldn't be a case where the user would want to download the dataset but not load the metadata (but there are cases where a user would want to load the metadata, but not download the dataset). He suggested the following refactor which I think is very smooth:
"I think we should just make the metadata loading occur without user intervention. It’s a very cheap operation, and I don’t see really any use to have the Foundry object without the metadata"
from foundry-ml import Foundry
f = Foundry("$DOI$") # this gets the metadata, and never gets the data
f.get_data() # this downloads the data
X,y = f.load_data() #this loads the data into RAM
I think get_data()
could also be called download_dataset()
, depending on what we think is clearest. What are your thoughts?
Updated based on feedback to reflect three possible use cases:
Also changed function names to Upgraded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just need a little clarity on things before I can approve!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes Steve, almost there! Last thing for this PR: can you please remove the unnecessary added cells and remove the big outputs (ie for importing libraries, etc) from the data publishing notebook?
Also, where there anything you changed in the data publishing notebook that is supposed to be there? I didn't see anything but it's hard to tell without a proper diff.
The last thing will be to update the Foundry documentation and example notebooks with the syntax change before we can cut this release.
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## main #384 +/- ##
==========================================
- Coverage 72.08% 70.96% -1.13%
==========================================
Files 9 9
Lines 541 527 -14
==========================================
- Hits 390 374 -16
- Misses 151 153 +2
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Addressing #379, implemented changes to separate the loading of the metadata and dataset from the instantiation of a foundry object in order to facilitate browsing/searching of datasets prior to downloading. In addition, I refactored the downloading of metadata and data to utilize clearer verbiage when calling the functions.
Specific changes
load()
a public function againmetadata
arg fromload()
if metadata: res = metadata
fromload()
methodis_doi(name) and not metadata:
->if is_doi(name):
load()
from the__init__()
functionf.load()
->f.fetch_data()
download=True
tometadata_only=False
New usage example snippet
Documentation notes:
load_data()
loads the data into memory, whiledownload_data()
downloads it to disk, correct? If so, the docstring should probably reflect this.TODO:
__init__()
- removemetadata (dict): **For debug purposes.** A search result analog to prepopulate metadata.
load()
(nowfetch_data()
)- does a user need to pass in the 'source_id' generated bylist()
as the name argument, or can they pass in the entry under 'name'? ('name' doesn't seem to work)