A first pass at storage #160
Conversation
I didn't have time today to read in depth, but will do so before our meeting tomorrow.
One thing that occurs to me right now: `to_storage`/`from_storage` is very similar to the `Storable` interface. I guess implementing `restore` would be more difficult in your setup because you wanted the objects to be live before populating from storage, but it would make other things much easier (no need to keep track of class names in storage; it already saves a simple version tag). If this is a blocker, I would rather discuss what needs to change in `Storable` to enable it to be used here.
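For contrast, a minimal sketch of the two shapes being compared here -- `restore` building a fresh object versus `from_storage` populating one that is already live. The signatures are illustrative simplifications, not the actual `tinybase` API:

```python
# Illustrative simplification of the two interfaces being compared;
# these are NOT the real tinybase signatures.
class Storable:
    def store(self, storage) -> None:
        """Write state (plus a simple version tag) to storage."""
        ...

    @classmethod
    def restore(cls, storage) -> "Storable":
        """Construct a brand-new object from storage."""
        ...


class DataChannel:
    def to_storage(self, storage) -> None:
        """Write this live channel's state to storage."""
        ...

    def from_storage(self, storage) -> None:
        """Populate this already-live channel in place from storage."""
        ...
```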
Super. Indeed, if the academic agenda permits, spending time talking about this stuff live would be both enjoyable and helpful!
So there are two fundamental objects we're storing:
Force-pushed the branch from 569cd35 to bd62c77.
Unless instructed to delete them
The storage can't handle the set and leaves empty contents
Instead of the project
It was just weird. Just explicitly check for empty. Honestly, we can probably revert the `only_if_empty` kwarg too, but that can be its own PR.
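A hedged sketch of what "explicitly check for empty" might look like; `only_if_empty` is the kwarg named above and `has_contents` appears in this PR's storage interface, but the method name and `delete()` helper are assumptions for illustration:

```python
# Sketch only: the method name and delete() helper are assumed,
# not taken from this PR's diff.
def delete_storage(self, only_if_empty: bool = False) -> None:
    if only_if_empty and self.has_contents:
        # Explicitly leave non-empty storage alone instead of relying
        # on the backend to handle the distinction implicitly.
        return
    self.delete()
```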
Force-pushed the branch from 00a5709 to 8edc5a0.
It does though. Something on the internet must just not have updated yet. I'll try again in a few minutes.
0.1.15 is the most recent version showing on anaconda.org, but 0.1.14 is the most recent on my local machine.
The "rerun failed jobs" button for the CI report and "rerun job" for individual jobs are both doing nothing, so I am going to close and reopen... |
There's some sort of pathing issue going on in the CI environment compared to my local machine; everything fails at the first node instantiation with errors of the form:

```
======================================================================
ERROR: test_run_data_tree (test_node.TestNode)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/share/miniconda3/envs/my-env/lib/python3.10/pathlib.py", line 1175, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'start'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/runner/work/pyiron_workflow/pyiron_workflow/tests/unit/test_node.py", line 64, in setUp
self.n1 = ANode("start", x=0)
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/snippets/has_post.py", line 16, in __call__
post(instance, *args, **kwargs)
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/node.py", line 362, in __post__
do_load = self.storage.has_contents
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/storage.py", line 101, in has_contents
has_contents = self._tinybase_storage_is_there or self._h5io_storage_is_there
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/storage.py", line 150, in _tinybase_storage_is_there
if os.path.isfile(self._tinybase_storage_file_path):
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/storage.py", line 133, in _tinybase_storage_file_path
self.node.graph_root.working_directory.path
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/node.py", line 788, in working_directory
self._working_directory = DirectoryObject(self.label)
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/snippets/files.py", line 41, in __init__
self.create()
File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/snippets/files.py", line 44, in create
self.path.mkdir(parents=True, exist_ok=True)
File "/usr/share/miniconda3/envs/my-env/lib/python3.10/pathlib.py", line 1180, in mkdir
self.mkdir(mode, parents=False, exist_ok=exist_ok)
File "/usr/share/miniconda3/envs/my-env/lib/python3.10/pathlib.py", line 1175, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'start'
```
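For what it's worth, one way a bare relative `mkdir` raises exactly this error (including the retry visible in the traceback) is when the process's working directory has been deleted, since `'start'` resolves against the missing cwd. A hedged reproduction, not necessarily the actual cause in CI:

```python
import os
import tempfile
from pathlib import Path

scratch = tempfile.mkdtemp()
os.chdir(scratch)
os.rmdir(scratch)  # the current working directory no longer exists

# Resolving the relative path against the deleted cwd fails with
# FileNotFoundError: [Errno 2] No such file or directory: 'start'
Path("start").mkdir(parents=True, exist_ok=True)
```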
Ok, the tests were failing on my local machine as well. It was an issue with the … It looks like I'll also need to patch all the storage to run only on Python >=3.11 after all.
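One conventional way to gate those tests on the interpreter version, assuming the suite stays on `unittest` (the class name and skip message here are illustrative):

```python
import sys
import unittest


@unittest.skipIf(
    sys.version_info < (3, 11),
    "Storage relies on features only available on Python >= 3.11",
)
class TestNodeStorage(unittest.TestCase):
    def test_save_and_load(self):
        ...  # storage round-trip assertions would go here
```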
It was just not showing up because the test name was doubled up
And test for it, hopefully, we'll find out on the CI I guess
Otherwise we check to see if storage is there and get an error
…h_jobs_in_storage
Out of kindness
Make NodeJob compliant with the storage interface
@samwaseda, @JNmpi -- saving and loading is working! Needs this branch, and the `tinydata` branch on contrib.

@pmrv, before I dig deep and start fleshing out tests and examples, I would love to get some high-level feedback from you about the architecture and how I'm (probably mis)using `tinybase` storage tools.

Design
- Uses `tinybase.storage.H5ioStorage` to save and load nodes (individual function nodes, macros, and workflows). It is written against this branch in contrib, so it is not surprising when tests fail!
- `Node` (and some children) and `DataChannel` define methods for storing their data in, and restoring their data from, an `H5ioStorage` instance.
- The expectation is that loading happens from an already-live instance of a `Node`; that means it has already instantiated its child `DataChannel` objects and they are live too -- so we are really just populating attributes of live objects.
- Information about channel connections is stored at the `Composite` level, and these are recreated after all children are available.
- `Macro`s instantiate their children at their own instantiation time, but `Workflow`s need to look at the stored data (package identifier and node class), re-register packages, and re-instantiate their child nodes. Then these nodes can be restored from HDF as usual.
- We delay interaction with `pyiron_contrib` until the last possible moment, as the import still takes forever.
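A hedged sketch of the populate-in-place pattern those bullets describe; the `to_storage`/`from_storage` names come from this thread, but the bodies and the mapping-style access on the storage group are assumptions:

```python
# Sketch only: the attribute names and storage access are illustrative.
class DataChannel:
    def to_storage(self, storage) -> None:
        storage["label"] = self.label
        storage["value"] = self.value

    def from_storage(self, storage) -> None:
        # The channel already exists and is attached to its node;
        # we only repopulate its state from the stored data.
        self.value = storage["value"]
```

Connection information would then live one level up, on the `Composite`, and only be re-made once all children (and their channels) are live again.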
Features

- `save()` and `load()` interface available at the `Node` level -- works for function nodes, macros, and workflows.

It works like this:
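A minimal sketch of that first step, assuming a hypothetical node package `my_nodes` with a node class `AddOne`; the registration call pattern is also an assumption:

```python
from pyiron_workflow import Workflow

# "my_nodes" must be an importable module so its nodes carry a
# package_identifier; the package and node class here are hypothetical.
Workflow.register("my_nodes")
wf = Workflow("demo")
wf.step = wf.create.my_nodes.AddOne(x=1)
wf()       # run it
wf.save()  # write everything to the HDF-backed storage
```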
Then we can come back later, in a fresh Python interpreter (with the same `cwd`, so we can find the save file), and do this. Note that at load-time we didn't need to register any packages -- everything we needed was found in the stored data and registered automatically.
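A matching sketch of the load step, with the same hypothetical names; the explicit `load()` call pattern is an assumption:

```python
from pyiron_workflow import Workflow

# Fresh interpreter, same cwd and same label, so the save file is found.
wf = Workflow("demo")
wf.load()  # child nodes are re-instantiated and repopulated from storage
```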
Shortcomings

- Children of `Composite` and children of `Function` need to have `macro_creator` and `node_function` defined as class methods, not passed at instantiation, or I'm 99% sure saving/loading will break. This is not too bad, since all the nodes created by the decorators fulfill this automatically, and that's both our preferred and most common way of doing things. (This is a non-issue; if you define a new `Node` instance and save it with the same name, indeed, you'll get garbage -- IMO this is just a special case of changing the source code between saving and loading. If you set up the same node instance both times, it saves and loads just fine because it has the necessary function again anyhow. See the sketch after this list.)
- A `Workflow`'s child nodes need to be created from a package (`wf.create....`) -- this is so they get a `package_identifier` and the loading workflow can re-instantiate them. This is not too much of a pain right now since our package tracking is pretty sloppy -- a package is just a module -- so you just need to move nodes from your jupyter notebook over to a .py file somewhere in your python path and register them like usual prior to saving.
- If you modify a `Macro` instance in your notebook (e.g. by changing its internal connections, or replacing an internal node), you'll still be able to save it, but loading it will probably just break.
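A hedged sketch of the distinction in the first shortcoming -- the decorator bakes the function into the class, so a fresh interpreter can rebuild it before loading, while passing a function at instantiation attaches it only to the instance (the decorator spelling here is an assumption):

```python
from pyiron_workflow import Workflow

# Decorator-created node: node_function is defined on the class itself,
# so a fresh interpreter can rebuild the class before loading -- safe.
@Workflow.wrap_as.function_node("y")
def AddOne(x):
    return x + 1

# By contrast, handing a function to a Function node at instantiation
# (something like Function(lambda x: x + 1)) leaves the function only on
# the instance, which is the case expected to break on save/load.
```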
TODO:

- ~~Stack a PR: Pull in `tinybase` from contrib~~ Do it later, after contrib 949 is merged
- ~~Stack a PR: Rebase classes onto `Storable` (at least `Node` -- maybe `ChannelData` still gets a different solution) so we can use `_re/store` directly instead of the bespoke `to/from_storage`~~ Do it later; we have an issue placeholder: Storage: Use `tinybase.storage.Storable` directly #162
- ~~Wait for `tinybase` tools to be available on conda~~ Git-copy it to this repo instead (and later, per above)

Closes #153