[patch] 0.9.5 (#419)
* [patch] Pickle storage backend (#408)

* Refactor storage test

* Remove empty line

* Rename workflow in tests

With the introduction of a pickle backend, this test wound up failing. I haven't been able to work out exactly why, but since pickle is much faster, my guess is that the test parallelization gets into a race condition with another test deleting a saved workflow. Simply renaming the workflow here solved the issue, which is an innocuous change.

* Introduce `pickle` as a backend for saving

* Fix root cause of storage conflict

OK, the reason workflow storage tests were failing after introducing pickle was that the original version of the test had the try/finally scope and the for-loop scope in the wrong order. I got away with it earlier because of interplay between the outermost member of the loop and the default storage backend. Actually fixing the problem is as simple as ensuring the "finally delete" clause runs after _each_ loop step. This does that and reverts the renaming of the workflow.
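
A minimal runnable sketch of the fixed pattern (the `FakeWorkflow` stand-in and the backend list are illustrative, not the actual test code):

```python
backends = ["pickle", "h5io", "tinybase"]


class FakeWorkflow:
    """Stand-in for the real workflow object used in the storage tests."""

    def save(self):
        print(f"saved via {self.storage_backend}")

    def delete_storage(self):
        print("storage deleted")


wf = FakeWorkflow()

# The try/finally now lives *inside* the for-loop, so the "finally delete"
# clause runs after each backend rather than once after the whole loop.
for backend in backends:
    try:
        wf.storage_backend = backend
        wf.save()
        # ... assertions on saving/loading would go here ...
    finally:
        wf.delete_storage()
```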

* Again, correctly order try/finally and for-loops

* [patch] `as_dataclass_node` pickle compatibility (#410)

* Refactor storage test

* Remove empty line

* Rename workflow in tests

With the introduction of a pickle backend, this test wound up failing. I haven't been able to work out exactly why, but since pickle is much faster, my guess is that the test parallelization gets into a race condition with another test deleting a saved workflow. Simply renaming the workflow here solved the issue, which is an innocuous change.

* Introduce `pickle` as a backend for saving

* Fix root cause of storage conflict

OK, the reason workflow storage tests were failing after introducing pickle was that the original version of the test had the try/finally scope and the for-loop scope in the wrong order. I got away with it earlier because of interplay between the outermost member of the loop and the default storage backend. Actually fixing the problem is as simple as ensuring the "finally delete" clause runs after _each_ loop step. This does that and reverts the renaming of the workflow.

* Again, correctly order try/finally and for-loops

* Remove keyword argument from pure-decorator

You're only supposed to use it as a decorator in the first place, so the kwarg was senseless.

* Add factory import source

This is necessary (but not sufficient) to get `as_dataclass_node`-decorated classes to pickle. The field name is a bit off compared to `Function` and `Macro`, as we now decorate a class definition instead of a function definition, but it's close.
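
As background (generic Python, not this repo's code): `pickle` stores a class as a reference to its module and qualname, so any class it cannot re-import by that path fails to pickle; this is why decorator-made classes need an explicit import source recorded.

```python
import pickle


class TopLevel:
    """Defined at module scope, so pickle can find it again by qualname."""


def make_local_class():
    class Local:
        """Defined in <locals>; plain pickle cannot re-import this."""

    return Local


pickle.dumps(TopLevel)  # fine: stored as a reference to __main__.TopLevel

try:
    pickle.dumps(make_local_class())
except pickle.PicklingError as err:
    print(f"plain pickle fails: {err}")
```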

* Bring the dataclass node in line with function and macro

* Leverage new pyiron_snippets.factory stuff to find the class

* Mangle the stored dataclass qualname so it can be found later

* Add tests

* Update docs examples to reflect new naming

* Update snippets dependency

* [dependabot skip] Update env file

* Use new pyiron_snippets syntax consistently

* Expose `as_dataclass_node` in the API

Now that it's pickling as well as anything else

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [patch] Fall back on cloudpickle (#411)

* Refactor storage test

* Remove empty line

* Rename workflow in tests

With the introduction of a pickle backend, this test wound up failing. I haven't been able to work out exactly why, but since pickle is much faster, my guess is that the test parallelization gets into a race condition with another test deleting a saved workflow. Simply renaming the workflow here solved the issue, which is an innocuous change.

* Introduce `pickle` as a backend for saving

* Fix root cause of storage conflict

OK, the reason workflow storage tests were failing after introducing pickle was that the original version of the test had the try/finally scope and the for-loop scope in the wrong order. I got away with it earlier because of interplay between the outermost member of the loop and the default storage backend. Actually fixing the problem is as simple as ensuring the "finally delete" clause runs after _each_ loop step. This does that and reverts the renaming of the workflow.

* Again, correctly order try/finally and for-loops

* Remove keyword argument from pure-decorator

You're only supposed to use it as a decorator in the first place, so the kwarg was senseless.

* Add factory import source

This is necessary (but not sufficient) to get `as_dataclass_node`-decorated classes to pickle. The field name is a bit off compared to `Function` and `Macro`, as we now decorate a class definition instead of a function definition, but it's close.

* Bring the dataclass node in line with function and macro

* Leverage new pyiron_snippets.factory stuff to find the class

* Mangle the stored dataclass qualname so it can be found later

* Add tests

* Update docs examples to reflect new naming

* Update snippets dependency

* [dependabot skip] Update env file

* Use new pyiron_snippets syntax consistently

* Expose `as_dataclass_node` in the API

Now that it's pickling as well as anything else

* [patch] Fall back on cloudpickle

When the pickle backend fails
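
The pattern, sketched (this mirrors the `_save` logic in `pyiron_workflow/mixin/storage.py` shown in the diff below, with the file-path handling simplified):

```python
import pickle

import cloudpickle


def dump_with_fallback(obj, path: str) -> None:
    """Prefer the standard library; fall back to cloudpickle on failure."""
    try:
        with open(path, "wb") as file:
            pickle.dump(obj, file)
    except Exception:
        # cloudpickle handles objects plain pickle rejects, e.g. classes
        # defined in another function's <locals>.
        with open(path, "wb") as file:
            cloudpickle.dump(obj, file)
```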

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* [patch] Make pickle the default storage backend (#412)

* Refactor storage test

* Remove empty line

* Rename workflow in tests

With the introduction of a pickle backend, this test wound up failing. I haven't been able to work out exactly why, but since pickle is much faster, my guess is that the test parallelization gets into a race condition with another test deleting a saved workflow. Simply renaming the workflow here solved the issue, which is an innocuous change.

* Introduce `pickle` as a backend for saving

* Fix root cause of storage conflict

OK, the reason workflow storage tests were failing after introducing pickle was that the original version of the test had the try/finally scope and the for-loop scope in the wrong order. I got away with it earlier because of interplay between the outermost member of the loop and the default storage backend. Actually fixing the problem is as simple as ensuring the "finally delete" clause runs after _each_ loop step. This does that and reverts the renaming of the workflow.

* Again, correctly order try/finally and for-loops

* Remove keyword argument from pure-decorator

You're only supposed to use it as a decorator in the first place, so the kwarg was senseless.

* Add factory import source

This is necessary (but not sufficient) to get `as_dataclass_node`-decorated classes to pickle. The field name is a bit off compared to `Function` and `Macro`, as we now decorate a class definition instead of a function definition, but it's close.

* Bring the dataclass node in line with function and macro

* Leverage new pyiron_snippets.factory stuff to find the class

* Mangle the stored dataclass qualname so it can be found later

* Add tests

* Update docs examples to reflect new naming

* Update snippets dependency

* [dependabot skip] Update env file

* Use new pyiron_snippets syntax consistently

* Expose `as_dataclass_node` in the API

Now that it's pickling as well as anything else

* [patch] Fall back on cloudpickle

When the pickle backend fails

* [minor] Make pickle the default storage backend

* Format black

* Fall back on loading any storage contents

Regardless of what the specified storage backend was.
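
Net effect of these last two changes, as a hedged usage sketch (`"my_wf"` is a hypothetical label; auto-loading on re-instantiation follows the `Node` docstring in the diff below):

```python
from pyiron_workflow import Workflow

wf = Workflow("my_wf")
# ... build and run the graph ...
wf.save()  # writes via the new default backend, pickle

# Re-instantiation with a matching label finds and loads the saved
# contents, even though a different backend is requested here.
reloaded = Workflow("my_wf", storage_backend="h5io")
```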

* Format black

---------

Co-authored-by: pyiron-runner <[email protected]>

* Format black

* Update pyiron deps

* [dependabot skip] Update env file

---------

Co-authored-by: pyiron-runner <[email protected]>
liamhuber and pyiron-runner authored Aug 13, 2024
1 parent 1857963 commit 3bd6241
Showing 21 changed files with 312 additions and 75 deletions.
6 changes: 3 additions & 3 deletions .binder/environment.yml
@@ -10,9 +10,9 @@ dependencies:
- h5io =0.2.4
- h5io_browser =0.0.16
- pandas =2.2.2
- pyiron_base =0.9.10
- pyiron_contrib =0.1.17
- pyiron_snippets =0.1.3
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.3
- toposort =1.10
- typeguard =4.3.0
6 changes: 3 additions & 3 deletions .ci_support/environment.yml
@@ -10,9 +10,9 @@ dependencies:
- h5io =0.2.4
- h5io_browser =0.0.16
- pandas =2.2.2
- pyiron_base =0.9.10
- pyiron_contrib =0.1.17
- pyiron_snippets =0.1.3
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.3
- toposort =1.10
- typeguard =4.3.0
8 changes: 4 additions & 4 deletions .ci_support/lower_bound.yml
@@ -8,11 +8,11 @@ dependencies:
- executorlib =0.0.1
- graphviz =9.0.0
- h5io =0.2.2
- h5io_browser =0.0.12
- h5io_browser =0.0.14
- pandas =2.2.0
- pyiron_base =0.8.3
- pyiron_contrib =0.1.16
- pyiron_snippets =0.1.0
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.0
- toposort =1.10
- typeguard =4.2.0
6 changes: 3 additions & 3 deletions docs/environment.yml
@@ -16,9 +16,9 @@ dependencies:
- h5io =0.2.4
- h5io_browser =0.0.16
- pandas =2.2.2
- pyiron_base =0.9.10
- pyiron_contrib =0.1.17
- pyiron_snippets =0.1.3
- pyiron_base =0.9.12
- pyiron_contrib =0.1.18
- pyiron_snippets =0.1.4
- python-graphviz =0.20.3
- toposort =1.10
- typeguard =4.3.0
9 changes: 6 additions & 3 deletions notebooks/deepdive.ipynb
@@ -5521,15 +5521,18 @@
"- Also related, there is currently zero filtering of which data, or to which depth the graph gets stored -- it's all or nothing\n",
"- If the source code for nodes gets modified between saving and loading, weird stuff is likely to happen, and some of it may happen silently.\n",
"\n",
"Lastly, we currently use two backends: `tinybase.storage.H5ioStorage` and `h5io` directly. They have slightly different strengths:\n",
"- `\"h5io\"` (the default) \n",
"Lastly, we currently use three backends: `pickle`, ``tinybase.storage.H5ioStorage` and `h5io` directly. They have slightly different strengths:\n",
"- `\"h5io\"` \n",
" - Will let you save and load any nodes that defined by subclassing (this includes all nodes defined using the decorators)\n",
" - Will preserve changes to a macro (replace/add/remove/rewire)\n",
" - Has trouble with some data\n",
"- `\"tinybase\"`\n",
" - Requires all nodes to have been instantiated with the creator (`wf.create...`; this means moving node definitions to a `.py` file in your pythonpath and registering it as a node package -- not particularly difficult!)\n",
" - _Ignores_ changes to a macro (will crash nicely if the macro IO changed)\n",
" - Falls back to `pickle` for data failures, so can handle a wider variety of data IO objects"
" - Falls back to `pickle` for data failures, so can handle a wider variety of data IO objects\n",
"- `\"pickle\"`\n",
" - Can't handle unpickleable IO items, if that's a problem for your data\n",
" - Stored in byte format; not browsable and needs to be loaded to inspect it at all"
]
},
{
2 changes: 1 addition & 1 deletion pyiron_workflow/__init__.py
@@ -46,7 +46,7 @@
from pyiron_workflow.logging import logger
from pyiron_workflow.nodes.macro import Macro, as_macro_node, macro_node
from pyiron_workflow.nodes.transform import (
# as_dataclass_node, # Not pickling nicely yet
as_dataclass_node,
dataclass_node,
inputs_to_dataframe,
inputs_to_dict,
105 changes: 100 additions & 5 deletions pyiron_workflow/mixin/storage.py
@@ -8,9 +8,11 @@
from abc import ABC, abstractmethod
from importlib import import_module
import os
import pickle
import sys
from typing import Optional

import cloudpickle
import h5io
from pyiron_snippets.files import FileObject, DirectoryObject

@@ -84,6 +86,76 @@ def _delete(self):
"""Remove an existing save-file for this backend"""


class PickleStorage(StorageInterface):

_PICKLE_STORAGE_FILE_NAME = "pickle.pckl"
_CLOUDPICKLE_STORAGE_FILE_NAME = "cloudpickle.cpckl"

def __init__(self, owner: HasPickleStorage):
super().__init__(owner=owner)

@property
def owner(self) -> HasPickleStorage:
return self._owner

def _save(self):
try:
with open(self._pickle_storage_file_path, "wb") as file:
pickle.dump(self.owner, file)
except Exception:
self._delete()
with open(self._cloudpickle_storage_file_path, "wb") as file:
cloudpickle.dump(self.owner, file)

def _load(self):
if self._has_pickle_contents:
with open(self._pickle_storage_file_path, "rb") as file:
inst = pickle.load(file)
elif self._has_cloudpickle_contents:
with open(self._cloudpickle_storage_file_path, "rb") as file:
inst = cloudpickle.load(file)

if inst.__class__ != self.owner.__class__:
raise TypeError(
f"{self.owner.label} cannot load, as it has type "
f"{self.owner.__class__.__name__}, but the saved node has type "
f"{inst.__class__.__name__}"
)
self.owner.__setstate__(inst.__getstate__())

def _delete_file(self, file: str):
FileObject(file, self.owner.storage_directory).delete()

def _delete(self):
if self._has_pickle_contents:
self._delete_file(self._PICKLE_STORAGE_FILE_NAME)
elif self._has_cloudpickle_contents:
self._delete_file(self._CLOUDPICKLE_STORAGE_FILE_NAME)

def _storage_path(self, file: str):
return str((self.owner.storage_directory.path / file).resolve())

@property
def _pickle_storage_file_path(self) -> str:
return self._storage_path(self._PICKLE_STORAGE_FILE_NAME)

@property
def _cloudpickle_storage_file_path(self) -> str:
return self._storage_path(self._CLOUDPICKLE_STORAGE_FILE_NAME)

@property
def _has_contents(self) -> bool:
return self._has_pickle_contents or self._has_cloudpickle_contents

@property
def _has_pickle_contents(self) -> bool:
return os.path.isfile(self._pickle_storage_file_path)

@property
def _has_cloudpickle_contents(self) -> bool:
return os.path.isfile(self._cloudpickle_storage_file_path)


class H5ioStorage(StorageInterface):

_H5IO_STORAGE_FILE_NAME = "h5io.h5"
@@ -278,7 +350,15 @@ def load(self):
Raises:
TypeError: when the saved node has a different class name.
"""
self.storage.load()
if self.storage.has_contents:
self.storage.load()
else:
# Check for saved content using any other backend
for backend in self.allowed_backends():
interface = self._storage_interfaces()[backend](self)
if interface.has_contents:
interface.load()
break

save.__doc__ += _save_load_warnings

@@ -333,6 +413,13 @@ def storage(self) -> StorageInterface:
raise ValueError(f"{self.label} does not have a storage backend set")
return self._storage_interfaces()[self.storage_backend](self)

@property
def any_storage_has_contents(self):
return any(
self._storage_interfaces()[backend](self).has_contents
for backend in self.allowed_backends()
)

@property
def import_ready(self) -> bool:
"""
@@ -365,6 +452,18 @@ def report_import_readiness(self, tabs=0, report_so_far=""):
)


class HasPickleStorage(HasStorage, ABC):
@classmethod
def _storage_interfaces(cls):
interfaces = super(HasPickleStorage, cls)._storage_interfaces()
interfaces["pickle"] = PickleStorage
return interfaces

@classmethod
def default_backend(cls):
return "pickle"


class HasH5ioStorage(HasStorage, ABC):
@classmethod
def _storage_interfaces(cls):
@@ -387,7 +486,3 @@ def to_storage(self, storage: TinybaseStorage):
@abstractmethod
def from_storage(self, storage: TinybaseStorage):
pass

@classmethod
def default_backend(cls):
return "tinybase"
33 changes: 20 additions & 13 deletions pyiron_workflow/node.py
@@ -22,7 +22,11 @@
from pyiron_workflow.mixin.run import Runnable, ReadinessError
from pyiron_workflow.mixin.semantics import Semantic
from pyiron_workflow.mixin.single_output import ExploitsSingleOutput
from pyiron_workflow.mixin.storage import HasH5ioStorage, HasTinybaseStorage
from pyiron_workflow.mixin.storage import (
HasH5ioStorage,
HasTinybaseStorage,
HasPickleStorage,
)
from pyiron_workflow.topology import (
get_nodes_in_data_tree,
set_run_connections_according_to_linear_dag,
@@ -48,6 +52,7 @@ class Node(
HasWorkingDirectory,
HasH5ioStorage,
HasTinybaseStorage,
HasPickleStorage,
ABC,
):
"""
@@ -138,7 +143,8 @@ class Node(
- As long as you haven't put anything unpickleable on them, or defined them in
an unpicklable place (e.g. in the `<locals>` of another function), you can
simply (un)pickle nodes. There is no save/load interface for this right
now; just import pickle and do it. The "pickle" backend to the `Node.save`
now, just import pickle and do it. The "pickle" backend to the `Node.save`
method will fall back on `cloudpickle` as needed to overcome this.
- Saving is triggered manually, or by setting a flag to save after the nodes
runs.
- At the end of instantiation, nodes will load automatically if they find saved
@@ -160,11 +166,11 @@ class Node(
your graph this could be expensive in terms of storage space and/or time.
- [ALPHA ISSUE] Similarly, there is no way to save only part of a graph; only
the entire graph may be saved at once.
- [ALPHA ISSUE] There are two possible back-ends for saving: one leaning on
- [ALPHA ISSUE] There are three possible back-ends for saving: one leaning on
`tinybase.storage.GenericStorage` (in practice,
`H5ioStorage(GenericStorage)`), that is the default, and the other that
uses the `h5io` module directly. The backend used is always the one on the
graph root.
`H5ioStorage(GenericStorage)`), another that uses the `h5io` module
directly, and a third (default) option that uses `(cloud)pickle`. The backend
used is always the one on the graph root.
- [ALPHA ISSUE] The `h5io` backend is deprecated -- it can't handle custom
reconstructors (i.e. when `__reduce__` returns a tuple with some
non-standard callable as its first entry), and basically all our nodes do
@@ -191,8 +197,9 @@ class Node(
requirement is as simple as moving all the desired nodes off to a `.py`
file, registering it, and building the composite from there.
- [ALPHA ISSUE] Restrictions to macros:
- For the `h5io` backend: there are none; if a macro is modified, saved,
and reloaded, the modifications will be reflected in the loaded state.
- For the `h5io` and `pickle` backends: there are none; if a macro is
modified, saved, and reloaded, the modifications will be reflected in
the loaded state.
Note there is a little bit of danger here, as the macro class still
corresponds to the un-modified macro class.
- For the `tinybase` backend: the macro will re-instantiate its original
@@ -251,9 +258,9 @@ class Node(
received output from this call. (Default is False.)
save_after_run (bool): Whether to trigger a save after each run of the node
(currently causes the entire graph to save). (Default is False.)
storage_backend (Literal["h5io" | "tinybase"] | None): The flag for the the
backend to use for saving and loading; for nodes in a graph the value on
the root node is always used.
storage_backend (Literal["h5io" | "tinybase", "pickle"] | None): The flag for
the backend to use for saving and loading; for nodes in a graph the value
on the root node is always used.
signals (pyiron_workflow.io.Signals): A container for input and output
signals, which are channels for controlling execution flow. By default, has
a :attr:`signals.inputs.run` channel which has a callback to the :meth:`run` method
@@ -324,7 +331,7 @@ def __init__(
parent: Optional[Composite] = None,
overwrite_save: bool = False,
run_after_init: bool = False,
storage_backend: Literal["h5io", "tinybase"] | None = "h5io",
storage_backend: Literal["h5io", "tinybase", "pickle"] | None = "pickle",
save_after_run: bool = False,
**kwargs,
):
@@ -377,7 +384,7 @@ def _after_node_setup(
self.delete_storage()
do_load = False
else:
do_load = sys.version_info >= (3, 11) and self.storage.has_contents
do_load = sys.version_info >= (3, 11) and self.any_storage_has_contents

if do_load and run_after_init:
raise ValueError(
2 changes: 1 addition & 1 deletion pyiron_workflow/nodes/composite.py
@@ -102,7 +102,7 @@ def __init__(
parent: Optional[Composite] = None,
overwrite_save: bool = False,
run_after_init: bool = False,
storage_backend: Optional[Literal["h5io", "tinybase"]] = None,
storage_backend: Optional[Literal["h5io", "tinybase", "pickle"]] = None,
save_after_run: bool = False,
strict_naming: bool = True,
**kwargs,
2 changes: 1 addition & 1 deletion pyiron_workflow/nodes/for_loop.py
@@ -201,7 +201,7 @@ def __init__(
parent: Optional[Composite] = None,
overwrite_save: bool = False,
run_after_init: bool = False,
storage_backend: Optional[Literal["h5io", "tinybase"]] = None,
storage_backend: Optional[Literal["h5io", "tinybase", "pickle"]] = None,
save_after_run: bool = False,
strict_naming: bool = True,
body_node_executor: Optional[Executor] = None,
5 changes: 4 additions & 1 deletion pyiron_workflow/nodes/function.py
@@ -421,7 +421,10 @@ def decorator(node_function):
factory_made = function_node_factory(
node_function, validate_output_labels, use_cache, *output_labels
)
factory_made._class_returns_from_decorated_function = node_function
factory_made._reduce_imports_as = (
node_function.__module__,
node_function.__qualname__,
)
factory_made.preview_io()
return factory_made

7 changes: 5 additions & 2 deletions pyiron_workflow/nodes/macro.py
@@ -302,7 +302,7 @@ def _scrape_output_labels(cls):
return scraped_labels

def _prepopulate_ui_nodes_from_graph_creator_signature(
self, storage_backend: Literal["h5io", "tinybase"]
self, storage_backend: Literal["h5io", "tinybase", "pickle"]
):
ui_nodes = []
for label, (type_hint, default) in self.preview_inputs().items():
@@ -534,7 +534,10 @@ def decorator(graph_creator):
factory_made = macro_node_factory(
graph_creator, validate_output_labels, use_cache, *output_labels
)
factory_made._class_returns_from_decorated_function = graph_creator
factory_made._reduce_imports_as = (
graph_creator.__module__,
graph_creator.__qualname__,
)
factory_made.preview_io()
return factory_made

2 changes: 1 addition & 1 deletion pyiron_workflow/nodes/static_io.py
@@ -32,7 +32,7 @@ def __init__(
parent: Optional[Composite] = None,
overwrite_save: bool = False,
run_after_init: bool = False,
storage_backend: Optional[Literal["h5io", "tinybase"]] = None,
storage_backend: Optional[Literal["h5io", "tinybase", "pickle"]] = None,
save_after_run: bool = False,
**kwargs,
):