Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate storage #455

Open
samwaseda opened this issue Sep 16, 2024 · 7 comments
Open

Separate storage #455

samwaseda opened this issue Sep 16, 2024 · 7 comments

Comments

@samwaseda
Copy link
Member

It looks to me like it requires only the path for storage.py. Do you think it makes sense to separate it to a separate module?

@samwaseda samwaseda added the enhancement New feature or request label Sep 16, 2024
@liamhuber
Copy link
Member

I don't have any objection to breaking the interface out somewhere else (and if you did I wouldn't object to the commit history credit of doing it with git-copy). I would have the following questions/comments:

  • Where to? For what purpose? Right now we only interface with (cloud)pickle, so is there any point in abstracting so early? I do see the benefit of something with the has_saved_content, delete, etc. interfaces -- and I'm glad you like them too 😄 -- I would just rather avoid making a new repo for the sake of it when there's only one application point.
  • It is not just the filepath -- there is also node-specific stuff around getting a default filename. This doesn't mean we can't pull it out, just a heads-up that even the ABC would need some modification to be general
  • I'm not convinced we could get rid of this class -- I think we still want a pyiron_workflow storage interface that is guaranteeing (at least type hinting) taking and returning Node instances (rather than saving/loading arbitrary objects). If we pull out a generic interface, the code here could of course be tightened up by inheriting from and modifying that generic piece as needed.
  • We need to stay alert when abstracting to how general things are. E.g. for h5 storage I imagine we are going to want not just a file path, but a storage path internal to the h5 file. For nodes this is pretty straightforward as I've made sure that the node's semantic path is perfectly aligned with its __getstate__ path, so I'm optimistic that we're going to be able to leverage the node object directly and leave that path argument blank most of the time. Will other objects be similarly friendly? So far the only "future-proofing" I've done in the interface is to allow **kwargs in a number of places so that child interfaces can specify extra stuff.

p.s. I'm going to remove "enhancement" because even if we pulled it out it's just a refactor, not new capability or user-interface.

@liamhuber liamhuber removed the enhancement New feature or request label Sep 16, 2024
@liamhuber
Copy link
Member

I guess it doesn't need to be a totally different repo, as an ABC of this style could go into pyiron_snippets. The pickle guy is already not welcome there by virtue of cloudpickle though...

@samwaseda
Copy link
Member Author

My suggestion comes from the opposite perspective: I think workflow managers should contain only workflow managing capabilities. So however small the files are, I still think they should be separated. In addition, it's not like we all know what's where and what each class does in pyiron_workflow, so for the development efforts it's good to make it easier for people to know how to take care of different parts.

In the worst case it can also be within pyiron_workflow, but still I would more cleanly separate it.

And more from my personal context: I was taking a look at the file object that we talked about, and this one looked to me indeed like what could be the more useful file object.

("enhancement" was there automatically)

@liamhuber
Copy link
Member

I think workflow managers should contain only workflow managing capabilities.

With the caveat that saving and loading the workflow is part of managing it, I agree. I think it's highly likely a storage module will exist here, but I am happy to make it as thin as possible. At minimum that means that we have the current situation that pyiron_workflow.storage depends on rather than implementing (cloud)pickle, but pulling out more of the abstract interface to back-ends sounds good too.

Where do you want to move it to?

@samwaseda
Copy link
Member Author

For now I'm ok with keeping it within pyiron_workflow, but more like a file object that doesn't depend on nodes.

@samwaseda
Copy link
Member Author

I'll try to look into it myself so just leave this open for a few days

@liamhuber
Copy link
Member

For now I'm ok with keeping it within pyiron_workflow, but more like a file object that doesn't depend on nodes.

👍

There's some fun stuff here like the multiple backends, etc, so I can definitely imagine refactoring so we have a base interface class that behaves like def save(self, obj: typing.Any, filename: str | Path | None = None, **kwargs):. I like that and don't object to it living here or elsewhere. I would still want a node-specific child class, e.g. for things like this:

elif node is not None and filename is None:
return node.as_path() / self.__class__.__name__.lower()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants