
For nodes derive from HasChannel, but node.channel raises an error #603

Open
pmrv opened this issue Feb 24, 2025 · 7 comments
Labels: documentation (Improvements or additions to documentation)
pmrv commented Feb 24, 2025

I was trying to connect a for node with another node using the 'call-like' syntactic sugar, but that triggers the error below, though explicitly connecting inputs/outputs works fine.

MWE:

from pyiron_workflow import Workflow

@Workflow.wrap.as_function_node
def Square(x: int) -> int:
    y = x**2
    return y

@Workflow.wrap.as_function_node
def Sum(xs: list[int]) -> int:
    s = sum(xs)
    return s

sf = Square().for_node(iter_on='x', output_as_dataframe=False)
ss = Sum(sf)

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/software/pyiron_workflow/pyiron_workflow/mixin/semantics.py:244, in SemanticParent.__getattr__(self, key)
    243 try:
--> 244     return self._children[key]
    245 except KeyError as key_error:
    246     # Raise an attribute error from getattr to make sure hasattr works well!

File /cmmc/ptmp/pyironhb/mambaforge/envs/pyiron_mpie_cmti_2025-02-03/lib/python3.11/site-packages/bidict/_base.py:524, in BidictBase.__getitem__(self, key)
    523 """*x.__getitem__(key) ⟺ x[key]*"""
--> 524 return self._fwdm[key]

KeyError: 'channel'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[624], line 12
      9     return s
     11 sf = Square().for_node(iter_on='x', output_as_dataframe=False)
---> 12 ss = Sum(sf)

File ~/software/pyiron_workflow/pyiron_workflow/node.py:312, in Node.__init__(self, label, parent, delete_existing_savefiles, autoload, autorun, checkpoint, *args, **kwargs)
    309 # A place for power-users to bypass node-injection
    311 self._setup_node()
--> 312 self._after_node_setup(
    313     *args,
    314     delete_existing_savefiles=delete_existing_savefiles,
    315     autoload=autoload,
    316     autorun=autorun,
    317     **kwargs,
    318 )

File ~/software/pyiron_workflow/pyiron_workflow/node.py:357, in Node._after_node_setup(self, delete_existing_savefiles, autoload, autorun, *args, **kwargs)
    354             self.load(backend=autoload)
    355             break
--> 357 self.set_input_values(*args, **kwargs)
    359 if autorun:
    360     with contextlib.suppress(ReadinessError):

File ~/software/pyiron_workflow/pyiron_workflow/io.py:447, in HasIO.set_input_values(self, *args, **kwargs)
    444 self._ensure_all_input_keys_present(kwargs.keys(), self.inputs.labels)
    446 for k, v in kwargs.items():
--> 447     self.inputs[k] = v

File ~/software/pyiron_workflow/pyiron_workflow/io.py:118, in IO.__setitem__(self, key, value)
    117 def __setitem__(self, key: str, value: Any) -> None:
--> 118     self.__setattr__(key, value)

File ~/software/pyiron_workflow/pyiron_workflow/io.py:94, in IO.__setattr__(self, key, value)
     92 def __setattr__(self, key: str, value: Any) -> None:
     93     if key in self.channel_dict:
---> 94         self._assign_value_to_existing_channel(self.channel_dict[key], value)
     95     elif isinstance(value, self._channel_class):
     96         if key != value.label:

File ~/software/pyiron_workflow/pyiron_workflow/io.py:110, in IO._assign_value_to_existing_channel(self, channel, value)
    108 def _assign_value_to_existing_channel(self, channel: OwnedType, value: Any) -> None:
    109     if isinstance(value, HasChannel):
--> 110         channel.connect(value.channel)
    111     else:
    112         self._assign_a_non_channel_value(channel, value)

File ~/software/pyiron_workflow/pyiron_workflow/mixin/semantics.py:253, in SemanticParent.__getattr__(self, key)
    251 if len(matches) > 0:
    252     msg += f" Did you mean '{matches[0]}' and not '{key}'?"
--> 253 raise AttributeError(msg) from key_error

AttributeError: Could not find attribute 'channel' on ForSquareIterXListOut (ForSquareIterXListOut) or among its children (dict_keys(['x'])).
pmrv added the bug (Something isn't working) label Feb 24, 2025
liamhuber self-assigned this Feb 24, 2025
liamhuber (Member) commented
Ok, so this is not a case of a bug in execution or performance, but just a very uninformative error message.

There is nothing wrong with the "call-like" syntax here, but rather the "node-as-channel" syntax. In this case, it's because sf = Square().for_node(iter_on='x', output_as_dataframe=False) has two output channels and not one, and thus cannot unambiguously be used in place of an output channel. So in this sense, the error message is totally correct -- you try to get sf.channel in order to use its output as input, but that's not available. The for-loop nodes return enough data to construct their input+output tuples, so in this case you're getting two channels.

import pyiron_workflow as pwf

@pwf.as_function_node
def Square(x: int) -> int:
    y = x**2
    return y

sf_channels = Square.for_node(iter_on='x', output_as_dataframe=False)
print(sf_channels.outputs.labels)
>>> ['x', 'y']

On the other hand, if you left them packaged as a data channel, you'd be able to access its single-output without trouble:

sf_df = Square.for_node(iter_on='x', output_as_dataframe=True)
print(sf_df.channel.full_label)
>>> /ForSquareIterXDataOut.df

You can then leverage the entire node as though it were output, and you can even convert the dataframe column to a list entirely within the pyiron_workflow standard node ecosystem; it's just a pain in the butt:

@pwf.as_function_node
def Sum(xs: list[int]) -> int:
    s = sum(xs)
    return s

sf_df.inputs.x = [1, 2, 3]
xcol = pwf.standard_nodes.GetItem(sf_df, "y")
tolist = pwf.standard_nodes.GetAttr(xcol, "tolist")
df_col_as_list = pwf.standard_nodes.PureCall(tolist)
sum_from_dataframe = Sum(df_col_as_list)
print(sum_from_dataframe())
>>> 14
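For reference, here is what the GetItem/GetAttr/PureCall chain above is computing, as a plain-Python sketch of the data flow (the dict-of-lists layout is just an illustration of the for-node's output table, not pyiron_workflow's actual dataframe object):

```python
# Sketch of the data flow, no pyiron_workflow required:
# the for-node pairs each input x with the body node's output y = x**2.
xs = [1, 2, 3]
rows = [{"x": x, "y": x**2} for x in xs]  # stand-in for the output dataframe

# GetItem(sf_df, "y") + GetAttr/PureCall(.tolist) amount to column extraction:
ycol = [row["y"] for row in rows]
print(sum(ycol))  # 1 + 4 + 9 = 14, matching the node-based result above
```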

This is why I introduced output_as_dataframe, and False is very much my preferred value. This way you lose the "node-as-channel" shortcut, but since each (looped) input and each output gets its own channel, it's still super easy to work with:

sf_channels.inputs.x = [1, 2, 3]
sum_from_channels = Sum(sf_channels.outputs.y)
print(sum_from_channels())

Note that for_node is a class method, so you don't need to first instantiate Square -- indeed, if you try to do that on a workflow it's possible you'll even get complaints. Here, without a parent, the instantiated node just sits around unused.

So, ultimately, I think the only problem here is that the error message is "I can't find 'channel'" instead of "You tried to get 'channel' when there's more than one output". Let me poke around and see if I can get it to be more explicit in this case.

pmrv commented Feb 24, 2025

sf_channels.inputs.x = [1, 2, 3]
sum_from_channels = Sum(sf_channels.outputs.y)
print(sum_from_channels())

Aha, now I understand the issue, and I agree that some kind of error should be thrown when doing what I did. I kind of wish, though, that it didn't return the input again. In the dataframe case I see the utility, but when outputting just values it seems counterproductive. In the case where I need the input and output together, I can easily wire the input of the for node to whatever consumer also needs its output, but in the case where I do not need the input, the workflow has just leaked a bunch of memory. In a way, I'd prefer a map to a for.

On a more general note: this opened up the design question for me of whether it's better to stitch together a number of for nodes (which is where this example originally came from) or to stitch together the individual nodes and wrap the collection into a single for node. The first was more natural to me, especially in terms of checkpointing, but maybe the latter is more ergonomic with workflows.

Note that for_node is a class method, so you don't need to first instantiate Square -- indeed, if you try to do that on a workflow it's possible you'll even get complaints. Here, without a parent, the instantiated node just sits around unused.

I figured that out later in the afternoon after reading the code a bit, though in the beginning it felt somewhat natural to me to instantiate it first. It may have just been me playing with the singular node first. Tangentially, for_node is a class method and takes a type as input, but returns an instantiated node. This is causing me a bit of a headache, because now I don't know how to define "canonical" for nodes in my modules. I'd prefer to write nodes that act on a singular input and then also export the "parallelized" version in my API, but I cannot do something like

@Workflow.wrap.as_function_node
def MyNode(...):
  pass
MyNodeLoop = MyNode.for_node(iter_on=...)

because the second one is an instance, unless I misunderstood something somewhere. I could easily write the loop node by hand using the unwrapped function, but it'd be neater if I didn't need to.

So, ultimately, I think the only problem here is that the error message is "I can't find 'channel'" instead of "You tried to get 'channel' when there's more than one output". Let me poke around and see if I can get it to be more explicit in this case.

I'm not sure. HasChannel promises me that I can do sf.channel, but I can't without triggering the error above, so ForNode does not implement the interface. If I understand your explanation, then whether this works depends on the number of output channels (1 or >1). It sounds like this is not something that Python's "type system" can capture, so I see it's tricky, but maybe then the interface is not quite right.

liamhuber (Member) commented
I kind of wish, though, that it didn't return the input again. In the dataframe case I see the utility, but when outputting just values it seems counterproductive. In the case where I need the input and output together, I can easily wire the input of the for node to whatever consumer also needs its output, but in the case where I do not need the input, the workflow has just leaked a bunch of memory.

In principle, I agree. I'm even ok moving there in the long run if we find the memory becomes a problem. In the meantime, it is not necessarily completely trivial to match inputs to outputs, so I think it's convenient to give them to users all the time. Consider:

import pyiron_workflow as pwf

n = pwf.standard_nodes.Add.for_node(
    iter_on="obj",
    zip_on="other",
    obj=["a", "b", "c"],
    other=["A", "B", "C"],
    output_as_dataframe=False,
)
n()

This is causing me a bit of a headache, because now I don't know how to define "canonical" for nodes in my modules. I'd prefer to write nodes that act on a singular input and then also export the "parallelized" version in my API, but I cannot do something like

We might be able to get a decorator working like that, but already you can just subclass For to accomplish what you want very easily. I see, however, that there is an inconsistency in the API where we make Macro and Function available, but not For. At any rate, here's an example overriding some class variables to make the same for-loop as above, just a 5-liner:

from typing import ClassVar

import pyiron_workflow as pwf
from pyiron_workflow.nodes.for_loop import For
from pyiron_workflow.nodes.static_io import StaticNode

class AddFor(For):
    _body_node_class: ClassVar[type[StaticNode]] = pwf.standard_nodes.Add
    _iter_on: ClassVar[tuple[str, ...]] = ("obj",)
    _zip_on: ClassVar[tuple[str, ...]] = ("other",)
    _output_as_dataframe: ClassVar[bool] = False

n = AddFor(
    obj=["a", "b", "c"],
    other=["A", "B", "C"],
)
n()

I'm not sure. HasChannel promises me that I can do sf.channel, but I can't without triggering the error above, so ForNode does not implement the interface. If I understand your explanation, then whether this works depends on the number of output channels (1 or >1). It sounds like this is not something that Python's "type system" can capture, so I see it's tricky, but maybe then the interface is not quite right.

That's a valid complaint. Your understanding is correct. To go into more detail, it's a little bit of an edge case, because For(StaticNode(Node(ExploitsSingleOutput(HasGenericChannel[OutputDataWithInjection])))) does implement a channel property -- it's just that this property raises an AmbiguousOutputError (a subclass of AttributeError) if there's more than one output. Because the exception inherits from AttributeError, hasattr(my_multiple_output_node, "channel") will indeed evaluate to False. So channel is in the attribute list, but it doesn't hasattr -- does it "have a channel" or not? I'd be happy with any refactoring that preserves the current test suite while resolving this tension, but I'm not going to prioritize figuring one out.
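The hasattr tension described here is easy to reproduce without pyiron_workflow at all: any property that raises an AttributeError subclass is invisible to hasattr, even though the name is plainly defined on the class. A minimal sketch (the class and exception names are stand-ins, not pyiron_workflow API):

```python
# Minimal sketch: a property raising an AttributeError subclass
# makes hasattr() report False, though the attribute exists in dir().
class AmbiguousOutputError(AttributeError):
    """Stand-in for pyiron_workflow's exception of the same name."""

class MultiOutputNode:
    @property
    def channel(self):
        raise AmbiguousOutputError("more than one output channel")

node = MultiOutputNode()
print("channel" in dir(node))    # True: the property is defined on the class
print(hasattr(node, "channel"))  # False: accessing it raised an AttributeError
```

This is exactly why hasattr and dir disagree for multi-output for-nodes: hasattr is defined as "getattr does not raise AttributeError", so the subclassed exception short-circuits it.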

Let me poke around and see if I can get it to be more explicit in this case.

I did, and it's a pain. I couldn't manage to get the ambiguous output exception into the stack trace for composites with multiple outputs without causing other trouble. Not a good sign 👎 I don't think I have time to work on it more this week.

liamhuber added the documentation (Improvements or additions to documentation) label and removed the bug (Something isn't working) label Feb 24, 2025
liamhuber commented Feb 24, 2025

Let me poke around and see if I can get it to be more explicit in this case.

I did, and it's a pain. I couldn't manage to get the ambiguous output exception into the stack trace for composites with multiple outputs without causing other trouble. Not a good sign 👎 I don't think I have time to work on it more this week.

@pmrv, I'd like to keep the issue open while the error message sucks, but IMO it's a documentation issue rather than a bug -- behaviour is basically what we want, we're just not doing a good job describing to the user what happened. I switched the issue label accordingly, but lmk if you object.

pmrv commented Feb 25, 2025

Let me poke around and see if I can get it to be more explicit in this case.

I did, and it's a pain. I couldn't manage to get the ambiguous output exception into the stack trace for composites with multiple outputs without causing other trouble. Not a good sign 👎 I don't think I have time to work on it more this week.

@pmrv, I'd like to keep the issue open while the error message sucks, but IMO it's a documentation issue rather than a bug -- behaviour is basically what we want, we're just not doing a good job describing to the user what happened. I switched the issue label accordingly, but lmk if you object.

A-ok, I agree that the functionality works. I'd just like to be better informed by the error.

pmrv commented Feb 25, 2025

This is causing me a bit of a headache, because now I don't know how to define "canonical" for nodes in my modules. I'd prefer to write nodes that act on a singular input and then also export the "parallelized" version in my API, but I cannot do something like

We might be able to get a decorator working like that, but already you can just subclass For to accomplish what you want very easily. I see, however, that there is an inconsistency in the API where we make Macro and Function available, but not For. At any rate, here's an example overriding some class variables to make the same for-loop as above, just a 5-liner:

I had seen this class, but I honestly was a bit scared to derive from something with a lot of underscores. Your snippet worked perfectly fine for me though, so it's ok for now.

pmrv commented Feb 25, 2025

I kind of wish, though, that it didn't return the input again. In the dataframe case I see the utility, but when outputting just values it seems counterproductive. In the case where I need the input and output together, I can easily wire the input of the for node to whatever consumer also needs its output, but in the case where I do not need the input, the workflow has just leaked a bunch of memory.

In principle, I agree. I'm even ok moving there in the long run if we find the memory becomes a problem. In the meantime, it is not necessarily completely trivial to match inputs to outputs, so I think it's convenient to give them to users all the time. Consider:

Right, when taking the product of inputs it becomes tricky. Ok for now, but maybe offering a dedicated map in the future would be nice UX.
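To see why matching outputs back to inputs stops being trivial once inputs are combined, here is a pyiron-free sketch of the iter+zip Add example from earlier in the thread. The combination rule (Cartesian product of the iterated channel with the zipped group) is my assumption for illustration, not a guarantee of what For does internally:

```python
import itertools

obj = ["a", "b", "c"]    # iter_on: every value is visited
other = ["A", "B", "C"]  # zip_on: zipped together (a single group here)

# Assumed combination rule: Cartesian product of the iterated channel
# with the zipped group -> 3 x 3 = 9 body-node evaluations.
rows = [
    {"obj": o, "other": t, "add": o + t}
    for o, t in itertools.product(obj, other)
]
print(len(rows))     # 9 output values from two length-3 inputs
print(rows[0])       # without the echoed inputs, a flat list of 9 results
                     # would not tell you which (obj, other) pair made each one
```

A bare map over one input would keep a one-to-one index correspondence; it is the product/zip combinations that make echoing the inputs alongside the outputs genuinely useful.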
