pipe_input target #1173

jernejfrank · 2024-10-09T12:19:40Z

Addresses #1161.

Default behaviour reverts back to first parameter chaining for backwards compatibility.

Changes

Corrected minor bug for Applicable (passing through target when using other methods)
Added user facing API on_input
Added on_input to pipe_input for global setting

How I tested this

Unit tests

Notes

There are two differences to on_output:

Since we do not use NodeTransformer we can have a mixture of global and local on_input settings. I made it so that the local (on the level of Applicable) adds to the global one (on the level of pipe_input). The other option would be to allow the local to override the global one. I also added a safeguard that if on_input is used, all transforms need to have a target to have it clear.
The namespace convention needs slight adjustment in case we apply pipe_input to multiple parameters. Since each node is chained through the parameter name, I have now that in case the same transform is applied to two parameters, we create two nodes with the same namespace but add to them "_param_name" so that we don't have problems with same-node names in the DAG.

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 5441199 in 33 seconds

More details

Looked at 838 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. hamilton/function_modifiers/macros.py:920

Draft comment:
Ensure self.target is not None before calling extend on it. This can lead to a TypeError if self.target is None. Consider initializing self.target as an empty list if it's None.
Reason this comment was not posted:
Comment did not seem useful.

2. hamilton/function_modifiers/macros.py:885

Draft comment:
Initialize self.target as an empty list if on_input is None to avoid issues when extending it later.
Reason this comment was not posted:
Marked as duplicate.

3. hamilton/function_modifiers/macros.py:939

Draft comment:
Explicitly check if self.namespace is a string before using it in _resolve_namespace method for clarity and to avoid potential issues.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The current implementation of _resolve_namespace assumes that if self.namespace is not ... or None, it is a string. The suggestion to explicitly check if it is a string could add clarity and prevent potential issues if self.namespace is accidentally set to a non-string value. However, the current logic seems to handle the expected cases, and the suggestion might be considered overly cautious unless there is a known issue with self.namespace being set incorrectly.
I might be overthinking the need for an explicit string check. The current logic seems to handle the expected cases, and the suggestion might be unnecessary unless there is a specific reason to expect self.namespace could be set to a non-string value.
Explicitly checking for a string could prevent future bugs if self.namespace is accidentally set to an unexpected type, but it might be unnecessary if the current logic is sufficient.
The comment suggests a precautionary measure that might not be necessary given the current logic. However, it could prevent potential issues if self.namespace is set incorrectly. Consider keeping the comment if there's a history of such issues.

4. hamilton/function_modifiers/macros.py:857

Draft comment:
Please add documentation for the new on_input functionality in the pipe_input decorator to the Sphinx documentation under docs/. This will help users understand how to use this feature effectively.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is about documentation, which is not directly related to the code changes in the diff. The rules specify not to ask for updates to the PR description or documentation unless it's part of the code changes. Since the comment is about documentation outside the code changes, it should be removed.
I might be overlooking the importance of having external documentation for new features, but the rules are clear about not commenting on documentation unless it's part of the code changes.
The rules are strict about not commenting on documentation unless it's part of the code changes, so the comment should be removed.
Remove the comment as it pertains to documentation outside the scope of the code changes in the diff.

Workflow ID: wflow_iN66poHVEvrsbEW2

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

elijahbenizzy

Nice! Left some thoughts -- I think this is valuable (E.G. the first parameter is a bit of an awkward assumption), but we might want to restrict the API a bit. Also some naming/consistency stuff.

elijahbenizzy · 2024-10-09T16:24:35Z

hamilton/function_modifiers/macros.py

    """

    def __init__(
        self,
        *transforms: Applicable,
        namespace: NamespaceType = ...,
+        on_input: base.TargetType = None,


So I'm not sure that on_input makes sense for anything but a single string, and the semantics aren't quite the same. Some thoughts:

By default, this applies to the first parameter

Does ... apply to everything?

IMO you should restrict it more -- either a string, or default (which is the first parameter). Also, @load_from (which is similar) has the parameter named target. But that's confusing, and probably not the best (maybe it should be called on_input there?. Thoughts on naming?

So I'm not sure that on_input makes sense for anything but a single string, and the semantics aren't quite the same. Some thoughts:

By default, this applies to the first parameter

Yes, I left the default to chain the first parameter and keep the namespace as is for backwards compatibility 👍 .

Does ... apply to everything?

I think we can add ..., but you could also just use the global on_input and pass in a list with the parameters.

IMO you should restrict it more -- either a string, or default (which is the first parameter). Also, @load_from (which is similar) has the parameter named target. But that's confusing, and probably not the best (maybe it should be called on_input there?. Thoughts on naming?

Ah okay, so the idea I had in mind is to leave flexibility that you can run pipe_input on more than one parameter: 3 different cases specifically

You can set a global on_input and it will apply it to those input arguments, like if we want to run it on the second argument w-->_add_one --> _add_two --> 2nd arg of foo

def _add_one(v: int) -> int: return v + 1 def _add_two(v: int) -> int: return v + 2 @pipe_input( step(_add_one).named("a"), step(_add_two).named("b"), on_input="w", namespace="global", ) def fool(v: int, w: int) -> int: return w

You can set a local on_input and choose which transform runs on which input argument, like v-->_add_one --> 1st arg of foo and w-->_add_two --> 2nd arg of foo.

def _add_one(v: int) -> int: return v + 1 def _add_two(v: int) -> int: return v + 2 @pipe_input( step(_add_one).on_input("v").named("a"), step(_add_two).on_input("w").named("b"), namespace="local", ) def fool(v: int, w: int) -> int: return v+w

You can set have a mixture on_input for global and local. The local adds specific transforms on top of what the global has. Here the whole pipeline gets applied to v, but only _add_two to w

def _add_one(v: int) -> int: return v + 1 def _add_two(v: int) -> int: return v + 2 @pipe_input( step(_add_one).named("a"), step(_add_two).on_input("w").named("b"), on_input="v" namespace="local", ) def fool(v: int, w: int) -> int: return v+w

The None defaults to first parameter and if someone sets on_input but not all the steps in pipe_input get hit, we error out that is unclear what the user wants and he either needs to specify locally or globally or omit.

Thanks @jernejfrank -- this makes a lot of sense.

To sum up the offline conversation we had, we decided to restrict it to a single input (for now), knowing that we can turn it on for more. The reasoning is to reduce supported surface area until we get a request for this feature!

elijahbenizzy · 2024-10-09T16:25:04Z

hamilton/function_modifiers/macros.py

        self.chain = _chain
        self.namespace = namespace

+        if isinstance(on_input, str):  # have to do extra since strings are collections in python
+            self.target = [on_input]
+        elif isinstance(on_input, Collection):


What's the use-case for a collection here? It's interesting, but feels a bit more complex than necessary.

The collection was to apply it for a subset of params, here is an example of globally, but it also works locally:
Lets say we only want to pipe_input first two params

def _add_one(v: int) -> int: return v + 1 def _add_two(v: int) -> int: return v + 2 @pipe_input( step(_add_one).named("a"), step(_add_two).named("b"), on_input=["v","w"], namespace="global", ) def foo(v: int, w: int, z:str, .... kwargs) -> int: return w

elijahbenizzy · 2024-10-09T16:26:04Z

hamilton/function_modifiers/macros.py

+        total_nodes = []
+        total_rename_maps = {}
+
+        for param in parameters_transforms_map:


This logic is getting complex -- we should probably break it out into a separate function and add a unit test or two for it...

jernejfrank · 2024-10-10T22:00:07Z

Nice! Left some thoughts -- I think this is valuable (E.G. the first parameter is a bit of an awkward assumption), but we might want to restrict the API a bit. Also some naming/consistency stuff.

The API expanded substantially to leave as much flexibility in there as possible. It also is a complete dual to pipe_output in the sense that you can target multiple parameters (like we do end nodes) and can allocate transforms individually between them -- so you could use them interchangeably / becomes a personal preference if you rather chain on output nodes or input parameter.

But I'm happy to constrain it more, like @elijahbenizzy suggested, to only allow only a single parameter and you pass it in as a string seems reasonable. In case you want more flexibility use pipe_output. There would be one case I don't see this working out, but that is probably then more indicative of a code smell than a lack of flexibility:

def _transform():
    ...

def _other_transform():
    ...

@pipe_output(step(_transform)) # this won't work
def interesting_node():
    ...

@pipe_input(
        step(_transform).on_input="interesting_node", # this should work
        step(_other_transform).on_input("foo") # leaving flexibility to also transform other params
) 
def downstream(foo, bar, baz, interesting_node,...):
    ...

def some_other_dependency(interesting_node):
# here you don't want to have the pipe transform happening just the original `interesting_node` input
    ...

jernejfrank · 2024-10-10T22:05:42Z

Nice! Left some thoughts -- I think this is valuable (E.G. the first parameter is a bit of an awkward assumption), but we might want to restrict the API a bit. Also some naming/consistency stuff.

The API expanded substantially to leave as much flexibility in there as possible. It also is a complete dual to pipe_output in the sense that you can target multiple parameters (like we do end nodes) and can allocate transforms individually between them -- so you could use them interchangeably / becomes a personal preference if you rather chain on output nodes or input parameter.

But I'm happy to constrain it more, like @elijahbenizzy suggested, to only allow only a single parameter and you pass it in as a string seems reasonable. In case you want more flexibility use pipe_output. There would be one case I don't see this working out, but that is probably then more indicative of a code smell than a lack of flexibility:
def _transform():
    ...

def _other_transform():
    ...

@pipe_output(step(_transform)) # this won't work
def interesting_node():
    ...

@pipe_input(
        step(_transform).on_input="interesting_node", # this should work
        step(_other_transform).on_input("foo") # leaving flexibility to also transform other params
) 
def downstream(foo, bar, baz, interesting_node,...):
    ...

def some_other_dependency(interesting_node):
# here you don't want to have the pipe transform happening just the original `interesting_node` input
    ...

Maybe it would make sense to keep it like this because the subtle difference in the implementations between pipe_input and pipe_output can work hand in hand.

Specifically, for pipe_output we touch the original node by creating this chain node_raw--chain_nodes--node, whereby for pipe_input we have param--chain_node--renamed_param_in_upstream_node so we don't touch the original node, but rename the argument in the next node.

elijahbenizzy

Looks good! Some small docs suggestions.

Also, let's create a ticket here. You've done so much for multiple already, we should track + link from the docs. Nice work!

elijahbenizzy · 2024-10-12T04:37:30Z

hamilton/function_modifiers/macros.py

@@ -345,7 +345,7 @@ def __init__(
        :param _resolvers: Resolvers to use for the function
        :param _name: Name of the node to be created
        :param _namespace: Namespace of the node to be created -- currently only single-level namespaces are supported
-        :param _target: Selects which target nodes it will be appended onto. By default all.
+        :param _target: Selects which target nodes it will be appended onto. Default None gets resolved on decorator level.


I would explain this more -- E.G. reference the decorators/how they resolve it.

elijahbenizzy · 2024-10-12T04:38:49Z

hamilton/function_modifiers/macros.py

+            .. code-block:: python
+
+                @pipe_input(
+                    step(_add_one).on_input(["p1","p3"])


Update docs -- we disabled this (for now), right?

Correct, I found a hack how to "comment out" things from rst so Sphinx doesn't generate this part of the docstring.

elijahbenizzy · 2024-10-12T04:40:21Z

hamilton/function_modifiers/macros.py

    """

    def __init__(
        self,
        *transforms: Applicable,
        namespace: NamespaceType = ...,
+        on_input: base.TargetType = None,


Thanks @jernejfrank -- this makes a lot of sense.

To sum up the offline conversation we had, we decided to restrict it to a single input (for now), knowing that we can turn it on for more. The reasoning is to reduce supported surface area until we get a request for this feature!

The new name makes the functionality clearer.

Pipe_input can select which parameter to apply to - Added on_input to Applicable for user facing API - Corrected Applicable named and namespaced to pass through target - Global on_input for pipe_input

The implementation allows for global string input for a target. - Functionality is there for multiple global and local parameter targets, but we shortcircuit for the moment with an error if the arg is not a string. - At the moment all transforms in the pipe get applied to the same argument. - Test and docs have commented out the multiple parameter cases for easy enabling should we want this. - Added URL of the GitHub issue tracking interest to enable multiple params

ellipsis-dev bot reviewed Oct 9, 2024

View reviewed changes

elijahbenizzy reviewed Oct 9, 2024

View reviewed changes

jernejfrank force-pushed the feat/target_pipe_input branch from 5441199 to 9c79275 Compare October 10, 2024 21:45

elijahbenizzy approved these changes Oct 12, 2024

View reviewed changes

jernejfrank mentioned this pull request Oct 12, 2024

multiple target parameters for pipe_input #1177

Open

jernejfrank added 5 commits October 12, 2024 10:09

Change name to hamilton_excude

c9b36ca

The new name makes the functionality clearer.

Add on_input for pipe_input to target args

20a38d7

Pipe_input can select which parameter to apply to - Added on_input to Applicable for user facing API - Corrected Applicable named and namespaced to pass through target - Global on_input for pipe_input

Add on_input docs

9c1662d

Add on_input tests

afcdb96

jernejfrank force-pushed the feat/target_pipe_input branch from 2cdbb85 to 5188e08 Compare October 12, 2024 09:10

elijahbenizzy merged commit 97cf127 into DAGWorks-Inc:main Oct 13, 2024
24 checks passed

jernejfrank deleted the feat/target_pipe_input branch October 21, 2024 03:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pipe_input target #1173

pipe_input target #1173

jernejfrank commented Oct 9, 2024

ellipsis-dev bot left a comment

elijahbenizzy left a comment

elijahbenizzy Oct 9, 2024

jernejfrank Oct 9, 2024

elijahbenizzy Oct 12, 2024

elijahbenizzy Oct 9, 2024

jernejfrank Oct 9, 2024

elijahbenizzy Oct 9, 2024

jernejfrank commented Oct 10, 2024

jernejfrank commented Oct 10, 2024

elijahbenizzy left a comment

elijahbenizzy Oct 12, 2024

elijahbenizzy Oct 12, 2024

jernejfrank Oct 12, 2024

elijahbenizzy Oct 12, 2024

pipe_input target #1173

pipe_input target #1173

Conversation

jernejfrank commented Oct 9, 2024

Changes

How I tested this

Notes

Checklist

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

elijahbenizzy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jernejfrank commented Oct 10, 2024

jernejfrank commented Oct 10, 2024

elijahbenizzy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment