Support for bound methods at runtime #5

jlowin · 2014-02-28T17:20:38Z

Previously, detecting methods depended on looking for a self argument which isn't robust, since users could easily call the first argument something other than self.

But fundamentally, the issue is this: at compile time, there is no difference between a method named f and an equivalent function named f except that one has more arguments than the other (namely, one usually called self). Decoration (and consequently, dispatching) takes place at compile time. Binding the function (and more importantly, making it identifiable as a bound method) takes place at run time, after the dispatching has finished. So how can the dispatcher know if the function is bound and needs to be passed self?

One way around this is to use the __get__() descriptor method, which is only called if Dispatcher is being accessed as a attribute of a class. We make the assumption that if __get__ is called, the user is trying to call a bound method. This PR watches for that situation and adds an instance attribute to the dispatcher. Then when the dispatcher's __call__ method runs, if it finds the instance attribute, it includes it in the function call (taking the place of 'self') and then resets it so it won't add it every time.

This still has drawbacks... since functions are dispatched by name, methods can be overwritten by similarly-named functions that are defined later in the program, which causes confusion because Dispatcher will add self and then call the most recent function (that isn't a method). Appreciate any thoughts on how to handle this... For now, users should be aware that methods and functions with the same name are all dispatched together. Perhaps that's desirable?

mrocklin · 2014-02-28T18:15:08Z

We'll run into issues both with method/function naming clashes, but also between methods in different classes with the same name. For example I suspect that __init__ will be a common shared name across many classes that users will want to dispatch. I don't think that there is a way to successfully dispatch only on the function name and the non-self argument types. Somehow we need to specify the class.

The two mechanisms to specify the class I see are the following:

Have one Dispatcher per name (like the standard function case) but explicitly add the class into the list of types as we add a function into the dispatcher. This is hard because the class isn't defined at method definition time.
Have a separate dispatcher for each class/name pair. Depend on the class's namespace to implicitly specify the class and to avoid cross-class conflicts. This is what we do now but we have to choose to look in the local namespace with inspect rather than use multidispatch.global_namespace. This choice is based on a the 'self' in argnames heuristic.

Question:

Is there some better heuristic that we can use? Is there something about the context of a method other than that 'self' in argnames that sets it apart? Maybe something else in inspect can help us here.

jlowin · 2014-02-28T19:27:00Z

So in Python 3 the __qualname__ attribute could solve this.

def f(): pass
f.__name__ # 'f'
f.__qualname__ #'f'

class Test:
    def f(self): pass

test = Test()
test.f.__name__ # 'f'
test.f.__qualname__ #'Test.f'

So a quick fix would be to use the qualname as the first dispatch key, then the arguments, instead of dispatching on the (normal) name and arguments.

Effectively, __qualname__ defines a namespace for dispatching.

But that's only in Python 3, unfortunately.

mrocklin · 2014-02-28T20:15:28Z

That's really cool. I also tried the following experiment to further convince myself that it worked.

In [1]: def print_qualname(f):
   ...:     print(f.__qualname__)
   ...:     return f
   ...: 

In [2]: class Test:
   ...:     @print_qualname
   ...:     def f(self):
   ...:         pass
   ...:     
Test.f

I do think that it's important for multipledispatch to support Python 2.x though.

mrocklin · 2014-02-28T20:51:00Z

Does multimethods handle this sanely?

jlowin · 2014-02-28T21:04:50Z

Good question. It looks like multimethods is a mix of the current method and the new method -- it uses inspect.currentframe().f_back.f_locals to limit the dispatchers to those defined for a class (as multipledispatch does currently), and then it determines whether the function is a method at runtime and sets an ismethod flag with the __get__ function, as this PR does.

The currentframe bit still bothers me a little -- feels too magicy -- but maybe it is necessary in Python 2, which I agree needs to be supported.

mrocklin · 2014-02-28T22:49:26Z

In general I strongly agree with you about disliking magic. I sort of expect this project to be magical though, so my standards are low.

From what I can see multimethods depends entirely on local namespaces to organize Dispatcher objects. They don't use a global namespace and so don't attempt the somewhat messy "democratically defined function" problem.

This might be a possible sensible approach for your current problem. As long as we always (both for functions and methods) inject the Dispatcher into the local namespace then, if that dispatcher is in a class definition, it should be protected from other similarly named functions.

To say this another way I think that if we always use the insepct.currentframe()... method then I think the issue that you're trying to chase down will go away.

I'm not certain about the best course of action here. I think some serious thinking may be required.

jlowin · 2014-03-01T13:26:32Z

Ok, after much thought I don't think it's possible to avoid using currentframe() in Python 2. But I think you're right that it's best to do it every time and avoid a global namespace (so that methods are inherently "protected"). I just pushed commits to that effect.

Users can still supply custom namespaces, but within a custom namespace class methods and functions will collide if they have the same name (I added a test for this).

mrocklin · 2014-03-01T16:17:43Z

So the problem with this is that different modules can't "democratically" create a function. This democratic definition is for me one of the common motivators of multiple dispatch. What do I mean by this?

Example

Lets consider a doubly dispatched variant of str that is defined in at least two different .py files

# point.py

@dispatch(float, float)
def str(x, y):
    return "(%f, %f)" % (x, y)

# peopleNames.py

@dispatch(str, str)
def str(first, last)
    return first + " " + last

We would like for the same str function to work on both type signatures, even though they are defined in different modules (or even different projects). To accomplish this then one of the projects must create the custom namespace and the other projects must import the namespace from that module. This requires coordination and the choice of a "master" project. This coordination might be difficult. Is this important? I don't know.

I actually think that the global namespace is a useful default. It is in line with solutions in C, C++, and Julia.

BTW, I apologize for coming back each time with a problem that the current solution can't solve. No solution exists that solves all of these problems. Mostly I'm working out for myself what I think of the various approaches.

jlowin · 2014-03-01T16:50:51Z

No need to apologize, it's helpful to go back and forth!

Your reasoning about the global namespace makes perfect sense, I think I missed the forest for the trees on that one while trying to force a solution. So I think what we've done here is taken a grand tour of all the options and are ending up back at your original solution (which is a good thing!).

I guess we must either rely on heuristics (like the current args[0] == 'self') or have users explicitly write something like @dispatch(int, float, method=True) or even @methoddispatch(int, float) to distinguish methods from functions. As far as I can tell, there isn't a mechanism in Python 2 that allows us to programmatically put functions in a global namespace and methods in a class namespace during decoration. Personally, I lean toward the explicit solution and not the heuristic because it will always work, and because when the heuristic fails users won't know why. But I understand the appeal of the heuristic for simplicity.

jlowin added 3 commits February 28, 2014 10:23

gitignore

0fd50a2

check for method instances at runtime

1b4c319

avoid name collisions with methods because test order is random

8990a68

jlowin added 2 commits March 1, 2014 08:24

remove global namespace and restore frame inspection

db99304

new test for method/function collisions

8da0cfc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support for bound methods at runtime #5

Support for bound methods at runtime #5

Uh oh!

jlowin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

jlowin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

jlowin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

jlowin commented Mar 1, 2014

Uh oh!

mrocklin commented Mar 1, 2014

Uh oh!

jlowin commented Mar 1, 2014

Uh oh!

Uh oh!

Support for bound methods at runtime #5

Are you sure you want to change the base?

Support for bound methods at runtime #5

Uh oh!

Conversation

jlowin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

jlowin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

jlowin commented Feb 28, 2014

Uh oh!

mrocklin commented Feb 28, 2014

Uh oh!

jlowin commented Mar 1, 2014

Uh oh!

mrocklin commented Mar 1, 2014

Example

Uh oh!

jlowin commented Mar 1, 2014

Uh oh!

Uh oh!