automatic type stub (pyi) generation for java classes #714

Open
michi42 opened this issue Apr 27, 2020 · 77 comments
Labels
enhancement Improvement in capability planned for future release

Comments

@michi42
Contributor

michi42 commented Apr 27, 2020

I'm working on making a rather big, domain specific java API available through JPype. One of our main issues is the complete lack of type information during static analysis - and therefore no linting or IDE auto-completion.

PEP 484 and 561 support providing type information in separate .pyi "stub" files. The mypy project provides stubgen, a tool to automatically create these from python or C modules.
Since Java is strongly typed, all the information necessary to generate this kind of stub file is in principle available in the class files and can be obtained e.g. through reflection.

I implemented a first version: https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735
EDIT: Now a standalone repo: https://gitlab.cern.ch/scripting-tools/stubgenj

The module is supposed to be called from a simple script that sets up the JVM, e.g.

import jpype
import stubgenj

# fire up the JVM with the necessary JARs (placeholder classpath, adjust to your setup)
jpype.startJVM(classpath=["path/to/jars/*"])

stubgenj.generate_java_stubs(["java", "cern.lsa"], "pyi")

This is still very limited, just a starting point that handles the "most common" cases. In particular, it does not yet support generics (which should be translated to Python type variables) or constructors (which should be translated to __init__ methods), and the types of constants and enum constants are not yet detected.

@Thrameos
Contributor

Interesting. I look forward to some additional information.

Can this be connected to the existing annotation API that we are using? We currently support Jedi auto-completion, which allows linting of return types if the overload is not mixed. I haven't tried with Kite.

Here is my recommendation for an implementation. Just like Java docs, we can pick up Java information through the resource API. So you could run a preprocessor that converts the Java jar file (org-mystuff.jar) into a second jar with the needed annotations (org-mystuff-py.jar). Then, when I load a class using org.mystuff.MyClass.class, I can also fetch the org.mystuff.MyClass.annot file, which can then provide stubs.

@michi42
Contributor Author

michi42 commented Apr 27, 2020

Hi,

Thanks a lot for the quick response!

Indeed, it goes in the same direction as the annotation API (and __docs__) that JPype is already populating at run-time (which is great for interactive applications like IPython or Jupyter). However, it provides static information that is pre-generated once and can then be used without executing any of the code (as is typically the case for IDEs like PyCharm).

Also, this supports overloads, even when the signatures are completely different and return types depend on the signatures, through the overload decorator: https://docs.python.org/3/library/typing.html#typing.overload - also it should be possible to represent generics and type arguments, although my first version does not handle that. I frankly don't know if the annotation API supports all of these.
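To illustrate the overload mechanism mentioned above, here is a minimal sketch. The `Lookup` class and its two `get` signatures are hypothetical and not taken from any real Java API; in an actual `.pyi` stub only the decorated signatures would appear, and the runtime implementation is included only so the sketch can be executed.

```python
from typing import Union, overload


class Lookup:
    """Hypothetical stand-in for a Java class with two overloads:
    String get(int index) and int get(String name)."""

    def __init__(self) -> None:
        self._names = ["a", "b"]

    # In a .pyi stub only these decorated signatures would appear.
    @overload
    def get(self, key: int) -> str: ...
    @overload
    def get(self, key: str) -> int: ...

    # Runtime implementation, present only to make the sketch executable.
    def get(self, key: Union[int, str]) -> Union[str, int]:
        if isinstance(key, int):
            return self._names[key]
        return self._names.index(key)
```

A static type checker reading the two `@overload` signatures can infer that `Lookup().get(0)` is a `str` while `Lookup().get("b")` is an `int`.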

As for distribution, I'm afraid it can't be (just) a JAR file, as the whole point behind this is that Python IDEs (and possibly linting tools) can read the type stubs in a purely static way, without having to run any of our code. Therefore, the type stub "shadow packages" have to be packaged and installed in a way common Python IDEs support, typically as an egg/wheel. The structure has to match the Java class hierarchy, for instance (after generating "java.*" and some private APIs):
[screenshot: directory structure of the generated stub packages]
Note that there are only __init__.py files - because Java does not have the concept of modules, only packages and contained classes.
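As a rough sketch of the layout described above: every Java package on the path to a class becomes a directory holding one init file. The helper name `stub_files_for` is hypothetical, and the file name `__init__.pyi` is an assumption based on the PEP 561 convention for stub files.

```python
from pathlib import PurePosixPath


def stub_files_for(class_names):
    """Return the relative stub paths needed for fully qualified Java class names."""
    paths = set()
    for name in class_names:
        package_parts = name.split(".")[:-1]  # drop the class name itself
        # Every package on the way down gets its own init stub.
        for depth in range(1, len(package_parts) + 1):
            paths.add(str(PurePosixPath(*package_parts[:depth], "__init__.pyi")))
    return sorted(paths)
```

Classes are not given files of their own in this sketch; they would be declared inside the init stub of their package.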

A few results (in PyCharm):
[three PyCharm screenshots showing the results]

@Thrameos
Contributor

Okay that makes sense. Please keep me posted on any hooks that you require in terms of support.

https://www.jetbrains.com/help/pycharm/stubs.html

I guess the correct solution would be to make a jpype-java-v8, jpype-java-v11, etc. package that gets posted to pypi then? We can then have those packages depend on JPype1 to reduce the need to install both separately. Is that what you envision?

This is very similar to what I am working on for the reverse bridge in which Python generates a package with nothing more than interfaces for each Python class. That allows Java to have an interface with each required method exposed in Python.

@michi42
Contributor Author

michi42 commented Apr 27, 2020

> Okay that makes sense. Please keep me posted on any hooks that you require in terms of support.

Sure, thanks a lot!

> I guess the correct solution would be to make a jpype-java-v8, jpype-java-v11, etc. package that gets posted to pypi then? We can then have those packages depend on JPype1 to reduce the need to install both separately. Is that what you envision?

That would indeed be an option to ease the working with java standard library classes (like https://github.com/python/typeshed does for python standard libraries).
Also, I think it could be useful to ship the stub generator along with JPype to allow users to generate stubs for their own domain-specific Java packages.

One potential (minor) issue with static stub generation is that java.lang.String auto-conversion can be switched on and off at run-time. For the time being I assumed it to be on, and therefore java.lang.String becomes str in my first version of the stub generator. Perhaps it would be more correct to treat it as Union[java.lang.String, str] (= either of these).
Still, a static type checker will not be able to detect if the conversion flag is going to be on or off at run-time.

@Thrameos
Contributor

From JPype 0.8 on, the default will be that conversion is off. The on mode is a bit of a misfeature, as there are a few functions in Java where chaining a string is needed. (It is non-obvious if you call the Java string constructor and get back a Python string.) Though we may be able to splice some aliases into java.lang.String so that it works a bit closer to Python strings. I remember there was a conflict on a method contract a while back. But I should probably refresh my memory on that topic.

We should definitely ship the stub generator as part of JPype so that someone can run it as needed in the field.

@Thrameos
Contributor

I reviewed the contract for string and Java string. There is one strong conflict, two weak conflicts and 2 near conflicts.

The strong conflict is Java and Python format. They are similar in design, but one is a static method and the other operates on a string. split is very similar, but Java uses a regexp rather than simple text for the split. replace is almost the same, but Python adds an extra parameter. endswith and endsWith are a near miss, as are startswith and startsWith.
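The split difference can be illustrated in plain Python, using `re.split` as a stand-in for regex-based splitting of the kind Java's `String.split` performs. This is only a sketch of the semantic gap, not JPype code.

```python
import re

text = "a.b"

# Python's str.split treats the separator as literal text.
literal_parts = text.split(".")

# A regex-based split treats "." as "any character",
# so every character of the input becomes a separator.
regex_parts = re.split(".", text)

# Escaping the separator restores the literal behaviour.
escaped_parts = re.split(re.escape("."), text)
```

Here `literal_parts` and `escaped_parts` both hold the two pieces, while `regex_parts` contains only empty strings. Java additionally drops trailing empty strings by default, so the same call there would return an empty array.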

I could submit a PR with Java String completing almost the same contract as Python. The return type will always be Java strings for consistency. Given that we are switching to no longer convert strings, this would in principle save a few people, but overloading strings completely is not possible (though maybe I should review this, as it was last reviewed in the Python 2 age). The other thing that will not pass is of course isinstance. Hard to dodge that one without deriving the type.

@marscher would this be worth the effort to put in as a PR?

@michi42
Contributor Author

michi42 commented Apr 28, 2020

Thanks a lot for checking. I agree that conceptually this feature is not very nice, although in many common use cases it saves a lot of explicit type conversion clutter. Checking how this is done in PySide (Qt), they do the same for QString ...
Actually I think there are two use cases for the automatic conversion - from str to java.lang.String (e.g. arguments of Java methods) and the other way around (e.g. return values of Java methods).

For the return types, yes, we may get away with making java.lang.String "look like" python str, and I agree it could be a nice feature, although as you noticed it will not cover all use cases.

On the other hand, what is the plan for str to java.lang.String conversion? This one seems "harmless" to me. With convertStrings=False, will you have to pass explicitly constructed java.lang.String objects to Java methods which take strings as arguments?

Anyway, I'm going to make the stub generator honor the flag at generation time, so it is up to the user to decide. "Official" stubs would be generated assuming convertStrings=False, but those using the feature can build their own stub tree with convertStrings=True. In the end it's a simple type replacement at the level of the stubs.
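A minimal sketch of such a type replacement for return values might look as follows. The helper name `stub_return_type` is hypothetical and is not part of stubgenj or JPype.

```python
def stub_return_type(java_type: str, convert_strings: bool) -> str:
    """Name to emit in a stub for a Java return type (hypothetical helper)."""
    if convert_strings and java_type == "java.lang.String":
        # With convertStrings=True, Java strings come back as Python str.
        return "str"
    return java_type
```

All other types pass through unchanged, so the same generator can produce both stub trees by flipping one flag.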

Edit: is there any foreseen way to detect from Python if JPype was started with convertStrings=True? Currently I'm using this, but it is somewhat artificial...

def convert_strings() -> bool:
    if convert_strings.jpype_flag is None:
        from java.lang import String  # noqa
        convert_strings.jpype_flag = isinstance(String().trim(), str)
    return convert_strings.jpype_flag


convert_strings.jpype_flag = None

@Thrameos
Contributor

Convert strings only affects the return path. We always implicitly convert string as arguments.

The argument path for stubs likely needs some extra hooks. We have quite a few automatic forward conversions like string, path, date, and list. I don't think there is currently a way to get a list of implicit conversions. Some of these are hard types, and others are duck types. What level of detail do you need for stubs? Do you need a method to extract the conversion rule by Java class?

@michi42
Contributor Author

michi42 commented Apr 28, 2020

Yes, then indeed I would need a way to extract a list of the types accepted by the implicit conversion for a particular java class in a method argument.
For the accepted hard types, it's straightforward, I would just need them as python type objects or strings.
For the duck types, we could define Protocol types - or if there is one, use one of the pre-defined protocols (e.g. Iterable, Mapping, ...)

Another question that goes in the same direction: is there a hook to get the "mangling" JPype does to identifiers when they match a Python keyword (adding '_')?

@Thrameos
Contributor

The implicit rules currently recognize three types of conversions. The first is exact in which a conversion is only applied if it exactly matches, but this one is only there to trigger fast logic. The others are in the form of a list of types that will be taken or a list of attributes that will be probed. I suppose the best way to do this is to get you an API call that returns those two lists as a tuple so that we can construct Protocol types for each.

Name mangling is handled by the pysafe function in jpype._pykeywords. It is applied by classes to methods and fields, to accessors in beans, and to packages and imports.
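For readers without JPype at hand, the effect of the mangling can be approximated with the standard library. This is only a sketch: the name `mangle` is hypothetical, and the real `pysafe` may cover more names than `keyword.iskeyword` does.

```python
import keyword


def mangle(name: str) -> str:
    """Append '_' to identifiers that collide with Python keywords (approximation)."""
    return name + "_" if keyword.iskeyword(name) else name
```

A stub generator would apply the same rule to method, field and package names so that the stubs match what is actually reachable at run time.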

@Thrameos
Contributor

Okay looking it over there is one special exception in the type system. Python strings are Sequences but JPype flatly refuses to recognize them. There is almost no case in which the user wants to pass a string to a list object and make it break it into chars.
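Expressed as a runtime check, the exception could look like the sketch below. The function name `accepted_as_java_list` is hypothetical and the check is a simplification of whatever JPype does internally.

```python
from collections.abc import Sequence


def accepted_as_java_list(obj) -> bool:
    """True for sequences, except str, which is deliberately vetoed."""
    return isinstance(obj, Sequence) and not isinstance(obj, str)
```

This kind of "everything except" rule is easy to write as a runtime test but has no direct counterpart in static type annotations.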

@michi42
Contributor Author

michi42 commented Apr 28, 2020

Ok, I can't see how this special exception could be easily represented in the type annotation system, as if we e.g. allow Iterable, it will include str. But I would say, we can leave that aside for a first version.

@Thrameos
Contributor

"Never let perfect be the enemy of good."

There will also be an edge case or two. But nothing here gives me any pause. I will work on the probe API. Hopefully completed by the end of the weekend.

@michi42
Contributor Author

michi42 commented Apr 28, 2020

Perfect, thanks a lot, that would allow me to get rid of this hardcoded mess: https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735#file-stubgenj-py-L212 ;-)

For objects returned by java to python, are there any implicit conversions (apart from String -> str if convertStrings = True)?
From what I see, boxed types are returned as such.
Primitives become JShort, JInt, ... - but since these implement the python int, it is probably safe to treat them as int (or bool, float, ... respectively)
Arrays become something that implements the List contract?

@Thrameos
Contributor

There are a few implicit conversions

  • null goes to type None. So anything returning an object type will get None as well as the object type. Not sure stubs will care about this.
  • Void type will eventually go to JVoid which will be magical type that refuses to do anything. This will prevent someone from accidentally using a void in an if statement.
  • Java boolean goes to Python bool type (True, False)
  • Java string goes to Python str when convertStrings is True
  • Primitives each return the JPrimitive wrappers which all derive from the corresponding type.
  • Boxed types act like primitives with the minor exception that they can be null objects. They do have some special methods so you may need to leave them as is.
  • Java char goes to a str with length 1. This is a weird special case because I can't make str and int types play nice.
  • Most containers will implement the corresponding contract and satisfy the collections.abc. This is still a work in progress. Best to leave them as the Java type then make the stub show it implements the contract.
  • The other special case is the JObject cast which forces the type to match. Unfortunately it takes string arguments so we can't really see the contract. The new casting operator fixes this problem. (Please comment on that thread if you have an operator preference).
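The return-path rules above could be folded into a stub generator roughly as sketched here. Two simplifications are assumptions of this sketch rather than JPype behaviour: the primitive wrappers are annotated as the Python type they derive from, and void is annotated as `None` rather than a JVoid type. The helper name is hypothetical.

```python
from typing import Dict

# Java primitive -> Python annotation (simplified, see the assumptions above).
_PRIMITIVE_RETURNS: Dict[str, str] = {
    "boolean": "bool",
    "char": "str",  # a str of length 1
    "byte": "int",
    "short": "int",
    "int": "int",
    "long": "int",
    "float": "float",
    "double": "float",
    "void": "None",
}


def stub_return_annotation(java_type: str) -> str:
    """Annotation string for a Java return type (hypothetical helper)."""
    if java_type in _PRIMITIVE_RETURNS:
        return _PRIMITIVE_RETURNS[java_type]
    # Object types may come back as null, which maps to None.
    return "typing.Optional[%s]" % java_type
```

Whether wrapping every object return in `Optional` is desirable is a design choice; it is accurate but makes the stubs noisier for callers.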

@Thrameos
Contributor

Progress report

I did some work on this until the wee hours of the morning. I think I have settled on a solution. Each converter in JPype will have a getInfo method that adds the type information from the conversion to a conversion structure. When you access the private field _info on a type, it will produce a dictionary holding:

  • ret - the type that will be returned by a function which produces this type.
  • exact - a list of types that are considered an exact match to this type when called as an argument.
  • implicit - a list of the types that are considered an implicit match to this type when called as an argument.
  • explicit - a list of the types that will cast if used as a return type in a proxy (not to be used in stubs)
  • none - a list of types that are prohibited from matching (for example string can't match List even if it is a sequence) This will override any implicit matching rules. (basically this is a veto)
  • attributes - a list of strings holding duck typing for implicit matches during argument passing.

There may be repeats in the lists (order is based on the conversions applied to a type). For example, it may try PyLong_CheckExact followed by PyLong_Check, which would put two copies of long into the list. This will be extended with any additional type information (flags, array component, etc.). I may end up merging this with the current _hints field, but it will give the same fields under a different name.
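One way a stub generator might consume such a dictionary is sketched below. Type names are plain strings here, the function name `argument_annotation` is hypothetical, and treating the `none` list as a simple filter is a simplification of the veto rule described above.

```python
def argument_annotation(info: dict) -> str:
    """Build a Union annotation string from the 'exact' and 'implicit' lists."""
    accepted = []
    for type_name in info["exact"] + info["implicit"]:
        # Skip repeats (the lists may contain duplicates) and vetoed types.
        if type_name not in accepted and type_name not in info["none"]:
            accepted.append(type_name)
    return "typing.Union[%s]" % ", ".join(accepted)
```

The `attributes` list is left out of this sketch because duck-typed entries need a Protocol rather than a concrete type name.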

Will this be enough for the stubbing system?

@michi42
Contributor Author

michi42 commented Apr 29, 2020

Thanks a lot, looks good to me. Just for my understanding, how is attributes supposed to work? Does it just require an attribute with that name to be present? Does that attribute have to be of a particular type?
Perhaps it would be easier (and more flexible) if the converters themselves would define an appropriate Protocol type and return that as part of the implicit list?

Another thingy - How can one collect a list of all active converters in JPype?

As a heads-up from my side, I've implemented a first version to support Java Generics, and transform them into their python TypeVar counterparts. This is still not perfect and it has its limitations, some of which I still hope to overcome (as far as the python type hint system allows).
So far I've been updating https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735, but if you prefer I can also turn that into a PR (any preference on where to put stubgenj in the repo?)

e.g. for java.util.Collection<E> it generates something like this...

_Collection__E = _py_TypeVar('_Collection__E')  # <E>
class Collection(java.lang.Iterable[_Collection__E], _py_Generic[_Collection__E], _py_Collection[_Collection__E]):
    def add(self, e: _Collection__E) -> bool: ...
    def addAll(self, collection: 'Collection'[_Collection__E]) -> bool: ...
    def clear(self) -> None: ...
    def contains(self, object: _py_Any) -> bool: ...
    def containsAll(self, collection: 'Collection'[_py_Any]) -> bool: ...
    def equals(self, object: _py_Any) -> bool: ...
    def hashCode(self) -> int: ...
    def isEmpty(self) -> bool: ...
    def iterator(self) -> 'Iterator'[_Collection__E]: ...
# ..
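The type variable declaration at the top of that output follows a visible naming pattern, class name plus parameter name. A sketch of how such lines could be produced is shown here; the helper name `type_var_declarations` is hypothetical, and the pattern is inferred from the generated output rather than from the stubgenj source.

```python
def type_var_declarations(class_name, type_params):
    """Emit one TypeVar declaration line per Java type parameter."""
    lines = []
    for param in type_params:
        var_name = "_%s__%s" % (class_name, param)
        lines.append("%s = typing.TypeVar('%s')  # <%s>" % (var_name, var_name, param))
    return lines
```

Prefixing the variable with the class name keeps type variables of different classes from clashing when several classes share one stub file.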

@Thrameos
Contributor

Attributes are all methods taking just self. (__int__, __float__, __index__, __iter__). Technically, it just looks for anything with that attribute and calls a routine that then decides how to make it happen. So it doesn't care about the signature, but we can assume it is single argument converter types for simplicity.

Suppose that you ask for the types that work with a java.lang.Number. Calling info would give

  'ret': java.lang.Number
  'exact': [java.lang.Number]
  'implicit': [int, float]
  'attributes': ['__index__', '__float__']
  'none': []

The attributes are used to pick up numpy number types (np.int32, np.int16, np.float64, etc.)

The intent is to make a _java_lang_Number 'protocol' that can be used to make:

   def setValue(self, num: _java_lang_Number) -> JVoid: ...

I am not sure how to make the number protocol so that it knows to take anything with that specification (which is why I never got far on this path). But assuming you can make it happen, we can have very good stubs.

The lists of user-defined conversions (like date, pathlib.Path, sql types) are available in the _hint structure. I should be able to expose them to a query. Every Java wrapper has a hints structure. They are split so that conversions can be defined before the JVM is started. Most of the conversions are not currently user defined, but rather coded into the C++ layer. I can give an interface that gets a list of types that have user conversions defined.

@Thrameos
Contributor

If you can tell me how to turn the above into a protocol, then perhaps I can put a protocol field into the info. I just don't know enough about how to define a duck type with protocols.

As far as where it should go, let's get it working first so we have a clear picture of all the requirements, then convert it to a PR as jpype.stubs. It should expose symbols with Java calling convention using __all__ for a few basic methods, and also be runnable with a __main__ so we can use python -m jpype.stubs --classpath='deps/*' org.company.jar to produce a package for distribution.
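A command line of that shape could be parsed with `argparse` along these lines. Note that `jpype.stubs` is only the module name proposed above, and the positional argument name `targets` is an assumption made for this sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the proposed 'python -m jpype.stubs' invocation (sketch)."""
    parser = argparse.ArgumentParser(prog="python -m jpype.stubs")
    parser.add_argument(
        "--classpath",
        action="append",
        default=[],
        help="classpath entry for the JVM; may be given several times",
    )
    parser.add_argument(
        "targets",
        nargs="+",
        help="what to generate stubs for (hypothetical argument name)",
    )
    return parser
```

The module's `__main__` would then start the JVM with the collected classpath entries and hand the targets to the generator.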

@Thrameos
Contributor

Let me take a guess at the example:

# attribute type 
class AsIndex(Protocol):
   def __index__(self): ...
class AsFloat(Protocol):
   def __float__(self): ...
# definition (union of exact, implicit, and attribute)
_Number = Union[java.lang.Number, int, float, AsIndex, AsFloat]

Is this anywhere close?

https://www.python.org/dev/peps/pep-0544/#using-protocols

@michi42
Contributor Author

michi42 commented Apr 30, 2020

Yes, this is exactly how you would define the Protocol classes. Just note that for some common cases, there are actually pre-defined Protocols in typing, like SupportsFloat: https://docs.python.org/3/library/typing.html#typing.SupportsInt
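As a sketch of how the attribute names from the conversion info could be mapped onto those pre-defined protocols: the mapping table and the helper name `protocols_for` are assumptions of this example, not an existing JPype or stubgenj API.

```python
import typing

# Dunder attribute probed by a converter -> matching pre-defined protocol.
_ATTRIBUTE_PROTOCOLS = {
    "__int__": typing.SupportsInt,
    "__float__": typing.SupportsFloat,
    "__index__": typing.SupportsIndex,
    "__iter__": typing.Iterable,
}


def protocols_for(attributes):
    """Return the protocols matching a list of probed attribute names."""
    return tuple(_ATTRIBUTE_PROTOCOLS[name] for name in attributes)


# The java.lang.Number example: anything with __index__ or __float__.
number_like = protocols_for(["__index__", "__float__"])
```

Because these typing protocols are runtime checkable, the resulting tuple also works with `isinstance`, which is handy for testing the mapping against real objects.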

For turning stubgenj into a runnable module, sure I will do it, however it will only work for "basic" cases where you do not establish a classpath programmatically, add custom import domains or the like. For more complex cases, it's probably most user friendly to still expose a public API that allows the stub generation to be triggered from a Python script, after setting up JPype and starting the JVM with the necessary options.

@Thrameos Thrameos added the enhancement Improvement in capability planned for future release label Apr 30, 2020
@michi42
Contributor Author

michi42 commented Apr 30, 2020

I just found a funny "feature" in the forward conversion of arguments ...
Assume I have a method that takes a collection like this:

public static SomeRequest SomeRequestBuilder.byNames(java.util.Collection<String>)

If I invoke it like the following, I get a "no matching overloads found" message:

from cern.bla.factory import SomeRequestBuilder

print(SomeRequestBuilder.byNames(["a", "b"]))

however, if I add

from java.util import ArrayList

before, the very same statement suddenly starts working (but the IDE thinks this import is un-used).
Could this have something to do with how customizers are loaded? Are they only loaded on demand?

Shall I open another issue for this?

@Thrameos
Contributor

It isn't supposed to be on demand loading. I think it warrants an issue if you can replicate it.

I am guessing there is a bug in the customizer that is failing to load ArrayList so that conversion can be completed.

@Thrameos
Contributor

Hmm maybe I don't need to support attribute style conversions if I can cast them all into Protocol checks. I need to ponder that.

@Thrameos
Contributor

The new JPackage may make it a bit easier. You can now walk the whole package tree.

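import jpype
import _jpype

# assumes the JVM has already been started
j = jpype.JPackage("java")

def walk(name, pkg):
    # print every member, recursing into sub-packages
    for i in dir(pkg):
        h = getattr(pkg, i)
        print("%s.%s" % (name, i))
        if isinstance(h, _jpype._JPackage):
            walk(name + "." + i, h)

walk("java", j)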

Drat.... I see an exception. Well another thing to go fix.

@michi42
Contributor Author

michi42 commented Apr 30, 2020

Yes, I'm definitely looking forward to having #670 for the stub generator - the way it discovers classes now is far from optimal.

I'll make a minimal example for that forward conversion thingy.
Edit: #721

@michi42
Contributor Author

michi42 commented May 3, 2020

FYI, I just updated https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735 with a basic __main__ to cover the most common cases. Also I added __all__ with the main entry point method (both in camelCase and in snake_case).

@Thrameos
Contributor

Thrameos commented May 3, 2020

I will probably drop the snake case. By the Python principle there can only be one, and as 95% of the library is interacting with CamelCase (due to Java), we have preferred camel case throughout.

So is there anything else that you require at this point?

@Thrameos
Contributor

Yes it appears to be possible to recover that information. Though getting to it will be a bit fun.

import java.lang.reflect.Method;
import java.lang.reflect.Type;
import java.util.List;

public class Test
{

  public void doSomething(List<String> a)
  {
  }

  public static void main(String[] args) throws NoSuchMethodException
  {
    Method m = Test.class.getMethod("doSomething", List.class);
    System.out.println(m);
    // the generic parameter types retain the type arguments
    for (Type t : m.getGenericParameterTypes())
    {
      System.out.println(t);
    }
  }
}

gives....

public void Test.doSomething(java.util.List)
java.util.List<java.lang.String>

I am not sure how much wiring it will require to get that additional information but it seems it would be possible.

It will take a bit of meta programming to define type interfaces, make them get recognized, and then pick up the typing info.
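On the Python side, the printed generic type name could then be turned into a stub annotation. The sketch below handles only plain parameterized types as printed above; wildcards, bounds and arrays are deliberately ignored, and the function name is hypothetical.

```python
def java_generic_to_annotation(type_name: str) -> str:
    """Rewrite 'pkg.Type<Arg, ...>' as 'pkg.Type[Arg, ...]' (plain cases only)."""
    return type_name.replace("<", "[").replace(">", "]")
```

A real implementation would more likely walk the reflected type objects than rewrite their string form, since that also gives access to bounds and wildcards.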

@michi42
Contributor Author

michi42 commented Jul 29, 2020

Actually stubgenj is already reading this information, in exactly this way :-)

My only "problem" remaining is that the JCollection classes are not defined as generic (unlike e.g. typing.Mapping[X,Y]), so I can't add the type arguments to them when generating stubs ...

@Thrameos
Contributor

Well, I have overloaded the classes to accept '[:]' and '[#]'. It would be a small matter to make them accept [type, type]. The question is what should they return? Given that you know they are generics, can't you just make java.util.List create a stub java.util.typing_.List[] class using typical generic programming? That would have the advantage that it would work with any generic.

@Thrameos
Contributor

Actually the question is where the dynamic stub should be generated from. Is it better to make it java.util.typing_, where the stub generator lives in JPackage, or would it be better to place the hook in java.util.List.Typed_[X], where the JClass can produce the typing class if it declares generics (or None if it does not)?

@michi42
Contributor Author

michi42 commented Jul 30, 2020

Sorry, I think I don't fully understand your proposal.
At the moment, stubgenj is generating the following for java.util.List:

_List__E = typing.TypeVar('_List__E')  # <E>
class List(Collection[_List__E], typing.Generic[_List__E], typing.List[_List__E]):
    @typing.overload
    def add(self, e: _List__E) -> bool: ...

Note the extra supertype typing.List[_List__E], which is currently hard-coded in stubgenj for a few Java collections.

If I look at jpype.JClass("java.util.List")._hints.implements, it gives me jpype._jcollection._JList. However, I currently can not generate a stub like

_List__E = typing.TypeVar('_List__E')  # <E>
class List(Collection[_List__E], typing.Generic[_List__E], jpype._jcollection._JList[_List__E]):
    @typing.overload
    def add(self, e: _List__E) -> bool: ...

since _JList is not generic.

The easiest way to allow this would be to declare _JList (and other customizers) as follows in _jcollection:

import typing
E = typing.TypeVar('E')
class _JList(typing.List[E]):
    ...  # existing customizer methods unchanged

See https://docs.python.org/3/library/typing.html#user-defined-generic-types for more examples.

This will not have any impact on the run-time behavior, but it allows for static type checking (by formalizing that this object is implementing the List contract for a particular item type)
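The claim about run-time behavior can be checked with a small stand-alone class. `GenericListCustomizer` is a hypothetical stand-in written for this sketch and is not the JPype `_JList`.

```python
import typing

E = typing.TypeVar("E")


class GenericListCustomizer(typing.List[E]):
    """Hypothetical list-like class declared generic in its item type."""

    def first(self) -> E:
        return self[0]


plain = GenericListCustomizer([1, 2])
typed = GenericListCustomizer[int]([3, 4])
```

Both objects are ordinary `list` instances at run time; the type argument matters only to a static checker, which can now tell that `typed.first()` is an `int`.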

@Thrameos
Contributor

Thrameos commented Aug 1, 2020

I am still confused by what you are trying to achieve with _jcollection._JList. _JList is not a real type, just like everything in _jcollection. Those stubs are all going to disappear in upcoming versions because I am moving to a scalable solution where they appear in the __init__.py stored in the jar file. Customizers should not be loaded until a package is first used, so this sort of internal stub class is going to move outside the view of the user. _JList is just a class from which methods are being stolen to create the java.util.List wrapper. It never appears in the inheritance tree.

What we can do is either add new stubs in protocol or add a behavior to all Java classes which are generic (JClassGeneric) which will take type arguments to []. For example, I can add new behaviors to java.util.List such as being able to take type arguments which you can then hang stubs on. Thus, java.util.List[String] would return a new (dynamically created type) which inherits from java.util.List. Does that help clarify?
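In plain Python, the second option (subscripting a class to obtain a dynamically created subtype) could be sketched with `__class_getitem__`. The class name `JavaClassStandIn` and the attribute `type_args` are hypothetical; how JPype would actually wire this into its class wrappers is not shown here.

```python
class JavaClassStandIn:
    """Hypothetical stand-in for a generic Java class wrapper."""

    def __class_getitem__(cls, item):
        args = item if isinstance(item, tuple) else (item,)
        names = ", ".join(getattr(arg, "__name__", str(arg)) for arg in args)
        # Dynamically create a subclass that remembers its type arguments.
        return type("%s[%s]" % (cls.__name__, names), (cls,), {"type_args": args})
```

Each subscription creates a fresh type in this sketch; a real implementation would probably cache the result so that repeated subscriptions with the same arguments yield the same class.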

@michi42
Contributor Author

michi42 commented Aug 1, 2020

Sorry, let me explain again. When generating the stubs for collections (and other types that have customizers), I somehow need to take into account the fact that a customizer exists - and what it does.
So far, I have a hard-coded list of customized java classes in stubgenj, for which I add an extra supertype in the stub:

def extraSuperTypes(className: str, classTypeVars: List[TypeVarStr]) -> List[str]:
    if className == 'java.util.Map':
        return ['typing.Mapping[%s, %s]' % (classTypeVars[0].pythonName, classTypeVars[1].pythonName)]
    elif className == 'java.util.Collection':
        return ['typing.Collection[%s]' % classTypeVars[0].pythonName]
    elif className == 'java.util.Set':
        return ['typing.Set[%s]' % classTypeVars[0].pythonName]
    elif className == 'java.util.List':
        return ['typing.List[%s]' % classTypeVars[0].pythonName]
    return []

For List this leads to (the extra supertype typing.List[_List__E] is what this functionality adds):

_List__E = typing.TypeVar('_List__E')  # <E>
class List(Collection[_List__E], typing.Generic[_List__E], typing.List[_List__E]):

But this is obviously not scalable (it will never work for user-defined customizers), and it duplicates knowledge of JPype into the stub generator, where it does not belong.

Now my idea was to use jpype.JClass("java.util.List")._hints.implements as a list of extra classes that should be considered supertypes of the java class under generation. Currently, the problem with this is that the JPype internal customizers are not generic (in terms of type hints only, at run time this has no effect).
However, if you say in the future customizers won't be loaded or even be accessible except through JPype, this is probably anyway not a good idea.

In this case, could we foresee a way to define (and retrieve through JClass._hints) a Protocol which the customizer implements? For instance, the _JList customizer would define that it implements the protocol typing.List.

For Java classes, there is nothing to do to make them generic - this is purely a thing at static type-checking time, the runtime behavior is fine. For Java classes, the generated stubs define them as generic (using typing.Generic) in python if they are generic in Java. However, when it comes to python customizers on top of Java classes, things are not so straightforward anymore ...

@Thrameos
Contributor

Thrameos commented Aug 1, 2020

I would say the correct hook mechanism would be to define the generics that you want in protocol, and then add a hook into the JClass structure so that when the jcustomizer is loaded it can register the protocol in its outgoing implicit list.

So let's walk through the changes. The hint structure composes a list by walking through each of the type conversions, calling get info. As this is just a fake piece of info, there is no reason that it needs to be added to the actual conversion list. Instead it should simply extend the list with the user-supplied type info. So I would guess the modifications would be:

  1. Add the generic type stubs as you would like them to appear in jpype/protocol.py.
  2. Go to JClass around line 600. Look up in the hints structure whether the "generic" attribute is set, and if it is a list, use it to extend the implicit list.
	PyObject_SetAttrString(hints.get(), "returns", ret.get());
	PyObject_SetAttrString(hints.get(), "implicit", implicit.get());
	PyObject_SetAttrString(hints.get(), "exact", exact.get());
  3. We need to tell the customizer to look for a global attribute __generic__ in the stub class and, if found, copy that list into the hints structure. I would guess somewhere in the JImplementationFor method.
    def customizer(cls):
        hints = getClassHints(clsname)
        if base:
            hints.registerClassBase(cls)
        else:
            hints.registerClassImplementation(clsname, cls)
        return cls
    return customizer
  4. Last, you would go to the customizers and add __generic__ to each of the classes so they now know to add the generic type to the implicit list.

This is just a rough sketch of how I would do it. There may be other details like how to add the generic type parameters or how many needed to be added that would need to be worked out. Does that make sense? You can use a similar mechanism to add all types of meta information to the hints class.
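Steps 3 and 4 could look roughly like the following stand-alone sketch. The `Hints` class is a hypothetical stand-in for the real hints structure, `register_generic` is an invented helper name, and `__generic__` is the attribute name proposed above rather than something JPype currently reads.

```python
import typing


class Hints:
    """Hypothetical stand-in for the per-class hints structure."""

    def __init__(self):
        self.implicit = []


def register_generic(hints, customizer_cls):
    """Copy a customizer's __generic__ list, if present, into the hints."""
    generic = getattr(customizer_cls, "__generic__", None)
    if isinstance(generic, list):
        hints.implicit.extend(generic)
    return hints


class ListCustomizer:
    # Step 4: the customizer announces the generic type it stands for.
    __generic__ = [typing.List]
```

Customizers without the attribute are left alone, so the hook would be backwards compatible with existing ones.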

As a separate matter, we should have the list type converter get the generic type arguments so that it properly forces each of the elements of a List into a String rather than the current object.

@michi42
Contributor Author

michi42 commented Aug 3, 2020

Hi,

Sounds good, but I'm not sure if I understand the "outgoing implicit list" part correctly. To make it clear - I'm not talking about method arguments (there the implicit list does all I need), but about the customizers that add extra functionality to certain java types, e.g. _JList which makes java.util.List behave (almost) like any python list.
Therefore, I don't think we have to add anything to the implicit attribute of the hints structure. I thought that the implements attribute in the hints structure would serve this purpose, but from what I get this is not (or in the future will no longer be) the case, and the customizer classes themselves (e.g. _JList) should not be used directly in stubs.

So in general, we need a way to formally specify what a particular customizer does - e.g. via a supertype or protocol in jpype.protocol.
The customizer should then specify this (e.g. via the decorator or via an extra attribute).
For example, the list customizer could be like this

@_jcustomizer.JImplementationFor('java.util.List', implements=jpype.protocol.List) # or typing.List
class _JList(object):
    """ Customizer for ``java.util.List``
    This customizer adds the Python list operator to function on classes
    that implement the Java List interface.
    """
# ...

This should then end up in a separate attribute in the hints structure, e.g. customizer or implements (in the latter case what is currently called implements should be moved to _implements, as it is private).

In the cases of collections, this will usually be a generic protocol using one or more type variables (e.g. List, Mapping, Set, ...). If the supertype/protocol is generic, and the java class is generic, the stub generator will make sure that type arguments of the java class are forwarded, so if jpype.JClass("java.util.List")._hints.implements would return jpype.protocol.List, then the stub for java.util.List will become

_List__E = typing.TypeVar('_List__E') # <E>
class List(Collection[_List__E], typing.Generic[_List__E], jpype.protocol.List[_List__E]):
# ...

Note that all collection types imported from the typing module already support type arguments.

However, even though the immediate problem was that the customizer classes for collections were not generic, I would not call the attribute generic.

I see some other (non generic) classes also have customizers, e.g. java.lang.Thread. If I should not reference jpype._jthread._JThread in the generated stubs - and if I understood correctly I should not - we would also need to define and use a supertype/protocol for the functionality added by these customizers.

@Thrameos
Contributor

Thrameos commented Aug 3, 2020

Adding the protocol to the customizer as a keyword seems like a reasonable solution and you can place that info into the hints pretty easily from there in _jcustomizer. As you are the primary consumer of this particular API, I would recommend you take the first shot at the implementation. It should be a pretty easy modification as all the expected entry points are located.

I am not sure what you mean about non-generic classes with customizers. Shouldn't the stub generator just grab the extra methods into the stubs and forget entirely about the customizer? Or is there some issue that I am missing? The point of the customizers is to give extra methods so that those methods appear to be native to the Java class. Calling out the customizers in any way seems like a bad idea. Though those that are marked "base type" likely still have to have a presence.

The change that is removing the customizers from users view is now in the incoming PR (#828) though I have not moved them yet. To make it so that we can scale the customizer system, the customizers are going to be located in the Java jar itself. So the _JList customizer will be located in "java.util.__init__.py" which will be loaded from the org.jpype. That way if someone provides 3rd party customizers they just have to insert them in the jar file or a companion jar file. (At last, I can make my gov.llnl.math library a real boy!) I will likely make it support both pyc and py files (if I want it to be Python version independent). You could technically refer to the revised ones by the "java.util._JList" but only if the import system is installed. My long term goal here being to hollow out jpype to just be decorators, basic types, and start routine and have everything else live under the Java space.

(I would really like to make imports a required package at some point, but that is another issue.)

@michi42
Contributor Author

michi42 commented Aug 3, 2020

I am not sure what you mean about non-generic classes with customizers.
Shouldn't the stub generator just grab the extra methods into the stubs and forget entirely about the customizer? Or is there some issue that I am missing?

That's a very good question - the reason is that I am relying on Java reflection (e.g. getMethods()/getDeclaredMethods()) to work out the types (including array types, generics and type arguments) of both the method arguments and the returned value, which (obviously) only works for Java methods. Part of the information is also accessible through python inspection and __annotations__, but not the full story.

As shown above, my initial approach to this was to treat customizers as an extra superclass of the customized class. This works fine as long as methods are only added but not overridden. However, to get code completion and type checking right, this needs the customizer code or some protocol/stub describing the added methods of a customizer to be available to the IDE.
So the proposal above would require that we define a protocol for each customizer in jpype.protocol or another python package ... which kind of defies the concept of packing customizers into JARs.

Now understanding a bit better what the plan is - thanks a lot for the explanations! - I see two options that I think would work well with the planned packing of customizers into JARs:

  1. Give the author of the customizer the possibility to provide a type stub (as a separate pyi file, e.g. java.util.__init__.pyi in addition to java.util.__init__.py) within the JAR. In this case, JPype would just need to provide a way of accessing this file within the JAR programmatically. Stubgenj would prepend it to the generated stubs from the java classes of the package. As for any python code, the stub file can be generated using mypy stubgen and be modified as necessary.

  2. In principle it would be possible to use python inspection as a fallback to generate stubs for customizers on the fly and add them to the generated stubs of the package. However, this would probably mean either duplicating functionality of, or calling into, mypy's stubgen. Also in this case, in order to have the argument and return types accessible by inspection, customizers should have proper inline type annotations.

Both solutions should work equally for generic and non-generic customizers. I think I would prefer solution 1 for simplicity and separation of concerns (stubgenj should not need to generate stubs for python code).
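Option 1 could be prototyped without any JPype support, since a JAR is an ordinary zip archive. The sketch below assumes the stub is stored as `<package path>/__init__.pyi` inside the JAR; that layout and the function name `read_stub_from_jar` are assumptions for illustration, not an agreed convention.

```python
import io
import zipfile
from typing import BinaryIO, Optional, Union


def read_stub_from_jar(jar: Union[str, BinaryIO], package: str) -> Optional[str]:
    """Return the text of <package>/__init__.pyi inside the JAR, or None."""
    entry = package.replace(".", "/") + "/__init__.pyi"
    with zipfile.ZipFile(jar) as archive:
        try:
            return archive.read(entry).decode("utf-8")
        except KeyError:  # ZipFile.read raises KeyError for a missing entry
            return None


# Build a tiny in-memory "JAR" so the sketch can be run without real files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo_jar:
    demo_jar.writestr("java/util/__init__.pyi", "class _JList: ...\n")

found = read_stub_from_jar(buffer, "java.util")
missing = read_stub_from_jar(buffer, "java.lang")
```

The stub generator would then prepend `found` to the stubs it generates for the package, and fall back to other strategies when the result is `None`.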

So once the #828 is merged, as time allows, I will take a first shot at this.

@Thrameos
Contributor

Thrameos commented Aug 3, 2020

Alright, this is much more clear. So my recommendation is that after the stub generator goes through the Java class, it asks the _hints for the list of class customizers. Basically start with option 2. You can pretty much discard anything that overloads a Java method. Those won't have any type information and are generally being phased out anyway in favor of the converters. They were only used to insert some code in the process, with a few minor exceptions. As the functionality should be the same in most cases, it won't cause an issue. There are some places where the name was conflicting that perhaps we should flag. The JOverride accepts keyword arguments so we can likely place a tag there. After that you just need to copy over stubs for the extra methods, so the stub generator for python will be pretty limited. We can either add stub information to Python or manually tweak the info on the method using a decorator.

We can then work to allow option 1. I can also pretty easily put the stub files in the jar as you recommend and place the stub file in the hints structure as well. We can get to the stubs already using Java methods or add a special hook into _hints or even as a custom element on "class". When I move the py files over, we will generate the stubs and place the pyi files in place. Getting the stubs from the jar file is pretty easy using org.foo.Bar.getClass().getResource("__init__.pyi"), which is what I am doing to get the Python implementation. The difference being that I have to compile and import the code into the module. I am not sure how stubs work with eval but it should be similar.

Perhaps I should just go hog wild and start adding stuff onto the totally useless Java class "Package" and move the hints stuff to "Class" so this sort of stuff is available in the public API rather than hidden in the depths. But let's finish this first round and then start promoting to a public API.

The reason for me to finally start into the scaling problem with the jar file solution (other than that I am stuck inside all the time and can't go to the gym thanks to COVID-19) is that the android port hit a snag because they somewhat foolishly named their Python packages the same as the Java packages, so they have a real mess to deal with in terms of compatibility. Making it so that Java packages are Python modules and you can insert arbitrary Python code into the Java package as __init__.py means they can resolve their problem just by renaming and importing __all__. The side benefit is that it gives us a home for the customizers. I just need to deal with the Python version compatibility problem. I would really like to distribute pyc only in the jars, but then the jar file will only support one Python version. So I am guessing I will have to have the option of pushing py if generic jars are required.
(Come on, Python developers... Java can in theory compile Java 1.4 byte code from Java 13 if you want to make something with wide support.)

@michi42
Contributor Author

michi42 commented Aug 6, 2020

I gave it a try (Gist updated).

After spending some brain cycles on generating stubs in stubgenj for python customizer classes by inspection, I gave up - this is not straightforward, in particular under the presence of overloads or type parameters, and then it is not quite the scope of stubgenj.

So what I do for now:

  • if I can find existing stubs, I will use them. Currently I'm looking for a file with extension "pyi" in the path where the module is, but in the future there can be other options, e.g. reading from a JAR like in solution 1.
  • if I can't find existing stubs, I try to import and call MyPy stubgen to generate stubs from python code AST. This takes into account annotations and missing that, infers types if certain conditions are met ... much smarter than what my little try did.

In any case, I write the stubs for all customizer modules of a particular java package into a "_customizers.pyi" file, and import them into the main "__init__.pyi". The customizer classes are then added as superclasses to the java classes they customize.
At a later stage, I could also attempt to merge the stubs, but for the time being I think this will do fine (except for overridden java methods in customizers) - and it saves me from concerns of namespace clashes.
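The first branch of that lookup (reuse a stub that already sits next to the customizer module) can be sketched as follows. This is not the actual stubgenj code: `find_existing_stub` and `load_module` are hypothetical helper names, and the fallback to mypy's stubgen is deliberately left out.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Optional


def find_existing_stub(module: ModuleType) -> Optional[Path]:
    """Return the .pyi file next to the module's source file, if there is one."""
    source = getattr(module, "__file__", None)
    if source is None:  # e.g. built-in modules have no file
        return None
    candidate = Path(source).with_suffix(".pyi")
    return candidate if candidate.is_file() else None


def load_module(path: Path) -> ModuleType:
    """Import a module from an explicit file path (demo helper only)."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with two throwaway modules, only one of which ships a stub.
with tempfile.TemporaryDirectory() as tmp:
    with_stub = Path(tmp, "with_stub.py")
    with_stub.write_text("class _JList: pass\n")
    with_stub.with_suffix(".pyi").write_text("class _JList: ...\n")
    without_stub = Path(tmp, "without_stub.py")
    without_stub.write_text("class _JThread: pass\n")

    found = find_existing_stub(load_module(with_stub))
    found_name = found.name if found else None
    missing = find_existing_stub(load_module(without_stub))
```

When the result is `None`, the generator would move on to the second strategy described above.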

For this to work fully with the existing customizers for collections, we need to add type annotations or stubs for them. I will submit a PR on this soon.

@Thrameos
Contributor

Thrameos commented Aug 6, 2020

Sounds like a plan. Thanks for all your hard work on this.

I solicited some users on features some of which may need some stub support.

Bayer-Group/paquo#33

Some of the features they were interested in were name mangling to Python names, removing methods from the API, and renaming or replacing methods.

It seems like for your use I should go about finishing my long planned upgrade to _JMethod by renaming it to _JMethodDispatch and then exposing an actual _JMethod front for individual overloads. Thus, rather than having to scan the whole java.lang.Class you can instead ask an individual dispatch what methods are under it and their typing information. That way if a dispatch were to be renamed, deleted, or have its return types altered, that information would be available to the stub generator directly rather than going all the way back to the Java class definition and trying to work forward.
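As a rough picture of what a stub generator could consume from such a split, here is a toy data model. The names `Overload` and `MethodDispatch` and their fields are hypothetical and do not correspond to any existing JPype class; types are kept as plain strings for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Overload:
    """One concrete Java overload with already-translated Python type names."""
    parameters: Tuple[str, ...]
    returns: str


@dataclass
class MethodDispatch:
    """All overloads that share one (possibly renamed) Python-visible name."""
    name: str
    overloads: List[Overload] = field(default_factory=list)

    def stub_lines(self) -> List[str]:
        """Render one stub definition per overload."""
        lines: List[str] = []
        for overload in self.overloads:
            args = ["self"] + [
                f"arg{i}: {type_name}"
                for i, type_name in enumerate(overload.parameters)
            ]
            if len(self.overloads) > 1:
                lines.append("@typing.overload")
            lines.append(
                f"def {self.name}({', '.join(args)}) -> {overload.returns}: ..."
            )
        return lines


dispatch = MethodDispatch(
    "add",
    [Overload(("int",), "bool"), Overload(("int", "str"), "None")],
)
rendered = dispatch.stub_lines()
```

Because the dispatch carries its own name, a rename or removal on the Python side would show up in the stubs without consulting the Java class again.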

I also added the initial support for generics in #835. It allows things like 'java.util.List[String]' to exist. The complexity is that I really don't get some things about Java reflection of generics, namely how do I tell whether a generic parameter is bound to the class or to the method.

public class Generic0<T>
{
  public void A(T t, Object b)
  {
  }

  public <T> void B(T t, Object b)
  {
  }
}

In method A the argument should be bound to the class; in method B it is free. I checked the bytecode disassembly and the information is clearly being stored, but I can't see how I can discern that from the reflection API.

public class Generic0<T> {
  public Generic0();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public void A(T, java.lang.Object);
    Code:
       0: return

  public <T> void B(T, java.lang.Object);
    Code:
       0: return

@michi42
Contributor Author

michi42 commented Aug 6, 2020

Indeed, not being able to enumerate all overloads from JClass was one of the reasons why I used the Java reflection API. It is not the only reason though - not all typing information is available currently from __annotations__, e.g. generic types, wildcard types (with or without bounds, e.g. List<? extends T> for a complex example), or inner types of arrays.

Of course we could aim at putting some or all of the functionality stubgenj currently does to map these to the python typing system directly into JPype. However, this is probably a good piece of work and may not be straightforward in all cases, in particular if old python versions need to be supported.

The big advantage I see in doing this mapping independently from Java Reflection is that it does not affect runtime behavior of JPype, so it can be a bit more sloppy in some edge cases.

For the generics, the types should inherit from typing.Generic (directly or indirectly), and use typing.TypeVar arguments. You can check how this is done in the stubs e.g. for collections in java.util.
The type info is available in the reflection API via getParameterizedType() in cases where it is not erased. Then there are a few different cases to handle ... see https://gist.github.com/michi42/2110a8615a0e917a13ec6748c6168735#file-stubgenj-py-L308 for my attempt to deal with that (which may also not be correct in some edge cases)

@michi42
Contributor Author

michi42 commented Feb 23, 2021

A little update - together with @pelson we've further improved the stubgenj and turned it into a standalone installable package (for the time being): https://gitlab.cern.ch/scripting-tools/stubgenj

I'm also working on a little test suite for this.

@Christopher-Chianelli
Contributor

@michi42 How can I open a PR to stubgenj? I have a patch that adds class Javadoc via from jpype._jclass import _jclassDoc. Method Javadoc obtained via from jpype._jmethod import _jmethodGetDoc is not as useful (it only lists overloads, it seems).

@michi42
Contributor Author

michi42 commented Oct 28, 2021

@Christopher-Chianelli Good question - I did not consider this when I created the repository on the CERN GitLab. Indeed write access is restricted to CERN account holders.

To get this out as quickly as possible, could you just send me a patch for now?
In parallel I will discuss with some colleagues if we can move it to a public GitHub repo.

Also, I just pushed the latest version to pypi for your convenience: https://pypi.org/project/stubgenj/0.2.2/

@Christopher-Chianelli
Contributor

@michi42 patch is attached
add-javadoc.txt

@michi42
Contributor Author

michi42 commented Oct 28, 2021

@Christopher-Chianelli Thanks a lot for the patch. It looks good to me in general, but I think the specification does not allow adding Javadoc for empty class stubs (at least PyCharm reports it as an error) - therefore I have removed it.

I have released https://pypi.org/project/stubgenj/0.2.3/ with your changes.

@Christopher-Chianelli
Contributor

@michi42 I managed to find a way to get method Javadoc in the majority of cases. Sometimes method Javadoc is missing, which appears to be the case when JPype cannot find the Javadoc for a class (for me, it happens when an interface extends another interface). Basically, when _jclassDoc is called, it populates __javadoc__ on the class, which holds the Javadoc for individual methods, fields and constructors in a dictionary. For overloaded methods, they are merged, but the first line is the "signature" of the overload, which can be used to put it in the corresponding overload method. Patch attached.
javadoc-method-patch.txt

@michi42
Contributor Author

michi42 commented Oct 31, 2021

@Christopher-Chianelli Thanks for the patch, I tried to tune it a bit - basically in the end I call into org.jpype.javadoc.JavadocExtractor to avoid relying on the side effect of _jclassDoc.
Like this we can also avoid the default generated docstrings of JPype if no JavaDoc is available - when type stubs are available, it does not provide much extra information I think. If you think it's useful, I can make it optional...

Also I had to change your RegExps a little as they were matching too broadly, which led to the wrong JavaDoc being attached to overloads.
It's not perfect, in particular it won't catch cases with the same number but different types of arguments for the overloads. Still much better than nothing.
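To make the limitation concrete, here is a minimal sketch of matching doc blocks to overloads by parameter count only. It is not the code from the patch or from stubgenj. It assumes each block starts with a signature line such as `add(int index, E element)` followed by the doc text; that format and the function names are assumptions for illustration.

```python
from typing import Dict, List


def count_parameters(signature: str) -> int:
    """Count top-level parameters in a 'name(type a, type b)' signature line."""
    inner = signature[signature.index("(") + 1 : signature.rindex(")")].strip()
    if not inner:
        return 0
    depth = 0
    count = 1
    for char in inner:
        if char == "<":
            depth += 1
        elif char == ">":
            depth -= 1
        elif char == "," and depth == 0:  # ignore commas inside Map<K, V>
            count += 1
    return count


def docs_by_arity(blocks: List[str]) -> Dict[int, str]:
    """Map parameter count to doc text; the first block wins on a clash."""
    result: Dict[int, str] = {}
    for block in blocks:
        signature, _, body = block.partition("\n")
        result.setdefault(count_parameters(signature), body.strip())
    return result


docs = docs_by_arity([
    "add(E e)\nAppends the element.",
    "add(int index, E element)\nInserts the element at the index.",
    "add(long index, E element)\nSame arity, so this text is dropped.",
])
```

The third block shows the gap mentioned above: two overloads with the same number of arguments but different types cannot be told apart by arity alone.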

Here is the MR, I will let my colleagues have a look ...
https://gitlab.cern.ch/scripting-tools/stubgenj/-/merge_requests/6
I think I will still implement an option to turn it on/off as it slows down the process a bit.

@michi42
Contributor Author

michi42 commented Nov 2, 2021

@Christopher-Chianelli I have just released the updated version as 0.2.4: https://pypi.org/project/stubgenj/0.2.4/

Thanks again for the patches!

@Christopher-Chianelli
Contributor

@michi42 FYI, JPype1 1.2.1, which stubgenj depends on, does not compile on Python 3.10 on Linux, and since ~= is used, pip will not allow a higher minor version (i.e. JPype1 1.3.0). This in turn means stubgenj can't be listed in build-system.requires. I just tested a hack that installs only JPype1>=1.3.0 and then does a pip install --no-deps stubgenj in the middle of the build, which works (i.e. stubgenj works with 1.3.0). Could you consider either updating the JPype version or changing it to a range (e.g. '>= 1.2.1', or '>=0.2.1,<2.0.0', etc.)?
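The difference between the two kinds of specifier can be checked with a few lines of Python. This is a simplified sketch that only handles plain numeric versions such as `1.2.1`, and the helper names are hypothetical; real tools implement the full PEP 440 rules, under which `~=1.2.1` is equivalent to `>=1.2.1, ==1.2.*`.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.2.1' into (1, 2, 1); pre-releases etc. are not supported."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(spec: str, candidate: str) -> bool:
    """Simplified '~=spec': at least spec, and equal up to the last component."""
    base, cand = parse(spec), parse(candidate)
    prefix = base[:-1]
    return cand >= base and cand[: len(prefix)] == prefix


def in_range(candidate: str, lower: str, upper: str) -> bool:
    """Simplified '>=lower,<upper'."""
    return parse(lower) <= parse(candidate) < parse(upper)


pinned_allows_1_3_0 = compatible_release("1.2.1", "1.3.0")
range_allows_1_3_0 = in_range("1.3.0", "1.2.1", "2.0.0")
```

So a `~=1.2.1` pin rejects 1.3.0, while a range with an upper bound of 2.0.0 accepts it.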

@michi42
Contributor Author

michi42 commented Nov 9, 2021

Done: https://pypi.org/project/stubgenj/0.2.5/ - now asking for JPype1>=1.2.1,<2.0.0

This update also includes a fix for a misbehavior that directories in Javadoc JARs on the classpath were seen as "java packages", even if they contained no classes or subpackages in the real class JARs.
Such directories could have funny names (like "class-use" in Guava) that do not make valid python identifiers, and hence, break stubs.
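A check of this kind can be sketched with `str.isidentifier()`. This is only an illustration and not the actual fix in stubgenj; the function name and the example paths are assumptions, and Java package names that happen to be Python keywords are not handled here.

```python
def is_valid_package_path(dotted: str) -> bool:
    """True if every dot-separated component is a valid Python identifier."""
    return all(part.isidentifier() for part in dotted.split("."))


candidates = [
    "com.google.common.collect",
    "com.google.common.collect.class-use",  # Javadoc directory, not a package
]
usable = [name for name in candidates if is_valid_package_path(name)]
```

Filtering on names alone would hide the symptom; the fix described above additionally requires that the directory contains classes or subpackages in the real class JARs.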

@ldeluigi

ldeluigi commented Oct 2, 2023

News? Can't this be merged into JPype directly?
