Feature idea resolve jars from maven repositories #900

Open
Thrameos opened this issue Dec 3, 2020 · 7 comments

Comments

@Thrameos
Contributor

Thrameos commented Dec 3, 2020

One issue that people run into when distributing Python packages with jars is that the jar file needs to become part of the Python module and be installed in site-packages. We could add an alternative method such as
jpype.ivy.addArtifact("com.h2database:h2:1.4.200"), which would automatically download the jar file and all of its dependencies and insert them into the class path loader. It would depend on having Apache Ivy already available on the class path.

Here is a prototype of the system.

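import tempfile

import jpype
import jpype.imports  # enables "from org.apache... import ..." once the JVM is running

# Assumes the JVM is already started with the Apache Ivy jar on the class path.
from org.apache.ivy import Ivy
from org.apache.ivy.core import LogOptions
from org.apache.ivy.core.module.id import ModuleRevisionId
from org.apache.ivy.core.resolve import ResolveOptions
from org.apache.ivy.core.retrieve import RetrieveOptions

ivy = Ivy.newInstance()
ivy.configureDefault()

# Resolve: work out which module revision and artifacts are needed.
resolve_opts = ResolveOptions()
resolve_opts.setLog(LogOptions.LOG_QUIET)
resolve_opts.setConfs(["master"])
mri = ModuleRevisionId.newInstance("com.h2database", "h2", "1.4.200")
rr = ivy.resolve(mri, resolve_opts, True)
md = rr.getModuleDescriptor()
mRID = md.getModuleRevisionId()

# Retrieve: copy the resolved artifacts to disk using the given pattern.
destFolder = tempfile.gettempdir()
pattern = destFolder + "/[artifact]-[revision](-[classifier]).[ext]"
retrieve_opts = RetrieveOptions()
retrieve_opts.setConfs(["master"])
retrieve_opts.setDestArtifactPattern(pattern)
retrieve_opts.setLog(LogOptions.LOG_QUIET)
if retrieve_opts.getDestArtifactPattern() is None:
    raise RuntimeError("destination artifact pattern was not set")
rr = ivy.retrieve(mRID, retrieve_opts)

# Add every retrieved jar to the class path.
for jar in rr.getRetrievedFiles():
    print(jar.toPath())
    jpype.addClassPath(str(jar.toPath()))
jpype.JClass("org.h2.Driver")  # works now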
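
To turn the prototype into something like the proposed addArtifact call, the coordinate string first has to be split into the three values passed to ModuleRevisionId.newInstance. A minimal sketch of that step, assuming only the plain "group:artifact:version" form shown above (the names Coordinate and parse_coordinate are hypothetical and not part of JPype or Ivy):

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    """A 'group:artifact:version' coordinate (hypothetical helper type)."""
    group: str
    artifact: str
    version: str


def parse_coordinate(text: str) -> Coordinate:
    """Split 'group:artifact:version' into its three parts."""
    parts = text.strip().split(":")
    # Reject anything that is not exactly three non-empty fields.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'group:artifact:version', got {text!r}")
    return Coordinate(*parts)


coord = parse_coordinate("com.h2database:h2:1.4.200")
print(coord.group, coord.artifact, coord.version)
```

The three fields map directly onto the arguments of ModuleRevisionId.newInstance in the prototype.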

I know that scyjava uses a similar system through jgo. Is this a feature that people would likely use? Does it belong in JPype?
@ctrueden any thoughts on this?

@ctrueden

ctrueden commented Dec 3, 2020

Are you familiar with Groovy Grape? A similar feature available from CPython via JPype would be amazing. Grape works by annotating import statements with which GAV it's from, and optionally which Maven repository, and Grape takes care of the rest: downloading the artifact if needed, and then loading the class with a dynamic class loader. Similarly, BeakerX has %classpath magic making it super easy to depend on classes from remote artifacts. I believe Grape is built on Ivy and BeakerX's %classpath is built on Grape. So it's all the same thing, really. But in the spirit of JPype's mission to make things as convenient and elegant as possible, a similar syntax would be fantastic.

Regarding jgo: it works by invoking mvn as a separate process, so that adds a Maven dependency. Your approach above to use Ivy is lighter weight. I know much less about Ivy than I do about Maven, though: can Ivy handle Maven bills of materials (BOMs), to keep versions synchronized across a set of related components? My community certainly needs that, or else it's NoSuchMethodError etc. all over the place.

@Thrameos
Contributor Author

Thrameos commented Dec 3, 2020

It is very light relative to Groovy or Maven, as it is a dependency manager and not a build system. I selected it for my project's internal use mainly because it is light, doesn't take over the build system, and can be used from the command line with minimal setup (in other words, perfect for a Python project that just uses Java packages occasionally).

For JPype it requires two files to use ivy:
https://github.com/jpype-project/jpype/blob/master/ivy.xml - a list of the dependencies we want to pull.
https://github.com/jpype-project/jpype/blob/master/resolve.sh - a script which calls ivy with the "pattern" to retrieve.

I am not sure that I can exactly match the syntax in Python as annotations do not apply to Python import statements. Though I likely can make it simply act as functions that are called before the import statement.

The API would likely be something like this (stealing from the Grab example):

# We are going to need a special package as this should not be present unless requested.
import jpype.depends as JDep

# Pass options to Ivy in the form of function calls in the jpype.depends
JDep.resolver(name='restlet', root='http://maven.restlet.org/')
JDep.dependency(group='org.restlet', module='org.restlet', version='1.1.6')

# Then use the import for the retrieved jar files
import org.springframework.jdbc.core.JdbcTemplate
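
One way the function-call style above could be backed is a small registry that records the requests before anything is handed to Ivy. This is only a sketch of the idea: jpype.depends does not exist, and the classes Resolver, Dependency and DependencyRegistry are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Resolver:
    name: str
    root: str


@dataclass(frozen=True)
class Dependency:
    group: str
    module: str
    version: str


@dataclass
class DependencyRegistry:
    """Collects requests; a real implementation would pass them on to Ivy."""
    resolvers: List[Resolver] = field(default_factory=list)
    dependencies: List[Dependency] = field(default_factory=list)

    def resolver(self, name: str, root: str) -> None:
        self.resolvers.append(Resolver(name, root))

    def dependency(self, group: str, module: str, version: str) -> None:
        self.dependencies.append(Dependency(group, module, version))


# Stand-in for the proposed (hypothetical) jpype.depends module.
JDep = DependencyRegistry()
JDep.resolver(name='restlet', root='http://maven.restlet.org/')
JDep.dependency(group='org.restlet', module='org.restlet', version='1.1.6')
print(JDep.dependencies)
```

Recording the requests first keeps the resolve and retrieve steps in one place, so they can run once for all declared dependencies.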

Ivy is the dependency manager for Ant and supports pulling files, including all dependencies specified in the POM files. It can also do things like publishing, but for our purposes we are interested only in "resolve" and "retrieve".

The setup is pretty easy. You first have to configure it with settings, though I believe that the defaults are likely pretty close to usable. Then you construct a resource using the usual notation. The weird issue is that classifiers are not natively an Ivy concept, but you can add them in the process (at least I know how to do it in the files; I still need to find out how to do it programmatically). Then you set up the resolve options. Unfortunately I am less familiar with this part: there are a lot of options, and since I am not a Maven user, some of them don't make much sense to me. You then call resolve, which scans the POM files and decides what you are going to pull.

There is the somewhat weird concept of configurations. My config files usually just push the jar file request into the default configuration and pull it. But there is a set of default configurations that it supports. Assuming we just want jars, master seems like the right one.

        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   1   |   0   |   0   ||   1   |   0   |
        |      master      |   1   |   1   |   0   |   0   ||   1   |   0   |
        |      compile     |   1   |   1   |   0   |   0   ||   0   |   0   |
        |     provided     |   1   |   1   |   0   |   0   ||   0   |   0   |
        |      runtime     |   1   |   1   |   0   |   0   ||   0   |   0   |
        |       test       |   1   |   1   |   0   |   0   ||   0   |   0   |
        |      system      |   1   |   1   |   0   |   0   ||   0   |   0   |
        |      sources     |   1   |   1   |   0   |   0   ||   1   |   0   |
        |      javadoc     |   1   |   1   |   0   |   0   ||   1   |   0   |
        |     optional     |   1   |   1   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------

The resolve pulls the parsed POM files into ~/.ivy2/cache. They are big XML files (well, this came from Ant, so that is to be expected). It generates a report object that you can scan for errors to see if something was missing in the request.

The next stage is retrieve. It follows the same pattern as before: use the result from resolve to pass the requested artifacts into the retrieve options. You then set the option arguments and, most importantly, a pattern. Ivy requires you to define how the pulled resources are supposed to look on disk, like [vendor]/[artifact]-[group](-[classifier]).[ext], as well as where you are going to put the files. I don't really care about the pattern so long as there are no conflicts.
We can send them to a temp directory or somewhere under the user's home directory. This generates another report. I then scan the report to get a list of all the jars that it found/copied/downloaded and add them all to the DynamicClassLoader. Then, once you call import, if all goes well everything gets linked and the required package appears as expected with all of its dependencies.
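
To make the pattern idea concrete, here is a small pure-Python illustration of how a pattern such as the one used in the prototype could expand into a file name, with the parenthesised part dropped when its token has no value. This is an illustration of the concept only, not Ivy's implementation, and expand_pattern is a hypothetical helper.

```python
import re


def expand_pattern(pattern, tokens):
    """Fill [token] placeholders; drop a (...) group whose tokens are empty."""

    def fill(text):
        # Replace each [name] with its value, or with nothing if it is unset.
        return re.sub(r"\[(\w+)\]", lambda m: tokens.get(m.group(1)) or "", text)

    def optional(match):
        inner = match.group(1)
        names = re.findall(r"\[(\w+)\]", inner)
        # Keep the optional part only if every token inside it has a value.
        return fill(inner) if all(tokens.get(n) for n in names) else ""

    return fill(re.sub(r"\(([^()]*)\)", optional, pattern))


pattern = "[artifact]-[revision](-[classifier]).[ext]"
plain = {"artifact": "h2", "revision": "1.4.200", "ext": "jar"}
print(expand_pattern(pattern, plain))
# → h2-1.4.200.jar
print(expand_pattern(pattern, dict(plain, classifier="sources")))
# → h2-1.4.200-sources.jar
```

Because the classifier only appears when it is set, two artifacts of the same module (for example the main jar and its sources) land in different files, which is the "no conflicts" property that matters here.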

One issue is that Ivy is rather noisy. I tried to turn the logging down to the minimum, but it still seems to print out a few random log messages.

Relevant links:

https://en.wikipedia.org/wiki/Apache_Ivy
https://cwiki.apache.org/confluence/display/IVY/Programmatic+use+of+Ivy
https://github.com/apache/ant-ivy/blob/master/src/java/org/apache/ivy/Ivy.java

@Thrameos
Contributor Author

Thrameos commented Dec 3, 2020

@ctrueden so I guess I will tag you as someone interested in seeing this feature added.

@marscher This is likely one of those features that it actually makes sense to include in a separate package like jpype-ivy or jpype-deps. The package will be self contained and runs on all architectures. Further, including a 1M jar file for a special feature seems like a bit much. We could incorporate only a portion of ivy into the project, but it includes crypto so I think that would be a poor choice.

@marscher
Member

marscher commented Dec 7, 2020

This feature should definitely be a separate package. Why would you like to include only a subset of Ivy, or do you mean that the Python wrapper only binds to a subset? Does it include cryptography that is forbidden in the US, or why do you consider it a bad choice?

@Thrameos
Contributor Author

Thrameos commented Dec 7, 2020

We will only be using a subset of Ivy for pulling, but I suspect it would be the majority. We aren't going to support pushing or making new packages from Python. So there would be no benefit to including the Ivy source and stripping it down versus including the whole jar.

If it were included in JPype rather than in a separate package, it would force the crypto warning on JPype, which would be way too much paperwork on my side. Better to make it a new package and just include the Ivy jar as is, pointing to their crypto warning. Their crypto message says it is exempt, but I am not sure why they require a disclaimer in the first place unless it was on the US export-controlled list.

The ASM library, on the other hand, is a good candidate to include, as it has no encumbrances, and many packages like JaCoCo require different versions, so we are best off including it and renaming it internally to prevent conflicts.

@pelson
Contributor

pelson commented Feb 3, 2021

You might be interested in a package I maintain (inherited) which speaks to a bespoke Java build system (built on top of Gradle): https://gitlab.cern.ch/scripting-tools/cmmnbuild-dep-manager. Of course, the implementation there isn't something that can be picked up directly (not least because the library needs some significant improvements), but there are some interesting synergies and/or lessons that we could draw on.

One of the first things I would say is: avoid import-time behaviour as much as possible. It is an anti-pattern in Python. I opened a discussion at #933 which was motivated by this point (I didn't want to muddy the waters here).

Downloading JARs at import time is a good example of the kind of import-time behaviour we should avoid. In maintaining the package aforementioned I've also found that it is highly brittle and not a tenable approach for operational code. Gradually I've been building out tools to try to move the JAR downloading forwards as much as possible - in my case I have developer tools which allow virtual environment installation where both pip packages and JARs are downloaded at the same time (i.e. at "install time"). In order to do this effectively I recommend having package metadata to declare Java dependencies, not metadata declared in code (which is only available at runtime) - this mirrors the idea of declaring Python dependencies which we do in install_requires of setuptools.setup. For non packages (i.e. scripts), in the Python world you manually install packages with pip install .... I propose you should do the same for Java packages (python -m jpype_ivy install <coordinate>).

For the record, it is no longer viable to assume that a setup.py is going to be run in the destination environment, so you can't rely on a hook to automatically install JARs at "install-time" this way. You are forced, categorically, into a 2-step phase to pip install stuff, and then to install the JARs. I would call it reasonable "import-time" behaviour to validate that the JARs have been installed, and to raise an ImportError telling the user how to get them (the command above) if they don't exist.
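
A minimal sketch of that import-time check, assuming the package knows which jar file names it expects and where they should live. The helper names missing_jars and require_jars are hypothetical, and the jpype_ivy command is the proposal from this comment, not an existing tool.

```python
from pathlib import Path
from typing import Iterable, List


def missing_jars(jar_dir: Path, names: Iterable[str]) -> List[str]:
    """Return the expected jar file names that are not present in jar_dir."""
    return [name for name in names if not (jar_dir / name).is_file()]


def require_jars(jar_dir: Path, names: Iterable[str], coordinate: str) -> None:
    """Raise ImportError with install instructions if any jar is missing."""
    missing = missing_jars(jar_dir, names)
    if missing:
        raise ImportError(
            f"Missing jar(s) {missing} in {jar_dir}. "
            f"Install them with: python -m jpype_ivy install {coordinate}"
        )


# Example: nothing is downloaded here, the check only reports what is missing.
try:
    require_jars(Path("no-such-jar-dir"), ["h2-1.4.200.jar"],
                 "com.h2database:h2:1.4.200")
except ImportError as exc:
    print(exc)
```

The check touches only the local file system, so it stays cheap and deterministic at import time while the network access happens in the separate install step.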

I prototyped enabling custom metadata in the setuptools.setup call. It was quite an effective approach, and would allow something like java_requires=[<coordinate 1>, <coordinate 2>, ...],. Docs for that at https://setuptools.readthedocs.io/en/latest/userguide/extension.html#creating-distutils-extensions.
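
For illustration, a custom setup keyword is registered through the distutils.setup_keywords entry point group, and setuptools calls the named function with (dist, attr, value) to validate it. The sketch below shows what such a validator could look like; it is not the prototype mentioned above, the module path jpype_ivy.setup_hooks is hypothetical, and ValueError is used only to keep the sketch free of non-stdlib imports (a real extension would raise the setup error type that the setuptools documentation asks for).

```python
# Hypothetical registration in the extension package's own setup():
#   entry_points={
#       "distutils.setup_keywords": [
#           "java_requires = jpype_ivy.setup_hooks:validate_java_requires",
#       ],
#   }


def validate_java_requires(dist, attr, value):
    """Check that java_requires is a list of 'group:artifact:version' strings."""
    if not isinstance(value, (list, tuple)):
        raise ValueError(
            f"{attr} must be a list of coordinates, got {type(value).__name__}"
        )
    for item in value:
        parts = item.split(":") if isinstance(item, str) else []
        # Each coordinate needs exactly three non-empty fields.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"{attr}: invalid coordinate {item!r}")


# Direct call for demonstration; setuptools would pass the Distribution as dist.
validate_java_requires(None, "java_requires", ["com.h2database:h2:1.4.200"])
```

With the keyword validated at build time, the coordinates can be written into package metadata and read later by an install command without importing the package.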

Another challenging aspect of runtime resolution of JARs is deciding where to put the JARs you download. There is no guarantee that a user has write permission to an environment, so you end up having to make some compromises down the line. If you do this at install-time then you know that the person doing the install is the owner of the environment, and therefore has write permission to store the JARs appropriately. It may sound trivial, but this specific issue has been a real challenge for the package I maintain, and has been an endless cause of runtime issues - the compromise that was made in that library was to use user site-packages which has the terrible effect of being put on the path of all Pythons (even the ones you thought were well isolated in a virtual environment).

@Thrameos
Contributor Author

Thrameos commented Feb 3, 2021

Well, I haven't done much on this beyond the prototype. This behavior is part of scyjava, so there is at least some usage. I personally don't have much use for this type of system because, as you point out, it is very brittle for production code. (Can you get to the Maven repo? Do you have write privileges? Is there already a version on the system?)

That said, it would be very nice to have a way to automatically install jars using pip for a project that uses JPype. This Ivy pattern may or may not be part of that solution.
