Jpython #1176
base: master
from ._core import getDefaultJVMPath
import os.path

_JPypeJarPath = "file://" + os.path.join(os.path.dirname(os.path.dirname(__file__)), "org.jpype.jar")
Check notice (Code scanning / CodeQL): Unused global variable
_JPypeJVMPath = getDefaultJVMPath()
Check notice (Code scanning / CodeQL): Unused global variable
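Separately from the CodeQL notice, concatenating "file://" with an OS path does not produce a valid URI on Windows or for paths containing spaces. A small sketch of one alternative using only the standard library; the function name is hypothetical and not part of JPype:

```python
from pathlib import Path


def jpype_jar_uri(module_file: str) -> str:
    """Return a file: URI for org.jpype.jar two directories above module_file.

    Mirrors os.path.dirname(os.path.dirname(__file__)) from the snippet, but
    builds the URI with Path.as_uri(), which percent-encodes characters such
    as spaces and emits the file:///C:/... form on Windows.
    """
    # resolve() makes the path absolute; as_uri() raises ValueError otherwise.
    jar = Path(module_file).resolve().parent.parent / "org.jpype.jar"
    return jar.as_uri()
```

In the module itself this would be called as jpype_jar_uri(__file__).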
}
if (argv[i][1] == '-')
{
    i++;
Check notice (Code scanning / CodeQL): For loop variable changed in body (loop)
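The CodeQL note is about advancing the loop index inside the body to skip over an option's value. The launcher's actual option rules are not shown here, so as a Python rendering of the same idea, under a hypothetical rule that every "--name" argument takes the following argument as its value, the values can be consumed from an explicit iterator so the loop variable is never modified:

```python
from typing import Dict, List, Tuple


def split_args(argv: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical rule: "--name value" pairs become options, the rest positional."""
    options: Dict[str, str] = {}
    positional: List[str] = []
    it = iter(argv)
    for arg in it:
        if arg.startswith("--"):
            try:
                # Consume the option's value from the same iterator instead of
                # bumping an index inside the loop body.
                options[arg[2:]] = next(it)
            except StopIteration:
                raise ValueError(f"option {arg} expects a value") from None
        else:
            positional.append(arg)
    return options, positional
```

In the C++ launcher the equivalent would be a while loop that advances i in exactly one place per branch.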
Basic question first: is this the same as epypj? Is it intended that one should be able to start the Python interpreter from within Java?
From a strategic perspective, I personally have always disliked custom executables for things that embed Python or Java (I think of the spark executable). Whilst it is convenient for people to avoid manually doing setup correctly, it introduces another layer of complexity and indirection when used with other tools (e.g. can you use spark with jpython...). It is for this reason that I ask about starting Python from within Java directly: it is far more appealing to me that we simply use the python and java executables.
From an implementation perspective, there are clearly still quite a number of things to do (tidying up the implementation [lots of prints still lying around], introducing a test suite, writing the docs, getting the build environment working, etc., etc.). Happy to help with whatever I can on the setupext - it isn't clear what the "installation issues" are just yet though.
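For the leftover prints, one option is to route them through the standard logging module so they stay silent unless someone asks for them. A minimal sketch; the logger name and the function are hypothetical stand-ins, not JPype code:

```python
import logging

# Hypothetical logger name for the launcher code.
log = logging.getLogger("jpype.launcher")


def initialize_resources(embedded: bool) -> None:
    """Stand-in for a debug-print branch like the one in JPContext::initializeResources."""
    if not embedded:
        # Only emitted when DEBUG is enabled for this logger, unlike an
        # unconditional print().
        log.debug("not embedded")
```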
@@ -216,6 +214,7 @@ void JPContext::initializeResources(JNIEnv* env, bool interrupt)

if (!m_Embedded)
{
printf("not embedded\n"); fflush(stdout);
👀
This PR is still a work in progress, so there are still a lot of rough edges. I have yet to come up with a logical reason that JPype needs to be started on the main thread rather than on the spawned Python thread. I also need to work out how to prevent the user from calling shutdown.
That is unfortunately one of the key architectural problems with JPype. Shutdown should never have been exposed in the first place. The entry point should have spawned a thread, the original thread should have attached, and then the launch thread should have destroyed the JVM. Modules in Python don't generally unload themselves or have time bombs that render them unusable, but that is what shutdown does.
Unfortunately, like passing the classpath to startJVM as "-Djava.class.path" (which should never be done, because it prevents our jar from being loaded and requires sideloading it with a URLClassLoader), shutdownJVM is in all the old docs, and thus I will forever be forced to deal with it.
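Since old scripts will keep calling shutdownJVM(), one option is for the launcher to neutralize it rather than let it run. A sketch, assuming the launcher can patch the module object before user code runs; the hook name is hypothetical:

```python
import warnings
from typing import Any


def disable_shutdown(jpype_module: Any) -> None:
    """Hypothetical launcher hook: turn shutdownJVM() into a warning no-op.

    Meant for a process where the launcher, not user code, owns the JVM
    lifetime and destroys the JVM only after the Python main has finished.
    """

    def shutdownJVM(*args: Any, **kwargs: Any) -> None:
        warnings.warn(
            "shutdownJVM() ignored: the JVM lifetime is managed by the launcher",
            RuntimeWarning,
            stacklevel=2,
        )

    jpype_module.shutdownJVM = shutdownJVM
```

A warning plus a no-op, rather than an exception, keeps old scripts that end with shutdownJVM() running while still telling the user that the call had no effect.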
// Set up the GC
PyJPModule_installGC(privateModule.get());
PyObject_SetAttrString(builtins.get(), "jpype", publicModule.get());
The objective is to add jpype to the built-ins? 🤔
Yes. As with Jython, the goal was to have enough symbols available. Though I agree it is completely optional; I was just testing whether there was a pattern for adding to builtins.
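For reference, what that C++ line does can be written in Python as setting an attribute on the builtins module. A small sketch using a stand-in module object rather than the real jpype package; the helper name is hypothetical:

```python
import builtins
import types


def publish_in_builtins(name: str, module: types.ModuleType) -> None:
    """Python-level equivalent of PyObject_SetAttrString(builtins, name, module).

    Name lookup falls back to the builtins namespace, so `name` becomes
    usable in every module without an import statement.
    """
    setattr(builtins, name, module)
```

Code that relies on such a name only works in processes where it was published, so an explicit "import jpype" remains the portable spelling.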
@@ -27,7 +27,20 @@ class Develop(develop_cmd):
('enable-coverage', None, 'Instrument c++ code for code coverage measuring'),
]

#<<<<<<< HEAD:setupext/dist.py
Merge conflict here.
Got it. I guess I missed testing that one. There were a lot of conflicts when the setup was changed.
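Leftover markers like the "#<<<<<<< HEAD:setupext/dist.py" line are easy to miss by eye. git diff --check already warns about conflict markers in changed lines. As a minimal standalone sketch (a hypothetical helper, not part of setupext), a scan over the source files could look like this:

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Exactly seven <, = or > at the start of a line, optionally behind a "#",
# followed by whitespace or end of line.
CONFLICT_MARKER = re.compile(r"^\s*#?\s*(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(paths: Iterable[Path]) -> List[Tuple[Path, int, str]]:
    """Return (path, line number, line) for each leftover conflict marker."""
    hits: List[Tuple[Path, int, str]] = []
    for path in paths:
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if CONFLICT_MARKER.match(line):
                hits.append((path, lineno, line))
    return hits
```

Requiring exactly seven characters avoids flagging longer runs such as reStructuredText underlines made of "=".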
No, this is not the same as epypj. With epypj the goal was to build a Java library in which Python was accessible as Java code. That would mean you could use matplotlib directly from Java or spawn a Python subshell. It required that we use ASM to convert Python libraries into stubs that Java, a compiled language, could link against. The reason it couldn't move forward was that testing the thousands of potential entry points from Java calling Python required serious help from the user community, as my laboratory would not fund it and declined to authorize me to work on it personally by declining to sign the Python contributor agreement.
Jpython is simpler. It is simply a binary in which the order of Python and Java startup is defined. Both virtual machines start at the same time, have the same lifetime, and hopefully end at the same time. The direction of the bridge is the same, from Python to Java. It can do some nicer things, like getting the Apple GUI thread going. The point of this is to remove the need for JForward and the other awkwardness in which one machine is running and the other is not. I could do the same with a jpython module style, like pip, but without a real executable the entry point requires starting Python and then calling spawn, which starts a second copy of Python with the JVM going. Epypj could do this same task simply by having a main in the jar that calls Java and launches the Python main as one statement. But without epypj this is the best I can do.
Java can't currently call Python in a meaningful way, and fighting the conflicts between packaging systems is posing serious problems. We are losing users because OSX released incompatible Java and Python versions. Given that I can't access such a messed-up machine (mixed architectures and mixed security flags), this is a serious drag on the project, as users think it is JPype's duty to somehow fix a major company's deployment problem.
The main installation issue is placing the binary in the Python install location (i.e. like executable Python modules) such that it is treated no differently than the others, i.e. it follows a virtualenv install, etc. At some point distutils was actually intended to support binaries. I found the linker stubs with EXECUTABLE flags, meaning it was a planned use case. But the flag is almost inaccessible, and the new system is getting further and further from allowing general-purpose code. Thus it is not at all clear how to tell the manifest that one piece belongs in the bin directory.
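The target location itself can at least be queried at build or install time: sysconfig reports the scripts directory of the running interpreter, and that directory follows an active virtualenv. A sketch, assuming the install step can run arbitrary Python; the helper names are hypothetical, and getting the packaging tools to run such a step is the open question:

```python
import shutil
import stat
import sysconfig
from pathlib import Path
from typing import Optional


def scripts_dir() -> Path:
    """Directory where this interpreter's console scripts live.

    This is bin on POSIX and Scripts on Windows, and it points inside the
    virtual environment when one is active.
    """
    return Path(sysconfig.get_path("scripts"))


def install_binary(built: Path, name: str, target: Optional[Path] = None) -> Path:
    """Copy a built launcher into the scripts directory and mark it executable."""
    dest_dir = target if target is not None else scripts_dir()
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / name
    shutil.copy2(built, dest)
    # Add execute permission for user, group and others (no-op beyond the
    # read-only flag on Windows).
    dest.chmod(dest.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```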
The other fundamental problem we have is packaging in general, though it is not the subject of this PR. The only system for getting Python and Java together is conda, and it isn't clear that it is working on OSX. It should be possible for any Java and Python to work with JPype. (And if Python had just a little more support we could build a Python-version-agnostic support library, but that is hung up on me writing a PEP.) But that is failing in practice. As for the goals of this PR, the idea is to get rid of the need to start the JVM or control its lifespan. Many users requested this in the past. We have the current and ongoing problem that starting the JVM is a one-shot thing, and because modules using Java as a backend all start the JVM, they are all fundamentally incompatible. Though as I only use JPype with code I have written (being mainly a Java user), I can only take shots in the dark as to how to solve it.
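Until the lifetime is tied down, the usual mitigation for the one-shot start is for each module to check before starting. A sketch of such a guard, written against a module-like object so it can be exercised without a JVM. It expects the JPype calls isJVMStarted() and startJVM(classpath=...), passing the classpath through the keyword rather than a -Djava.class.path option. The helper itself is hypothetical:

```python
from typing import Any, Sequence


def ensure_jvm(jpype_module: Any, classpath: Sequence[str] = ()) -> bool:
    """Start the JVM unless some other module already did; return True if we started it."""
    if jpype_module.isJVMStarted():
        # Someone else started it first; their classpath and JVM options
        # were fixed at that point, so this module's jars must already be on it.
        return False
    jpype_module.startJVM(classpath=list(classpath))
    return True
```

It does not solve the underlying conflict: whichever module starts first decides the classpath and JVM options for everyone else.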
Thanks for all the details. From my perspective, as I already hinted at in my last comment, I'm quite strongly against the idea of a new JPype Python executable (
To clarify that (strong) view a bit, imagine we wanted to use a Jupyter notebook, and use some JPype stuff... do I need to make a JPype
One major concern from my perspective is that the normal JPype workflow would become a second citizen to
From a technical perspective, I don't know how I would go about packaging such an executable. Perhaps I would make my life simple and simply ship a bash script which launches the
Let’s be clear about what this binary is for. If an application launches JavaFX or Swing, or your module would deadlock on OSX, and it needs to guarantee that the JVM and Python lifespans are completely tied together, then jpython is the correct choice. If someone wants to build more stuff around that environment, that is their choice, but certainly not one I would make.
Packages such as that are very hard for me to support currently. We provided a workaround that relaunches the Python main and starts the AppHelper. But no one uses it, because it is very clunky to change the Python main thread during operation, and I can’t direct people on how to use it because I don’t have access to any machines that would use it. It is doubly a kludge because most people want code that works the same on many machines and not just OSX. While I can call a machine to build a job, I can’t interact with one, and thus I can’t actually fix those graphical deadlock issues. The pseudo-binary is the next best option, in which we provide an environment that works for those modules (assuming someone can get the right threading pattern).
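The threading pattern in question is the usual one for GUI toolkits that insist on the main thread: the main thread runs the event loop, and the application runs elsewhere and posts main-thread work back to it. Below is a generic sketch of that shape. A queue stands in for the real event loop (e.g. AppHelper on OSX), and the names are hypothetical. This is not JPype's implementation:

```python
import queue
import threading
from typing import Any, Callable, Dict

Task = Callable[[], None]


def run_app_off_main_thread(app: Callable[[Callable[[Task], None]], Any]) -> Any:
    """Run `app` on a worker thread while the main thread services a task queue.

    `app` receives a `post` function for scheduling callables onto the main
    thread; the queue loop stands in for a GUI event loop.
    """
    main_tasks: "queue.Queue[Task | None]" = queue.Queue()
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = app(main_tasks.put)
        except BaseException as exc:  # hand the failure back to the main thread
            outcome["error"] = exc
        finally:
            main_tasks.put(None)  # sentinel: stop the main-thread loop

    thread = threading.Thread(target=worker, name="app")
    thread.start()
    while True:  # main-thread "event loop"
        task = main_tasks.get()
        if task is None:
            break
        task()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```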
I personally don’t have much use for jpython, as starting JPype once in my application is all that I need. Thus, there is not much chance that the standard entry point would ever become second class. I am not going back and changing my thousands of simulation scripts to require something else.
JForward, however, is another issue: it tries to work around modules that don’t follow the standard order or simply require JNI to be up in order to operate. At best JForward is a workaround, because only certain actions can be forwarded. But there are huge problems with JForward:
* You can’t check what the forward is for (Java object, Java class, Java enum, Java int, etc.), so you blindly plow ahead far from the location where the problem occurred. The reporting problem may be solvable if you capture the state of Python at the time it was called and then direct the user back to that point when the error occurs. Further, it introduces the problems that you get with mock. This is catastrophic if the type of the object doesn’t match, as the polymorph won’t work. This lack of safe operation really soured me when I was working on it. In theory, one could resolve this by using a Python-based class loader to read the symbols out of jar classes directly rather than while the JVM is live. But that means having a jar reader, module reader, class reader, and symbol decoder written in C or Python available. If this were pre-Java 9, where it was just jars and classes, it might be doable, but modules are another beast.
* It requires mutilating the Python type tables to upgrade the object. This one is serious unless one knows the type of the symbol to forward. I would like to get to the point that JPype is simply using the common API and not messing with the internals. As you have already seen in the past, my time is tied to gov’t funding, which in turn is subject to some very unreliable characters doing things like passing budgets. The deeper we muck in the Python internals, the less likely JPype will ever work for PyPy or other Python implementations, and it will break when Python releases new versions. If we had other maintainers who understood the inner workings well enough to cover for my periods of absence, such that JPype could follow new Python versions, it might be okay, but that has not developed.
* It may be unsustainable. Polymorphing objects is black magic now that Python has changed the object layout to place stuff before and after the Python object head. Only having “exactly” the same layout will work going forward. Thus, even if I got it working with today’s Python, it is likely to break three versions from now and may never be repairable. This is a horrible outcome. Unless I got some assurance that polymorphing is part of Python’s design (and there is a supported CPython API call to do it safely), I don’t see how I can support it.
* Python is actively working to hide the internals of its objects, preventing the mutilation or polymorphism that JPype would need to support this. Thus, the situation is getting worse, not better.
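The layout restriction is visible even from pure Python: CPython only allows reassigning an object's __class__ when the old and new types have compatible memory layouts, and raises TypeError otherwise. This is an analogy for, not a reproduction of, what JForward does in C. The class names below are hypothetical:

```python
def try_retype(obj: object, new_type: type) -> bool:
    """Attempt Python-level "polymorphing": swap an object's type in place."""
    try:
        obj.__class__ = new_type
    except TypeError:  # CPython rejects types whose layout differs
        return False
    return True


class Placeholder:  # what a forward might start life as
    __slots__ = ("name",)


class JavaClassLike:  # same slots, so same layout
    __slots__ = ("name",)


class JavaObjectLike:  # extra slot, so a different layout
    __slots__ = ("name", "handle")
```

A retype to JavaClassLike succeeds and keeps the stored name; a retype to JavaObjectLike is refused because the instance has no room for the extra slot.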
So, what are the alternatives if someone is using two modules that use JPype and both want to start the JVM?
* Support JForward for a period and then abandon it when Python pulls the rug out from under us.
* Try to break up starting, as pyjnius does, where you import a config module, set it up, and then the JVM is started automatically on “import jpype”. This is a violation of Python policies (import isn’t supposed to perform actions), and it fails to address the issue, because when a second module that also uses JPype starts, it can’t push in its classpath. Given that it would break all my scripts, it is a non-starter.
* Supply a pseudo-Python binary that forces the classpath issue back onto the user. The modules then need a line to skip startJVM() if the JVM is already running, and the JVM flags are defined at the start of the process (meaning the order doesn’t matter).
* Supply an exec-trick .py script in which starting the JVM is done by launching a new Python. (This works with standard Python binary scripts, but now you launch two copies of Python, with twice as much chance of things going wrong.)
* Go the route of epypj, in which Java launches Python. Same effect as the pseudo-Python, but it adds the additional hurdle that you are now dealing with the Java distribution system and not the Python one. Doable, but notably worse on some fronts.
* Go GraalVM, in which both Python and Java live in the same virtual machine. If it worked well, I would go this route, because there are so many intractable problems with memory management with two garbage collectors. But unless this becomes the main Python implementation, it will always be second class.
* Join Jython and make an integrated Python/Java version.
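For comparison, the exec-trick option can be sketched in a few lines. The first copy of Python re-executes the interpreter on the same script with a marker in the environment, and only the marked copy continues. The marker name and helpers are hypothetical, and the sketch assumes the script is run as python script.py:

```python
import os
import sys
from typing import Dict, List, Mapping, Optional

# Hypothetical environment variable marking the relaunched copy of Python.
RELAUNCH_MARKER = "JPYPE_RELAUNCHED"


def relaunch_command(script_argv: List[str]) -> List[str]:
    """argv for a fresh copy of the current interpreter running the same script."""
    return [sys.executable, *script_argv]


def relaunch_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with the relaunch marker set."""
    env = dict(os.environ if base is None else base)
    env[RELAUNCH_MARKER] = "1"
    return env


def ensure_relaunched(script_argv: List[str]) -> None:
    """Return normally only in the relaunched copy; otherwise re-exec.

    os.execve() does not return on success, so nothing after this call runs
    in the first copy. The second copy is where the JVM would be started,
    before any user module gets a chance to start it.
    """
    if os.environ.get(RELAUNCH_MARKER) == "1":
        return
    os.execve(sys.executable, relaunch_command(script_argv), relaunch_env())
```

The top of such a script would call ensure_relaunched(sys.argv). The cost noted in the list shows up here: every run pays for two interpreter startups, and anything the first copy did before the call is lost.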
None of these are palatable. But given the options, which one seems most supportable?
My ordered preference would be
1. Epypj – because it allows calling Python and creating Python subshells from within Java. Being able to call matplotlib as Java code would make porting python code to Java easier. (Benefits me directly)
2. Jpython – because if it solves those problems for the users, it means less issue reports that I can’t solve. (Benefits me indirectly)
3. JForward – a distant third, because it means more problems and more maintenance. I can work to get it going, but I will need someone willing to put in the time to solve the “what object am I forwarding?” issue.
As far as Jupyter notebooks go, I believe that you “can” specify jpython to run it, but that does not mean you should. That is no different than just placing “import jpype; jpype.startJVM()” as the top line of the script for that application. As Jupyter doesn’t require the threading control and isn’t going to let you spawn JavaFX, why would someone do that? I would recommend against it in the documentation. I understand your fear that jpython would mean abandoning JPype as a module, but that is not my intent. As for all those others that embed Python, the JPype module option (assuming it works there) would be the preferred option.
Having jpython might mean someone writes a module that is incompatible with an untied Python/Java machine, and that module would also be incompatible with embedded machines. But that isn’t much different from the current case, in which almost every JPype-using module is incompatible with every other JPype-using module. Most modules that use it are “heavy wrappers” that hide the JVM entirely.
Mind you, I would very much like a real Python/Java hybrid plugin VM system rather than the completely process-tied one we have now. If I could split the machines in such a way that they were two processes and still had shared memory access with the same interface as JPype, I would do so in a heartbeat. In such a system, JPype would be a Python module that creates a JVM object representing another process, which talks to the JVM via JNI. JNI is a galactic pain in the a**, and there are many times that I would rather be able to spawn sub-JVMs for a task (which is another way to let two modules use the JVM, as they would each have their own JVM copy). Unfortunately, this option is even less portable. For threading, each attachJVM would need to spawn a server thread in the other process to serve requests. Further, I would have to call the memory paging system directly so that we could have shared pages between the processes that virtually map over those that JNI gives me (or be forced to copy entire pages between them). It was sufficiently sketchy that I abandoned that project after a few weeks. It appears that, given JNI didn’t supply me with a “shadow this array here” option, I would have to rewrite portions of JavaVM to make it work at reasonable performance levels. Further, it does nothing to solve the reference loop problem.
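The shared-pages part of that idea has a ready-made analog on the Python side in multiprocessing.shared_memory, where a second handle attaches to the same block by name, as another process would. A minimal sketch follows. It says nothing about making JNI arrays shadow such pages, which is the missing piece described in the comment:

```python
from multiprocessing import shared_memory


def share_bytes(payload: bytes) -> bytes:
    """Write through one mapping of a shared block and read through another.

    The second SharedMemory handle attaches by name, as another process
    would; both views map the same pages, so data written through one is
    visible through the other without being copied between them.
    """
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        peer = shared_memory.SharedMemory(name=owner.name)  # attach by name
        try:
            peer.buf[: len(payload)] = payload
            return bytes(owner.buf[: len(payload)])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()  # release the block once every user has closed it
```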
This is a PR intended to create a jpython executable that handles all control over the lifespan of Python and Java.
Highlights:
Problems:
I am going to need assistance from @pelson on the install issues and from others on the AppHandler logic. I need assistance from Python developers on how to operate Py_RunMain without it bypassing the proper shutdown order and finalizing before the application completes.
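As a reference point, here is the ordering the standard interpreter already guarantees. During finalization Python first waits for non-daemon threads created with the threading module, then runs atexit handlers in reverse registration order. The sketch below shows this with a child interpreter. The handler names are stand-ins, and the sketch does not address how a custom main should drive Py_RunMain:

```python
import subprocess
import sys
import textwrap

# Child program: registers two atexit handlers and leaves a non-daemon
# worker thread running when the main script body ends.
CHILD = textwrap.dedent("""
    import atexit, threading, time

    def shutdown_jvm_stand_in():
        print("last: shut down the JVM stand-in", flush=True)

    def flush_app_state():
        print("atexit: flush application state", flush=True)

    # atexit runs handlers in reverse registration order, so the handler
    # registered first runs last.
    atexit.register(shutdown_jvm_stand_in)
    atexit.register(flush_app_state)

    def worker():
        time.sleep(0.2)
        print("worker: finished", flush=True)

    threading.Thread(target=worker).start()  # non-daemon
    print("main: script body done", flush=True)
""")


def run_child() -> list:
    """Run CHILD in a fresh interpreter and return its stdout lines."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True, check=True
    ).stdout
    return out.splitlines()
```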