
forpy / python-script crashes when python multiprocessing is used #32

Closed
pjoeckel opened this issue Apr 22, 2021 · 18 comments

@pjoeckel

In #10 it has been mentioned

Regarding parallelism:
Multiprocessing (e.g. with MPI) should not be a problem as each process can have its own Python interpreter.

I have an application where the python program uses multiprocessing:

from functools import partial
from multiprocessing import Pool, cpu_count
...
        n = min(ipar, cpu_count(), len(obj_list))
        with Pool(n) as pool:
            func = partial(prepare_map_plot, dict=dict,
                           gconf=gconf, fconf=fconf)
            pool.map(func, obj_list)

Note that I call forpy_initialize (from Fortran) only once, as suggested in #10.

Running the Fortran program, which calls the Python script via forpy, terminates without any error message.
The same Python script runs fine if I use it stand-alone, so there is no principal error in the Python script itself.
Now my questions are: Should this work in principle (or might the issue be related to my Python installation or the Linux cluster)?
And could somebody provide a simple example (test case) involving a short Fortran program and a corresponding
Python script with multiprocessing? (Note: numpy needs to be included/used.)

@ylikx
Owner

ylikx commented Apr 22, 2021

I have not tried to embed a Python program that uses the multiprocessing module, but maybe the problem is related to this:
https://stackoverflow.com/questions/15636266/embedded-python-multiprocessing-not-working

FYI: As mentioned in the link above, in embedded Python sys.argv does not exist. Since this leads to problems with some third-party modules, forpy sets sys.argv = [""].
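
Roughly, the effect is the same as running the following at the start of the embedded interpreter:

import sys

# Embedded interpreters do not define sys.argv by default, which breaks
# some third-party modules (e.g. multiprocessing); forpy therefore sets:
sys.argv = [""]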

@pjoeckel
Author

Thank you very much for your help (and thank you very much for sharing your great code! It is very useful!).
Unfortunately the hints behind the URL above do not help. In the meantime, I wrote a very small example fortran program with a very small python script ... and this works exactly as it should, even with multiprocessing.
Thus I cannot reproduce the problem with a simple example! My complex application, however, does not work, and the biggest issue is that it does not even print out error messages that could give hints about the problem ...

@pjoeckel
Author

In the meantime I have figured out two important pieces of information. In my simple example I used time.sleep to emulate that the different sub-processes need different amounts of time. With that I could reproduce the problem from the complex application.

  1. Apparently the context manager of the multiprocessing pool does not work correctly in embedded Python, i.e. the construction
        with Pool(n) as pool:
            pool.map(func, obj_list)

cannot be used; it leads to the crash.
It has to be replaced by explicitly closing and joining the pool:

import multiprocessing as mp
...
pool = mp.Pool(n)
pool.imap_unordered(func, obj_list)
pool.close()
pool.join()

(Note that I also used imap_unordered instead of map, since my processes are all independent, but it should also work with map).

  2. Next, constructing func with functools.partial does not work in connection with multiprocessing in embedded Python 3!
    It simply does nothing. I had to replace it in a similar way as described here:
    http://web.archive.org/web/20150915201931/http://techguyinmidtown.com/2009/01/23/hack-for-functoolspartial-and-multiprocessing/

I.e.,

import itertools as it
...
        pool = mp.Pool(n)
        pool.imap_unordered(prepare_map_plot_wrapper,
                            zip(obj_list, it.repeat(dict), it.repeat(gconf),
                                it.repeat(fconf)),
                            2)
        pool.close()
        pool.join()
...

def prepare_map_plot_wrapper(x):
    """
    wrapper as workaround to allow multiprocessing with multiple arguments
    Note: functools.partial does NOT work in embedded python3
    """
    prepare_map_plot(x[0], x[1], x[2], x[3])
...

I hope this information is helpful for others as well.

IMPORTANT: forpy is NOT to blame in any way! It works perfectly. Only Python 3 obviously has some very subtle and counter-intuitive flaws ... issues that are not documented anywhere.
My simple example is running with these workarounds, next I have to test the more complex application ...

@pjoeckel
Author

Bad news: Although the simple example is working, the complex application is not.
As the documentation of multiprocessing says, starting a pool of processes under Linux/Unix will fork the entire process.
The question I could not answer so far: does that imply that the entire application is forked (in the embedded Python case), or only the embedded Python interpreter? In the former case, it would not be a surprise that this does not work with my application (but only with the very simple example). The next question would then be how to fork only the Python interpreter.
I already tried to change the start method to "spawn", but this ends in a bunch of errors, even in the simple example.

Any comments / help is appreciated.

@ylikx
Owner

ylikx commented Apr 26, 2021

I would interpret that statement from the documentation such that the process corresponding to your Fortran program is forked.
Maybe a workaround with launching the multiprocessing Python code in a separate process with a full python interpreter is possible. The separate process could be launched e.g. using the subprocess module.
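
A rough sketch of that idea (not tested; the script name plot_worker.py is just a placeholder):

import subprocess
import sys

# Launch the multiprocessing part in a full, separate Python interpreter.
# Note: in embedded Python, sys.executable may point at the embedding
# (Fortran) executable, so an explicit path to python3 may be needed.
python_exe = sys.executable  # or e.g. "/usr/bin/python3"
subprocess.run([python_exe, "plot_worker.py"], check=True)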

Did you try to specify the python interpreter executable using multiprocessing.set_executable?

Thank you for sharing your findings!

@pjoeckel
Author

Indeed, I also played with setting the executable, but then it crashes.
As soon as I select start method "spawn" or "forkserver", a simple script crashes if called from the forpy backend (although it works if called from python directly). This is what I have:

Fortran code fpytest.f90

program fpy_test
  use forpy_mod
  implicit none

  integer         :: ierror, status
  type(module_py) :: pymod 
  type(list)      :: paths
  type(object)    :: status_obj
  type(tuple)     :: args
  type(dict)      :: d
  integer         :: n

  ierror = forpy_initialize()
  write(*,*) 1, ierror

  ! Instead of setting the environment variable PYTHONPATH,
  ! we can add the current directory "." to sys.path
  ierror = get_sys_path(paths)
  write(*,*) 2, ierror

  ierror = paths%append(".")
  write(*,*) 3, ierror
  
  ierror = import_py(pymod, 'mini')
  write(*,*) 4, ierror

  ierror = dict_create(d) ! Python: d = {}
  write(*,*) 5, ierror

  !write(*,*) 'How many calls of the workers?:'
  !read(*,*) n
  !write(*,*) n
  n = 5

  ierror = d%setitem("number", n)
  write(*,*) 6, ierror

  ierror = tuple_create(args, 1)  
  write(*,*) 7, ierror

  ierror = args%setitem(0, d)
  write(*,*) 8, ierror

  ierror = call_py(status_obj, pymod, 'fpy_main', args)
  write(*,*) 9, ierror    

  ierror = cast(status, status_obj)
  write(*,*) 10, ierror, status

  CALL d%destroy
  CALL args%destroy
  CALL pymod%destroy
  CALL paths%destroy
  CALL status_obj%destroy

  call forpy_finalize

end program

compiled with:

gfortran -g -fbacktrace -cpp -D__linux__ -fno-second-underscore -ffree-line-length-none -fno-range-check -O3 -c forpy_mod.F90

gfortran -o fpytest -g -fbacktrace -cpp -D__linux__ -fno-second-underscore -ffree-line-length-none -fno-range-check -O3 fpytest.f90 forpy_mod.o `python3-config --ldflags`

The Python script mini.py called from the Fortran program:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
import os
import time
import multiprocessing as mp
from collections import Counter
import itertools as it

def fpy_main(d):
    #
    n = d["number"]
    print(f'Hello, I am the main process with PID {mp.current_process().pid} ...')
    print(f'... I will start {n} workers now ...')
    #
    p = {"luggage": 42}
    i = range(n)
    t = [str(j) for j in i]
    #
    ctx = mp.get_context('fork')        ### OK
    #ctx = mp.get_context('forkserver')   ### ERRORS
    #ctx = mp.get_context('spawn')       ### ERRORS
    print(f'... {os.path.join(sys.exec_prefix, "python3")}')
    #ctx.set_executable(os.path.join(sys.exec_prefix, 'python3'))  ### ERROR 1
    #
    pool = ctx.Pool(3)
    result = list(pool.imap_unordered(worker_wrapper, 
                                      zip(i, t, 
                                          it.repeat(p)), 2))
    print(f'{result}')
    pool.close()
    pool.join()
    ###
    print(f'PROCESS IDs: {Counter(result)}')
    print(f'... main process ends here.')
    return 0

def worker_wrapper(x):
    status = worker(x[0], x[1], x[2])
    return os.getpid()

def worker(i, name, d):
    print(f'Hello, I am the worker with PID {mp.current_process().pid} ...')
    print(f'... I am working on task {name} now ...')
    print(f'... the executable I am in is {sys.executable} ...')
    print(f'... it was called with arguments {sys.argv} ...')
    print(f'... and I am in script {__file__} ...')
    print(f'... I am going to sleep now for {i} seconds ...')
    print(f'... I have this in my luggage: {d}')
    time.sleep(int(i))
    print(f'--- woke up again ({name})')
    return 0

def main():
    d = {"number": 5}
    fpy_main(d)
    return 0

if __name__ == "__main__":
    main()

The script works for all three start methods selected via the get_context lines above if called directly (./mini.py), but it crashes with weird errors as soon as I select "spawn" or "forkserver". I guess I am still doing something wrong ...

@pjoeckel
Author

Maybe a workaround with launching the multiprocessing Python code in a separate process with a full python interpreter is possible. The separate process could be launched e.g. using the subprocess module.

I had a similar idea. But the point is that the Fortran code transfers an entire (nested) dictionary (incl. numpy arrays) via forpy to the embedded Python interpreter. Thus the question arises: how can I transfer this information to the Python interpreter started with subprocess?
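
One route I might try (just a sketch, untested): pickle the dictionary to a temporary file in the embedded interpreter and pass the file name to the separate interpreter. Here plot_worker.py is again a placeholder script that would pickle.load the file and run its multiprocessing pool there:

import pickle
import subprocess
import sys
import tempfile

# d is the (nested) dict handed over from Fortran via forpy; numpy
# arrays are picklable, so they survive the round trip.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(d, f)
    payload = f.name

# As above, sys.executable may need to be replaced by an explicit
# path to the python3 binary when running embedded.
subprocess.run([sys.executable, "plot_worker.py", payload], check=True)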

@ylikx
Owner

ylikx commented Apr 26, 2021

I got curious and ran your example on (a somewhat outdated) Ubuntu 16.04, Python 3.5, gfortran 5.4.0, and it worked with all the start methods. (I had to replace the f-strings to run on Python 3.5.) What errors did you get exactly?

@pjoeckel
Author

With the forkserver option I get

           1           2
           2           0
           3           0
           4           0
           5           0
           6           0
           7           0
           8           0
Hello, I am the main process with PID 821086 ...
... I will start 3 workers for 5 tasks now ...
           9          -1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f0ac7d2c3ff in ???
        at /usr/src/debug/glibc-2.17-c758a686/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#1  0x418b2e in check_tp_flags
        at /storage/f_mecofc/joeckel_test/forpy_mod.F90:2649
#2  0x418b2e in __forpy_mod_MOD_is_long
        at /storage/f_mecofc/joeckel_test/forpy_mod.F90:2443
#3  0x418b2e in __forpy_mod_MOD_is_int
        at /storage/f_mecofc/joeckel_test/forpy_mod.F90:2577
#4  0x418b2e in __forpy_mod_MOD_cast_to_int32
        at /storage/f_mecofc/joeckel_test/forpy_mod.F90:11140
#5  0x40cd27 in fpy_test
        at /storage/f_mecofc/joeckel_test/fpytest.f90:47
#6  0x40c76c in main
        at /storage/f_mecofc/joeckel_test/fpytest.f90:2
Segmentation fault
[login6]/storage/f_mecofc/joeckel_test> /sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 4 leaked semaphores to clean up at shutdown
  len(cache))

With the spawn option I get:

           1           2
           2           0
           3           0
           4           0
           5           0
           6           0
           7           0
           8           0
Hello, I am the main process with PID 821595 ...
... I will start 3 workers for 5 tasks now ...
Traceback (most recent call last):
  File "", line 1, in 
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/queues.py", line 24, in 
    from . import connection
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/connection.py", line 18, in 
    import tempfile
  File "/usr/lib64/python3.6/tempfile.py", line 45, in 
    from random import Random as _Random
  File "/usr/lib64/python3.6/random.py", line 46, in 
    from hashlib import sha512 as _sha512
  File "/usr/lib64/python3.6/hashlib.py", line 204, in 
    from _hashlib import pbkdf2_hmac
ModuleNotFoundError: No module named '_hashlib'
Traceback (most recent call last):
  File "", line 1, in 
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/queues.py", line 24, in 
    from . import connection
  File "/sw/DLR/PA/ESM/spack/opt/spack/linux-centos7-zen/gcc-8.2.0/python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/lib/python3.7/multiprocessing/connection.py", line 18, in 
    import tempfile
  File "/usr/lib64/python3.6/tempfile.py", line 45, in 
    from random import Random as _Random
  File "/usr/lib64/python3.6/random.py", line 46, in 
    from hashlib import sha512 as _sha512
  File "/usr/lib64/python3.6/hashlib.py", line 204, in 
    from _hashlib import pbkdf2_hmac
ModuleNotFoundError: No module named '_hashlib'
...

repeating all the time.

I am using Python 3.7.7 on a CentOS Linux VERSION="7 (Core)" with GNU Fortran (GCC) 8.2.0.

@pjoeckel
Author

openSUSE Leap 15.0, Python 3.6.9, GNU Fortran (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407]
works well ...

BUT ... in the meantime I figured out that the simple example also works on the system above, if I really compile with gfortran directly.

However, since my real application is MPI-parallel, I actually used the compiler wrappers, which also link the MPI library. Nevertheless, I run only one MPI task, and the complex application calls the Python interpreter via forpy on ONLY one task.
Still this seems to make the difference ... but why?

@pjoeckel
Author

Intermediate result: if I compile forpy_mod.F90 with gfortran (no MPI wrapper), it works, even if I compile the Fortran program with mpif90 ...

I would like to understand why this makes a difference. What does the wrapper do in addition to linking some libraries? That should be irrelevant (at least I thought so, so far) for compiling the object file forpy_mod.o, shouldn't it?

@pjoeckel
Author

Next result: I can compile forpy_mod.F90 without the MPI compiler wrapper and still link it into an MPI-parallel program. A small example, in which only one MPI task calls the Python interpreter via forpy, works well, even when multiprocessing is started in the called Python script.

I recompiled my complex application (with forpy_mod.F90 built without the MPI wrapper), re-linked it, and it still crashes without any error message as soon as the multiprocessing pool is started ... :-(

@ylikx
Owner

ylikx commented Apr 28, 2021

Hi,
You are getting error code 2 from forpy_initialize; that could mean that the numpy module wasn't found. Also, since there are import errors for other modules, there seems to be an issue with locating installed modules.

I also think that compiling a single module with the MPI wrappers shouldn't be a problem.

I'm not sure whether forking an MPI process could be the issue.
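
To narrow that down, it might help to print where the embedded interpreter looks for modules, e.g. at the top of mini.py (note that the spawn traceback above mixes python3.7 and /usr/lib64/python3.6 paths):

import sys

# Quick diagnostic: where does the embedded interpreter look for modules?
print("executable: ", sys.executable)
print("prefix:     ", sys.prefix)
print("exec_prefix:", sys.exec_prefix)
for p in sys.path:
    print("path:", p)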

@ylikx
Owner

ylikx commented Apr 28, 2021

Hello, I am the main process with PID 821086 ...
... I will start 3 workers for 5 tasks now ...
           9          -1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Here you could insert a call to err_print to see what the error was; the segfault is probably due to continuing without handling the error.
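
For example (a minimal sketch using forpy's err_print, with the names from your fpytest.f90):

ierror = call_py(status_obj, pymod, 'fpy_main', args)
if (ierror /= 0) then
  ! Print the Python exception and traceback instead of continuing
  ! with an invalid status_obj.
  call err_print
  stop 'call_py failed'
end if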

@pjoeckel
Author

Thank you very much for the suggestion. As I mentioned above, I could resolve all issues with the small example program and Python script, even when compiled with MPI. I could provide the code, if that helps.
Thus, the lesson learned so far: forking from within an MPI task is not a problem in principle.

However, my complex application still crashes without any error message, thus I do not receive a proper error status (nor could I insert an err_print). I managed, with run-time compiler options, to get a core file which I could analyse with gdb, but this does not help me either, see below. It seems that the Python interpreter, at the moment of the forking, tries to access an object which is not associated, but I have no idea which.

#0  0x00002aaaaaabf38e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#1  0x00002aaaaaaba7d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#2  0x00002aaaaaabeb8b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#3  0x00002aaaac6b07b2 in do_dlopen () from /lib64/libc.so.6
#4  0x00002aaaaaaba7d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#5  0x00002aaaac6b0872 in __libc_dlopen_mode () from /lib64/libc.so.6
#6  0x00002aaaac687a65 in init () from /lib64/libc.so.6
#7  0x00002aaaab91020b in __pthread_once_slow () from /lib64/libpthread.so.0
#8  0x00002aaaac687b7c in backtrace () from /lib64/libc.so.6
#9  0x00002aaab804ec9b in ucs_debug_backtrace_create.part () from /lib64/libucs.so.0
#10 0x00002aaab804f0b5 in ucs_debug_backtrace_create () from /lib64/libucs.so.0
#11 0x00002aaab804f614 in ucs_debug_show_innermost_source_file () from /lib64/libucs.so.0
#12 0x00002aaab80519c8 in ucs_handle_error () from /lib64/libucs.so.0
#13 0x00002aaab8051d4c in ucs_debug_handle_error_signal () from /lib64/libucs.so.0
#14 0x00002aaab8051f02 in ucs_error_signal_handler () from /lib64/libucs.so.0
#15 
#16 0x00002aaaabcb99a8 in PyDict_GetItem (op=op@entry=0x2aaada7b55f0, key=key@entry=0x2aaada729270) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/dictobject.c:1327
#17 0x00002aaaabcc9550 in _PyObject_GenericGetAttrWithDict (obj=obj@entry=0x2aaada7b0cb0, name=name@entry=0x2aaada729270, dict=0x2aaada7b55f0, dict@entry=0x0, suppress=suppress@entry=0)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/object.c:1271
#18 0x00002aaaabcc9929 in PyObject_GenericGetAttr (obj=obj@entry=0x2aaada7b0cb0, name=name@entry=0x2aaada729270) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/object.c:1309
#19 0x00002aaaabcc66d5 in module_getattro (m=0x2aaada7b0cb0, name=0x2aaada729270) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/moduleobject.c:704
#20 0x00002aaaabcc924b in _PyObject_GetMethod (obj=obj@entry=0x2aaada7b0cb0, name=0x2aaada729270, method=method@entry=0x7ffffffe99b8)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/object.c:1140
#21 0x00002aaaabc5cb57 in _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3040
#22 0x00002aaaabd60601 in _PyEval_EvalCodeWithName (_co=0x2aaaeb461c90, globals=, locals=locals@entry=0x0, args=, argcount=, kwnames=0x0, kwargs=0x2aaaeb4778e8, kwcount=0, kwstep=1,
    defs=0x2aaaeb3e79a8, defcount=1, kwdefs=0x0, closure=0x0, name=0x2aaaeb3cd430, qualname=0x2aaaeb3c9030) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3930
#23 0x00002aaaabc86253 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:433
#24 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#25 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
#26 0x00002aaaabd60601 in _PyEval_EvalCodeWithName (_co=0x2aaaeb4611e0, globals=, locals=locals@entry=0x0, args=, argcount=, kwnames=0x0, kwargs=0x2aaaeb45e7b0, kwcount=0, kwstep=1,
    defs=0x2aaaeb3e3f28, defcount=1, kwdefs=0x0, closure=0x0, name=0x2aaaeb3cd530, qualname=0x2aaaeb3cb7b0) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3930
--Type  for more, q to quit, c to continue without paging--
#27 0x00002aaaabc86253 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:433
#28 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#29 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
#30 0x00002aaaabc59111 in function_code_fastcall (co=, args=, nargs=1, globals=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#31 0x00002aaaabc86387 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:415
#32 0x00002aaaabc6132f in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#33 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3124
#34 0x00002aaaabd60601 in _PyEval_EvalCodeWithName (_co=_co@entry=0x2aaaeb38de40, globals=globals@entry=0x2aaaeb442410, locals=locals@entry=0x0, args=args@entry=0x2aaaeb45f4c8, argcount=argcount@entry=6, kwnames=kwnames@entry=0x0,
    kwargs=0x0, kwcount=0, kwstep=2, defs=0x2aaaeb41b608, defcount=4, kwdefs=0x0, closure=0x0, name=0x2aaaeb3beb30, qualname=0x2aaaeb3beb30)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3930
#35 0x00002aaaabc85f87 in _PyFunction_FastCallDict (func=0x2aaaeb404710, args=0x2aaaeb45f4c8, nargs=6, kwargs=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:376
#36 0x00002aaaabc5df40 in do_call_core (kwdict=0x2aaaeb3d1a00, callargs=0x2aaaeb45f4b0, func=0x2aaaeb404710) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4645
#37 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3191
#38 0x00002aaaabc59111 in function_code_fastcall (co=, args=, nargs=1, globals=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#39 0x00002aaaabc86387 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:415
#40 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#41 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
#42 0x00002aaaabc59111 in function_code_fastcall (co=, args=, nargs=1, globals=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#43 0x00002aaaabc86387 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:415
#44 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#45 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
#46 0x00002aaaabc59111 in function_code_fastcall (co=, args=, nargs=2, globals=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#47 0x00002aaaabc86387 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
--Type  for more, q to quit, c to continue without paging--
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:415
#48 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#49 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
#50 0x00002aaaabc59111 in function_code_fastcall (co=co@entry=0x2aaaeb46d9c0, args=, args@entry=0x7ffffffea880, nargs=nargs@entry=2, globals=globals@entry=0x2aaaeb3d1690)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#51 0x00002aaaabc861b4 in _PyFunction_FastCallDict (func=0x2aaaeb4745f0, args=0x7ffffffea880, nargs=2, kwargs=0x0) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:322
#52 0x00002aaaabc872ed in _PyObject_Call_Prepend (callable=callable@entry=0x2aaaeb4745f0, obj=obj@entry=0x2aaaeb3d7b10, args=args@entry=0x2aaaeb3af590, kwargs=kwargs@entry=0x0)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:908
#53 0x00002aaaabce0a81 in slot_tp_init (self=0x2aaaeb3d7b10, args=0x2aaaeb3af590, kwds=0x0) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/typeobject.c:6636
#54 0x00002aaaabcdcdf2 in type_call (type=, args=0x2aaaeb3af590, kwds=0x0) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/typeobject.c:971
#55 0x00002aaaabc86942 in _PyObject_FastCallKeywords (callable=0x1596edb0, stack=, nargs=, kwnames=0x0)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:199
#56 0x00002aaaabc5c3e8 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4619
#57 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3124
#58 0x00002aaaabc59111 in function_code_fastcall (co=, args=, nargs=1, globals=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#59 0x00002aaaabc86387 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:415
#60 0x00002aaaabc63495 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#61 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3093
#62 0x00002aaaabc59111 in function_code_fastcall (co=, args=, nargs=1, globals=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#63 0x00002aaaabc86387 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:415
#64 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#65 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
#66 0x00002aaaabc59111 in function_code_fastcall (co=, args=, nargs=1, globals=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#67 0x00002aaaabc86387 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:415
#68 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#69 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
--Type  for more, q to quit, c to continue without paging--
#70 0x00002aaaabd60601 in _PyEval_EvalCodeWithName (_co=_co@entry=0x2aaaeb3c80c0, globals=globals@entry=0x2aaaeb442410, locals=locals@entry=0x0, args=args@entry=0x7ffffffeb0c0, argcount=argcount@entry=5,
    kwnames=kwnames@entry=0x2aaaeaf60928, kwargs=0x2aaaeaf60930, kwcount=2, kwstep=2, defs=0x2aaaeb3b0f08, defcount=5, kwdefs=0x0, closure=0x0, name=0x2aaad73324f0, qualname=0x2aaaeb3c30f0)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3930
#71 0x00002aaaabc8608a in _PyFunction_FastCallDict (func=0x2aaaeb404b00, args=0x7ffffffeb0c0, nargs=5, kwargs=0x2aaaeb442190)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:376
#72 0x00002aaaabc872ed in _PyObject_Call_Prepend (callable=callable@entry=0x2aaaeb404b00, obj=obj@entry=0x2aaae1261610, args=args@entry=0x2aaaeb41b410, kwargs=kwargs@entry=0x2aaaeb442190)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:908
#73 0x00002aaaabce0a81 in slot_tp_init (self=0x2aaae1261610, args=0x2aaaeb41b410, kwds=0x2aaaeb442190) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/typeobject.c:6636
#74 0x00002aaaabcdcdf2 in type_call (type=, args=0x2aaaeb41b410, kwds=0x2aaaeb442190) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/typeobject.c:971
#75 0x00002aaaabc86942 in _PyObject_FastCallKeywords (callable=0x159687f0, stack=, nargs=, kwnames=0x2aaae3827910)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:199
#76 0x00002aaaabc5e0ae in call_function (kwnames=0x2aaae3827910, oparg=, pp_stack=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4619
#77 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3139
#78 0x00002aaaabd60601 in _PyEval_EvalCodeWithName (_co=0x2aaae38269c0, globals=, locals=locals@entry=0x0, args=, argcount=, kwnames=0x0, kwargs=0x15901930, kwcount=0, kwstep=1,
    defs=0x2aaae382c128, defcount=4, kwdefs=0x0, closure=0x0, name=0x2aaadb741cf0, qualname=0x2aaae3828d00) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3930
#79 0x00002aaaabc86253 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:433
#80 0x00002aaaabc616c5 in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#81 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3110
#82 0x00002aaaabc59111 in function_code_fastcall (co=co@entry=0x2aaadb712150, args=, args@entry=0x2aaaeaf608d8, nargs=nargs@entry=3, globals=globals@entry=0x2aaadb296230)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#83 0x00002aaaabc861b4 in _PyFunction_FastCallDict (func=0x2aaaeb38d290, args=0x2aaaeaf608d8, nargs=3, kwargs=0x2aaaeb4422d0)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:322
#84 0x00002aaaabc5df40 in do_call_core (kwdict=0x2aaaeb4422d0, callargs=0x2aaaeaf608c0, func=0x2aaaeb38d290) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4645
#85 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3191
#86 0x00002aaaabd60601 in _PyEval_EvalCodeWithName (_co=0x2aaadb3c4d20, globals=, locals=locals@entry=0x0, args=, argcount=, kwnames=0x0, kwargs=0x158feec8, kwcount=0, kwstep=1,
    defs=0x0, defcount=0, kwdefs=0x0, closure=0x2aaaeb2baf10, name=0x2aaad74f9a30, qualname=0x2aaadb711c60) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3930
#87 0x00002aaaabc86253 in _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:433
#88 0x00002aaaabc6132f in call_function (kwnames=0x0, oparg=, pp_stack=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:4616
#89 _PyEval_EvalFrameDefault (f=, throwflag=) at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Python/ceval.c:3124
--Type  for more, q to quit, c to continue without paging--
#90 0x00002aaaabc59111 in function_code_fastcall (co=co@entry=0x2aaadb715d20, args=, args@entry=0x2aaaeb3871e8, nargs=nargs@entry=1, globals=globals@entry=0x2aaadb296230)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:283
#91 0x00002aaaabc861b4 in _PyFunction_FastCallDict (func=0x2aaaeb38bd40, args=0x2aaaeb3871e8, nargs=1, kwargs=0x2aaaeb3aeaa0)
    at /tmp/joec_pa/spack-stage/spack-stage-python-3.7.7-547jlktnsjm3alxlq2qmsilhvy2lpkjw/spack-src/Objects/call.c:322
#92 0x0000000001e8e715 in forpy_mod::call_py_object (kwargs=..., args=..., obj_to_call=..., return_value=...) at forpy_mod.F90:7135
#93 forpy_mod::call_py_object_only_kwargs (kwargs=..., obj_to_call=..., return_value=...) at forpy_mod.F90:7185
#94 forpy_mod::call_py_attribute (return_value=..., obj=..., attr_name=, args=..., kwargs=,
    _attr_name=_attr_name@entry=8) at forpy_mod.F90:7108
#95 0x00000000019cd13f in messy_main_channel_forpy::fpy_master (channel=..., status=-82816) at messy_main_channel_forpy.f90:327
#96 messy_main_channel_forpy::ch_forpy_finish_io (status=0, iomode=1, channel=0x13885100, lfinal=.FALSE.) at messy_main_channel_forpy.f90:234
#97 0x0000000001820003 in messy_main_channel_io::channel_finish_io (status=0, iomode=1, lclose=.FALSE., pe=0, chname=,
    dom_id=, llp_io=, _chname=0) at messy_main_channel_io.f90:1074
#98 0x000000000099b28a in messy_main_channel_bi::channel_write_output_bi (iomode=1) at ../../messy/bmil/messy_main_channel_bi.f90:1564
#99 0x00000000009a0379 in messy_main_channel_bi::main_channel_write_output () at ../../messy/bmil/messy_main_channel_bi.f90:1188
#100 0x00000000009b2a68 in messy_write_output () at ../../messy/bmil/messy_main_control_echam5.inc:2134
#101 0x00000000010f2264 in stepon () at ../src/stepon.f90:274
#102 0x000000000040f9f1 in control () at ../src/control.f90:321
#103 0x000000000044351e in master () at ../src/master.f90:141
#104 0x000000000040c17d in main (argc=, argv=) at ../src/master.f90:42
#105 0x00002aaaac595555 in __libc_start_main () from /lib64/libc.so.6
#106 0x000000000040c1ad in _start () at ../src/master.f90:42

@pjoeckel
Author

Next, I tried a completely new Python installation (3.8.8). The outcome so far: same problem. However, if I comment out all

  CALL object%destroy

from my Fortran code, the next phenomenon occurs: the program gets stuck (deadlock?) at the moment the worker processes are forked. I see the new processes with `ps uxww`, but they simply do not run (CPU load 0.0).
And all the other MPI tasks keep waiting (however, with a CPU load close to 100%).

Therefore the question arises: what does CALL object%destroy really do? In my case, object is
a tuple (the args for the call of call_py). This tuple contains one element, which is a dictionary, and this dictionary itself
contains other dictionaries with dictionaries, lists, and numpy arrays. Thus, if I call

  CALL args%destroy

does this recursively destroy all contained objects, including the deallocation of the corresponding memory (in particular of the numpy arrays)? If so, OK; if not, how can I avoid memory leaks when looping over several calls of call_py? Where should the memory be freed?

@pjoeckel
Author

pjoeckel commented Apr 30, 2021

Just to let you know: on a different HPC cluster the complex application is also working (except for the fact that the multiprocessing does not scale at all, but that is a different issue). So, in the end, it seems that the problems are related to the Python version or the versions of the imported packages (or the compiler used, or ...).

@pjoeckel pjoeckel closed this as completed May 3, 2021
@ylikx
Owner

ylikx commented May 5, 2021

Thanks for your observations. Great that you could make it work.

Regarding your question about object%destroy: destroy decreases the reference count of an object by 1. If the reference count drops to 0, then Python deallocates resources (memory etc.) that are occupied by the object. For lists etc. Python also decrements the reference count of each of its elements.
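
In other words, for a loop over several call_py calls a pattern like this sketch should avoid leaks (error handling omitted; nsteps and i are placeholders, the other names are from your fpytest.f90), as long as no other references to the contained objects remain:

do i = 1, nsteps
  ierror = dict_create(d)
  ierror = d%setitem("number", i)
  ierror = tuple_create(args, 1)
  ierror = args%setitem(0, d)

  ierror = call_py(status_obj, pymod, 'fpy_main', args)

  ! Release our references each iteration; Python frees the underlying
  ! memory once no other references to the objects remain.
  call status_obj%destroy
  call args%destroy
  call d%destroy
end do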
