
Memory leak #8

Open
pwaller opened this issue Jul 25, 2014 · 11 comments

@pwaller
Collaborator

pwaller commented Jul 25, 2014

This demo has a memory usage which scales with N:

import gc

from pgi.repository import Poppler

# url is a file:// URI pointing at any PDF
doc = Poppler.Document.new_from_file(url, '')
N = 100000
for i in xrange(N):
    if i % 10000 == 0:
        print len(gc.get_objects())
    p = doc.get_page(0)

Calling g_free on p._obj causes a double free, so the problem is on the Python side. The number of live objects grows by 4 per iteration.

@lazka
Member

lazka commented Jul 25, 2014

Two problems:

  • GIBaseInfo instances leak (probably a reference cycle, and they define `__del__`)
  • The gtype → Python class lookup isn't cached, so a new GIBaseInfo is created on every call

@lazka lazka added the bug label Jul 25, 2014
@pwaller
Collaborator Author

pwaller commented Jul 25, 2014

Is this something you're attacking or shall I have a go?

@lazka
Member

lazka commented Jul 25, 2014

Go ahead.

I'd guess using weakrefs instead of __del__ should fix it.

Like cffi's "gc(cdata, destructor)" https://bitbucket.org/cffi/cffi/src/af4e381b5e99c27c466377145a84eeece6e5c199/cffi/gc_weakref.py?at=default

I gave you commit rights btw, so you should be able to push directly.

pwaller added a commit that referenced this issue Jul 28, 2014
As per #8, there was a leak due to cycles and the definition of a
`__del__` destructor. This instead uses weak proxies to be notified of
deletion and invoke the correct destructor.

A new type, `_BaseFinalizer`, is introduced, where one can override the
destruction behaviour by defining `destructor`.
@pwaller
Collaborator Author

pwaller commented Jul 28, 2014

It also appears that .unref() isn't being called either. Not sure if this is a bug introduced with #10. Trying to understand why.

@pwaller
Collaborator Author

pwaller commented Jul 28, 2014

I'm looking at the unpack_return code for Object.

It calls `object.__new__(Poppler.Page)` and sets its `_ref`, but I don't see any sign of garbage tracking.

Should we add an UnrefFinalizer.track() on the resulting object?

@pwaller
Collaborator Author

pwaller commented Jul 28, 2014

@lazka -- this is the sort of thing I've done, which seems to do the right thing.

However, I get the impression that you intended for this to already work so I don't know if my solution is in the spirit of your other code. I've not made a pull request for the linked commit yet, it perhaps belongs on top of #10 if you are happy with the approach.

@pwaller
Collaborator Author

pwaller commented Jul 28, 2014

This is the code for track_and_unref which is called by the code in the above link.

@pwaller
Collaborator Author

pwaller commented Mar 21, 2015

Tidying up my personal issues list, so closing this. Please create a new issue if you're still interested in tracking it.

@pwaller pwaller closed this as completed Mar 21, 2015
@pwaller pwaller reopened this May 26, 2015
@pwaller
Collaborator Author

pwaller commented May 26, 2015

I'm still hitting this problem. Not sure what a clean solution is, advice welcomed!

This demonstrates the problem:

from pgi.repository import Poppler as poppler
doc = poppler.Document.new_from_file("file://test.pdf", "")
for i in range(doc.get_n_pages()):
    p = doc.get_page(i)
    # p.unref()
# doc.unref()

If I call .unref(), the problem goes away.

So I'd like to determine whether the unref() can be automated, or, if it is supposed to be automatic, why it currently isn't.
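Until it is automatic, one hedged workaround is a small context manager that guarantees unref() runs even if the body raises. `FakePage` below is a stand-in so the sketch is self-contained; with pgi the real `doc.get_page(i)` result would be passed in:

```python
# Hedged workaround sketch: guarantee unref() is called via a context manager.
# FakePage simulates a refcounted wrapper; it is not a pgi type.
from contextlib import contextmanager

@contextmanager
def unreffed(obj):
    try:
        yield obj
    finally:
        obj.unref()

class FakePage(object):
    def __init__(self):
        self.refs = 1
    def unref(self):
        self.refs -= 1

page = FakePage()
with unreffed(page) as p:
    pass  # render/inspect the page here
```

With pgi this would read `with unreffed(doc.get_page(i)) as p:` inside the loop.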

@pwaller
Collaborator Author

pwaller commented May 30, 2016

Ping. I'd like to close this (preferably with a resolution), any advice?

@AmitANetskope

Observing the same issue while using Gsf; it leads to a file descriptor leak:

Python 3.8.7 (default, Dec 21 2020, 21:23:03)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pgi
>>> pgi.install_as_gi()
>>> from gi.repository import Gsf
<stdin>:1: PyGIWarning: Gsf was imported without specifying a version first. Use gi.require_version('Gsf', '1') before import to ensure that the right version gets loaded.
>>> _gsf_inputstdio = Gsf.InputStdio.new("test_file")
>>> _gsf_infilemsole = Gsf.InfileMSOle.new(_gsf_inputstdio)
>>>
>>> _gsf_infilemsole.unref()
>>> _gsf_inputstdio.unref()

Calling unref releases the file descriptor. (Deleting the objects doesn't help.)
