
Experiencing small memory leak in save() #2791

Closed
@cormier


Describe the bug (mandatory)

I think there's a memory leak in the save() method in fitz.

To Reproduce (mandatory)

My code adds a PDF cover page to an existing PDF. It merges the two PDFs in memory using fitz's insert_pdf method. The merged PDF is converted to bytes using the write method.

The method that reproduces the memory leak is as follows:

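import fitz  # PyMuPDF


def merge_pdf(content: bytes, coverpage: bytes):
    """Prepend the cover page to `content` and return the merged PDF as bytes."""
    try:
        with fitz.Document(stream=coverpage, filetype="pdf") as coverpage_pdf, fitz.Document(
            stream=content, filetype="pdf"
        ) as content_pdf:
            # Append every page of content_pdf after the cover page.
            coverpage_pdf.insert_pdf(content_pdf)
            # write() serializes the merged document to bytes in memory.
            return coverpage_pdf.write()
    except Exception:
        # Any failure to open or merge a file is ignored; the function returns None.
        return None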

This method takes a snapshot of memory and displays fitz-related statistics by file name and line number.

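import tracemalloc

import structlog  # assumption: the key=value log output below comes from structlog

log = structlog.get_logger()


def take_snapshot():
    # tracemalloc.start() must have been called before the loop begins.
    snapshot = tracemalloc.take_snapshot()
    top_files = snapshot.statistics("filename")[:20]
    top_lines = snapshot.statistics("lineno")[:20]
    for i, filename_stat in enumerate(top_files, 1):
        if "fitz" in str(filename_stat):
            # Report the fitz-related source lines among the top 20 allocation sites.
            for line_stat in top_lines:
                if "fitz" in str(line_stat):
                    log.info("line_stat", stat=str(line_stat))
            log.info("filename_stat", i=i, stat=str(filename_stat))

    print("=" * 50)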
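As a possible alternative to matching "fitz" against str(stat), the report can be restricted to fitz's files before ranking, using tracemalloc's own Filter objects, so that fitz lines are not crowded out of a process-wide top 20. This is a minimal sketch assuming tracemalloc.start() was called beforehand; the helper name fitz_statistics and its parameters are hypothetical.

```python
import tracemalloc


def fitz_statistics(pattern: str = "*fitz*", key_type: str = "lineno", limit: int = 20):
    """Return the top `limit` allocation statistics for files matching `pattern`."""
    # Filter(True, ...) keeps only traces whose filename matches the fnmatch
    # pattern, so the ranking is computed among matching files only.
    snapshot = tracemalloc.take_snapshot().filter_traces(
        (tracemalloc.Filter(True, pattern),)
    )
    return snapshot.statistics(key_type)[:limit]
```

For example, fitz_statistics(key_type="filename") would give per-file totals, and the default call gives the per-line breakdown shown in the log below.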

Finally, I iterate over a set of ~6000 PDFs, calling merge_pdf and then take_snapshot for each one. During the first iterations, fitz does not appear in the statistics by file name or by line number. After a few iterations it does appear, and from then on both the memory attributed to it and the number of allocated blocks keep increasing, by 1 to 3 KiB and roughly 80 to 95 blocks per iteration, as shown here for iterations 51 to 54:

line_stat                      iteration=51 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=163 KiB, count=5964, average=28 B
filename_stat                  i=14 iteration=51 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=177 KiB, count=6018, average=30 B
==================================================
line_stat                      iteration=52 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=166 KiB, count=6052, average=28 B
filename_stat                  i=14 iteration=52 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=178 KiB, count=6097, average=30 B
==================================================
line_stat                      iteration=53 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=168 KiB, count=6137, average=28 B
filename_stat                  i=14 iteration=53 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=181 KiB, count=6192, average=30 B
==================================================
line_stat                      iteration=54 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=170 KiB, count=6228, average=28 B
filename_stat                  i=14 iteration=54 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=183 KiB, count=6273, average=30 B
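To quantify the growth per iteration rather than reading it off successive snapshots, one option is to diff two snapshots taken around a batch of calls with Snapshot.compare_to. This is a minimal sketch; measure_growth and its parameters are hypothetical, and `work` stands for any zero-argument callable wrapping the code under test.

```python
import tracemalloc
from typing import Callable


def measure_growth(work: Callable[[], object], iterations: int, pattern: str = "*fitz*") -> int:
    """Return the net number of bytes still allocated, after `iterations`
    calls to `work`, by source files matching the fnmatch `pattern`."""
    only_matching = (tracemalloc.Filter(True, pattern),)
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        work()  # warm-up call, so one-time caches are already in `before`
        before = tracemalloc.take_snapshot().filter_traces(only_matching)
        for _ in range(iterations):
            work()
        after = tracemalloc.take_snapshot().filter_traces(only_matching)
    finally:
        if not was_tracing:
            tracemalloc.stop()
    # compare_to() returns StatisticDiff objects, largest absolute change first.
    diffs = after.compare_to(before, "lineno")
    for diff in diffs[:10]:
        print(diff)
    return sum(diff.size_diff for diff in diffs)
```

For instance, measure_growth(lambda: merge_pdf(content, coverpage), 100) would report how much fitz-attributed memory survives 100 merges of the same pair of files; a result that scales with the number of iterations points to memory that is never released. Keep in mind that tracemalloc only sees allocations made through Python's memory allocators.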

Unfortunately, my diagnostic skills are limited at this point, as I have no experience with SWIG. What could be the next steps to pin down the memory leak more precisely and eventually fix it? Does the code of Document_save in fitz.py correspond to lines 1999 to 2081 of fitz.i? Am I right in thinking that the cause of the leak lies within this code in fitz.i? What should I look for to solve the problem?

Your configuration (mandatory)

  • Fedora Linux 38 64 bit
  • PyMuPDF 1.23.6: Python bindings for the MuPDF 1.23.5 library. Version date: 2023-11-06 00:00:01. Built for Python 3.8 on linux (64-bit).
