Description
Describe the bug (mandatory)
I think there's a memory leak in the `save()` method in fitz.
To Reproduce (mandatory)
My code adds a PDF cover page to an existing PDF. It merges the two PDFs in memory using fitz's `insert_pdf()` method, then converts the merged PDF to bytes using the `write()` method, which calls `save()` internally.
The following function reproduces the memory leak:
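```python
import fitz  # PyMuPDF


def merge_pdf(content: bytes, coverpage: bytes):
    """Prepend the cover page to the content PDF and return the result as bytes."""
    try:
        with fitz.Document(stream=coverpage, filetype="pdf") as coverpage_pdf, fitz.Document(
            stream=content, filetype="pdf"
        ) as content_pdf:
            # Append all pages of the content PDF after the cover page.
            coverpage_pdf.insert_pdf(content_pdf)
            # Serialize the merged document to bytes.
            return coverpage_pdf.write()
    except Exception:
        # PDFs that fail to open or merge are skipped; the function returns None.
        return None
```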
The following function takes a tracemalloc snapshot and logs the fitz-related statistics, grouped by file name and by line number:
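```python
import tracemalloc
import structlog  # assumption: `log` is a structlog logger (keyword-argument style)

log = structlog.get_logger()

def take_snapshot(iteration: int):
    """Log fitz entries among the top 20 statistics (tracemalloc.start() must be called first)."""
    snapshot = tracemalloc.take_snapshot()
    line_stats = snapshot.statistics("lineno")[:20]
    for i, filename_stat in enumerate(snapshot.statistics("filename")[:20], 1):
        if "fitz" in str(filename_stat):
            for line_stat in line_stats:
                if "fitz" in str(line_stat):
                    log.info("line_stat", iteration=iteration, stat=str(line_stat))
            log.info("filename_stat", i=i, iteration=iteration, stat=str(filename_stat))
    print("=" * 50)
```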
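`take_snapshot()` logs cumulative totals restricted to the top 20 entries, so a slowly growing line only shows up once it is large enough to enter that list. A possible refinement (an alternative technique, not the code that produced the log below) is to keep the previous snapshot and use tracemalloc's `Snapshot.compare_to()` to report only what changed during each iteration. This is a minimal sketch; `growth_since()` and `leaky_step()` are hypothetical names, and `leaky_step()` merely simulates a leak in place of `merge_pdf()`:

```python
import tracemalloc


def growth_since(previous, current, pattern="*fitz*", limit=10):
    """Return the largest per-line allocation changes between two snapshots.

    Only traces whose file name matches `pattern` (fnmatch syntax) are kept;
    allocations made by the tracemalloc module itself are excluded.
    """
    filters = [
        tracemalloc.Filter(True, pattern),
        tracemalloc.Filter(False, tracemalloc.__file__),
    ]
    # compare_to() returns StatisticDiff objects sorted by absolute size
    # difference, largest first.
    diffs = current.filter_traces(filters).compare_to(
        previous.filter_traces(filters), "lineno"
    )
    return diffs[:limit]


if __name__ == "__main__":
    kept = []

    def leaky_step():
        # Hypothetical stand-in for merge_pdf(): keeps 100 kB alive per call.
        kept.append(bytearray(100_000))

    tracemalloc.start()
    previous = tracemalloc.take_snapshot()
    for iteration in range(1, 4):
        leaky_step()
        current = tracemalloc.take_snapshot()
        # pattern="*" keeps every file, so the simulated leak is visible.
        for diff in growth_since(previous, current, pattern="*"):
            print(iteration, diff)
        previous = current
```

In the real loop, `previous` would be replaced after each `merge_pdf()` call, with the default `"*fitz*"` pattern. A line in `fitz.py` whose `size_diff` stays positive across many consecutive iterations points to memory retained between calls. Starting tracing with `tracemalloc.start(25)` and comparing with the `"traceback"` key type would additionally show which Python callers led to each allocation.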
Finally, I iterate over a set of ~6000 PDFs. For each PDF, I call `merge_pdf()` and then `take_snapshot()`. During the first iterations, fitz does not appear among the top 20 statistics by file name or by line number. After a few iterations it does, and both the memory attributed to it and its number of allocated memory blocks keep growing, by 1 to 3 KiB and roughly 80 to 95 blocks per iteration:
```
line_stat iteration=51 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=163 KiB, count=5964, average=28 B
filename_stat i=14 iteration=51 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=177 KiB, count=6018, average=30 B
==================================================
line_stat iteration=52 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=166 KiB, count=6052, average=28 B
filename_stat i=14 iteration=52 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=178 KiB, count=6097, average=30 B
==================================================
line_stat iteration=53 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=168 KiB, count=6137, average=28 B
filename_stat i=14 iteration=53 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=181 KiB, count=6192, average=30 B
==================================================
line_stat iteration=54 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:4720: size=170 KiB, count=6228, average=28 B
filename_stat i=14 iteration=54 stat=/home/dcormier/.virtualenvs/eruditorg/lib64/python3.8/site-packages/fitz/fitz.py:0: size=183 KiB, count=6273, average=30 B
```
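The per-iteration growth can be read off the log lines above (iterations 51 to 54) by differencing consecutive values. A minimal sketch using only those numbers; `per_iteration()` is a hypothetical helper, and the projection to ~6000 PDFs is a rough straight-line assumption, not a measurement:

```python
# Numbers copied from the log lines above (iterations 51 to 54).
line_size_kib = [163, 166, 168, 170]    # fitz.py:4720, size
line_count = [5964, 6052, 6137, 6228]   # fitz.py:4720, count
file_size_kib = [177, 178, 181, 183]    # fitz.py (whole file), size
file_count = [6018, 6097, 6192, 6273]   # fitz.py (whole file), count


def per_iteration(values):
    """Differences between consecutive iterations."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Average growth of the whole-file size over the three steps, in KiB.
mean_file_kib = (file_size_kib[-1] - file_size_kib[0]) / (len(file_size_kib) - 1)
# Assumption: the growth stays constant for every PDF. Only memory from
# Python's allocators is counted, since that is all tracemalloc sees.
projected_kib = mean_file_kib * 6000

for name, values in [
    ("line size (KiB)", line_size_kib),
    ("line count", line_count),
    ("file size (KiB)", file_size_kib),
    ("file count", file_count),
]:
    print(name, per_iteration(values))
print(f"~{mean_file_kib} KiB per PDF, ~{projected_kib / 1024:.1f} MiB for 6000 PDFs")
```

The sizes in the log are rounded to whole KiB, so the per-step size differences are approximate, while the block counts are exact.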
Unfortunately, my diagnostic skills are limited at this stage, as I have no experience with SWIG. What could be the next steps to diagnose the memory leak more precisely and eventually fix it? Does the code of `Document_save` in `fitz.py` correspond to lines 1999 to 2081 of `fitz.i`? Am I right in thinking that the cause of the memory leak lies within this code in `fitz.i`? What should I look for to solve the problem?
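One check that might narrow this down (a suggestion, not something already tried): tracemalloc only traces memory allocated through Python's allocators, so memory that C code allocates with plain `malloc()`, which is presumably how MuPDF allocates most of its data, does not appear in the statistics above. Logging the process's resident set size (RSS) after each `merge_pdf()` call would show whether the process grows faster than tracemalloc reports. The sketch assumes Linux (it reads `/proc/self/statm`); `current_rss_kib()` is a hypothetical helper name:

```python
import os


def current_rss_kib():
    """Return the resident set size of this process in KiB (Linux only).

    The second field of /proc/self/statm is the number of resident pages.
    """
    with open("/proc/self/statm") as statm:
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024


if __name__ == "__main__":
    before = current_rss_kib()
    # Stand-in for one merge_pdf() call: 32 MiB that is actually written to,
    # so its pages become resident.
    data = b"\x01" * (32 * 1024 * 1024)
    after = current_rss_kib()
    print(f"RSS before: {before} KiB, after: {after} KiB")
```

RSS also counts memory that has been freed but not yet returned to the operating system, so a steady upward trend over hundreds of iterations says more than any single difference.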
Your configuration (mandatory)
- Fedora Linux 38, 64-bit
- PyMuPDF 1.23.6: Python bindings for the MuPDF 1.23.5 library. Version date: 2023-11-06 00:00:01. Built for Python 3.8 on linux (64-bit).