Is setting garbage=3 in Document.save necessary for Page.apply_redaction to fully remove information? #3836
Hi, I use this library to delete selected text from PDF documents. It's crucial that the removed text cannot be restored in any way. So far, everything has worked as expected. However, I recently read this topic, which suggests that setting garbage=3 in Document.save is necessary for the redacted information to be fully removed. Could you please confirm if I understood this correctly? I'm asking because setting …
Replies: 1 comment 1 reply
Garbage collection options 3 and 4 are meant to optimize the PDF for size. Neither of them increases the level of data protection. Option 3 removes / consolidates duplicate object definitions.
Changing / Deleting objects in a PDF always and inevitably means disabling old objects.
Physical removal of these zombie objects only ever happens with garbage collection.
This is a restriction or peculiarity of the PDF design - not anything specific to (Py-) MuPDF.
If you read the documentation for garbage collection, you will find that garbage collection 1 removes unused objects. "Unused" means that no reference to the object is found. "Removal" means that the object will physically no longer be present in the created output. The XREF table will afterwards contain holes (unused array items) which previously pointed to the now-removed objects.
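To make "unused" and "holes" concrete, here is a small toy model in plain Python. It is only a sketch of the idea and not how MuPDF implements garbage collection: the dictionary layout and the names `reachable` and `collect_unused` are made up for illustration. Each object number maps to the object numbers it references, and anything that cannot be reached from the root is dropped, leaving an empty slot in the table.

```python
from typing import Dict, List, Optional, Set

# Toy model (hypothetical): object number -> object numbers it references.
Objects = Dict[int, List[int]]


def reachable(objects: Objects, root: int) -> Set[int]:
    """Return every object number reachable from root by following references."""
    seen: Set[int] = set()
    stack = [root]
    while stack:
        num = stack.pop()
        if num in seen or num not in objects:
            continue
        seen.add(num)
        stack.extend(objects[num])
    return seen


def collect_unused(objects: Objects, root: int) -> List[Optional[List[int]]]:
    """Build an xref-like table; slots of unreferenced objects become None."""
    used = reachable(objects, root)
    size = max(objects) + 1
    # Slot 0 stays empty because the toy object numbers start at 1.
    return [objects[n] if n in used else None for n in range(size)]


# 1 = catalog, 2 = page, 3 = new content stream written after the redaction,
# 4 = old content stream that still holds the deleted text but is no longer
# referenced by the page.
objects: Objects = {1: [2], 2: [3], 3: [], 4: []}
table = collect_unused(objects, root=1)
print(table)
```

In this model the old content stream (object 4) is still stored until collection runs. Afterwards its slot is a hole, and the text it held is no longer part of the output.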
If you want to do the minimal required thing, t…