Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FDF is not properly cleaned up on pdftk errors #49

Open
suspectpart opened this issue Jun 10, 2021 · 0 comments
Open

FDF is not properly cleaned up on pdftk errors #49

suspectpart opened this issue Jun 10, 2021 · 0 comments

Comments

@suspectpart
Copy link

Working with your library extensively today (it's great!), I noticed that whenever the pdftk subprocess crashes, some artifacts are not being cleaned up from /tmp.

Here is a proof that asserts that in case of a pdftk crash, the generated FDF remains in /tmp:

import os
import re
import subprocess

from pypdftk import fill_form

datas = {
    # this is gonna crash pdftk, because the angle bracket is not escaped
    'Given Name Text Box': '1 < 2'
}

try:
    fill_form('./OoPdfFormExample.pdf', datas=datas, out_file='out.pdf')
except subprocess.CalledProcessError as e:
    fdf_file = re.match(r'^.+(/tmp/.+?)\s', str(e)).group(1)

    assert os.path.exists(fdf_file)

(The PDF sample is taken from this source)

An opening angle bracket makes pdftk crash, because it's rendered into the XML as-is and not escaped (that's an issue of it's own - gonna file that too).

The fill_form function looks like this:

def fill_form(pdf_path, datas={}, out_file=None, flatten=True):
    cleanOnFail = False
    tmp_fdf = gen_xfdf(datas)
    handle = None
    if not out_file:
        cleanOnFail = True
        handle, out_file = tempfile.mkstemp()
    ...
    try:
        run_command(cmd, True)
    except:
        if cleanOnFail:
            os.remove(tmp_fdf)
        raise
    finally:
        if handle:
            os.close(handle)
    os.remove(tmp_fdf)
    return out_file

If run_command(cmd, True) raises, first the except, then the finally block are executed. Because the except block re-raises, the last two lines (after the finally) block are never executed. So in case cleanOnFail is False (when outfile is being passed as an argument), the tmp_fdf is never removed.

There is another bug hiding in this: If outfile is not being passed, a temporary file is created and a handle is opened:

 if not out_file:
        cleanOnFail = True
        handle, out_file = tempfile.mkstemp()

In the finally block, the handle is closed, but the out_file is not being removed (although the failing pdftk already created the file, even though it's 0 bytes. It almost looks like a copy-paste problem, that if cleanOnFail is True, actually os.remove(out_file) should have been called?

I can fix both problems and create a PR, if you like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant