function to generate pdf chunks #35

Open
spfeifer222 opened this issue Nov 26, 2019 · 1 comment

@spfeifer222

I tried to write a function to create pdf_chunk files via pdftk syntax:

pdftk in.pdf cat 1-end output out.pdf

using:

def gen_pdfchunks(pdf_path, chunksize=5, first=1, last='end'):
    '''
    Generate chunks of large pdf files.
    '''
    cleanOnFail = True
    out_dir = tempfile.mkdtemp()
    chunk_file = '%s/chunk.pdf' % out_dir

    page = first
    if last == 'end':
        last_page = pypdftk.get_num_pages(pdf_path)
    else:
        last_page = int(last)

    # pdftk: pdftk in.pdf cat 1-5 output out1.pdf
    while page <= last_page:
        args = [PDFTK_PATH, pdf_path, 'cat']
        if page + chunksize - 1 > last_page:
            args += str(page)+'-'+str(last_page)
        else:
            args += str(page)+'-'+str(chunksize-1)
        args += ['output', chunk_file]
        try:
            run_command(args)
        except:
            if cleanOnFail:
                shutil.rmtree(out_dir)
            raise
        page += chunksize
        yield chunk_file

Calling

chunk_gen = gen_pdfchunks(file)
next(chunk_gen)

with a valid PDF file in the variable file, the following output is produced:

Error: Unexpected range end; expected a page
   number or legal keyword, here: 
   Exiting.
Errors encountered.  No output created.
Done.  Input errors, so no output created.
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-71-8f19f4570272> in <module>
----> 1 next(chunk_gen)

<ipython-input-69-9da5f665d933> in gen_pdfchunks(pdf_path, chunksize, first, last)
     22         args += ['output', chunk_file]
     23         try:
---> 24             run_command(args)
     25         except:
     26             if cleanOnFail:

<ipython-input-38-64212d8798aa> in run_command(command, shell)
     14 def run_command(command, shell=False):
     15     ''' run a system command and yield output '''
---> 16     p = check_output(command, shell=shell)
     17     return p.split(b'\n')
     18 

<ipython-input-38-64212d8798aa> in check_output(*popenargs, **kwargs)
      9         if cmd is None:
     10             cmd = popenargs[0]
---> 11         raise subprocess.CalledProcessError(retcode, cmd, output=output)
     12     return output
     13 

CalledProcessError: Command '['/usr/bin/pdftk', '/home/pfeifer/Multimedia/Bücher@LP/Akupunktur/Praxis Akupressur - Christina Mildt.pdf', 'cat', '1', '-', '4', 'output', '/tmp/tmp32l4j3vt/chunk.pdf']' returned non-zero exit status 1.

The problem is that the page range for each chunk is not kept as one string. With chunksize=5 the range should be passed as the single argument 1-4, but the command shown in the CalledProcessError line passes 1, - and 4 as three separate arguments.

I have only just started writing Python, but I would like to contribute to the project. I am not 100% confident that the code would do what I want even if the error had not been raised. To improve both the project and my skills, I would appreciate suggestions to clean up, improve, or correct the code above, and of course help in developing the desired function.

Best regards,
Sebastian

@revolunet
Owner

Hi Sebastian, thanks for reporting

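The thing is, the line args += str(page)+'-'+str(chunksize-1) doesn't append one item to args. Because the right-hand side is a string, += extends the list with each character separately, so you get three items ('1', '-', '4') instead of one.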
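
To see this behaviour in isolation, here is a minimal sketch. The argument list is only illustrative and pdftk is never actually run:

```python
# Illustrative argument list; nothing is executed here.
base = ['pdftk', 'in.pdf', 'cat']

# `list += str` treats the string as an iterable and adds every character.
extended = list(base)
extended += '1-4'
print(extended)  # ['pdftk', 'in.pdf', 'cat', '1', '-', '4']

# `append` adds the whole string as a single list item.
appended = list(base)
appended.append('1-4')
print(appended)  # ['pdftk', 'in.pdf', 'cat', '1-4']
```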

You could use args.append(str(page)+'-'+str(chunksize-1)) instead.
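
Going one step further, the range computation could be separated from the pdftk call, which makes it easy to check without a PDF at hand. This is only a sketch: the helper names chunk_ranges and build_cat_args are hypothetical and not part of pypdftk. It also assumes the intended upper bound is page + chunksize - 1 (capped at the last page), since chunksize-1 does not advance with page and the second chunk would otherwise request 6-4.

```python
def chunk_ranges(first, last_page, chunksize=5):
    """Yield pdftk page-range strings such as '1-5', '6-10', ...

    Hypothetical helper, not part of pypdftk.
    """
    if chunksize < 1:
        raise ValueError('chunksize must be at least 1')
    page = first
    while page <= last_page:
        # Upper bound moves with `page` and never exceeds the last page.
        end = min(page + chunksize - 1, last_page)
        yield '%d-%d' % (page, end)
        page += chunksize


def build_cat_args(pdftk_path, pdf_path, page_range, out_path):
    """Build the argument list for `pdftk in.pdf cat RANGE output out.pdf`.

    The range is passed as ONE list item, so it reaches pdftk as one argument.
    """
    return [pdftk_path, pdf_path, 'cat', page_range, 'output', out_path]


print(list(chunk_ranges(1, 12, chunksize=5)))
# ['1-5', '6-10', '11-12']
print(build_cat_args('/usr/bin/pdftk', 'in.pdf', '1-5', 'out.pdf'))
# ['/usr/bin/pdftk', 'in.pdf', 'cat', '1-5', 'output', 'out.pdf']
```

Inside gen_pdfchunks, each list from build_cat_args could then be handed to run_command. Note that the original function yields the same chunk_file path every time, so each chunk overwrites the previous one; giving every chunk its own output path would avoid that.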
