Microsoft Reporting Service workaround #23

ghost · 2011-02-28T10:57:47Z

Hey folks :)

On some files generated by Microsoft Reporting Services, I get one of the following errors when using this script:

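```python
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()

# pyPdf reads objects lazily, so the input file must stay open until write()
with open("infile.pdf", "rb") as infile:
    input1 = PdfFileReader(infile)
    output.addPage(input1.getPage(0))  # copy the first page

    with open("outfile.pdf", "wb") as outputStream:
        output.write(outputStream)  # fails here on the affected files
```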

```
Traceback (most recent call last):
  File "/backup/print/municipality stara zagora/110228/Aitos_1/test.py", line 20, in
    output.write(outputStream)
  .....
  File "/usr/local/lib/python2.6/site-packages/pyPdf/generic.py", line 232, in readFromStream
    return NumberObject(name)
ValueError: invalid literal for int() with base 10: ''
```

Or, with another approach (loading the pages into a list and then saving them):

```
Traceback (most recent call last):
  File "/backup/print/municipality stara zagora/110228/municipality stara zagora pdf combine 110228 start.py", line 60, in
    outpdf.write(outfile)
  .....
  File "/usr/local/lib/python2.6/site-packages/pyPdf/pdf.py", line 545, in getObject
    self.stream.seek(start, 0)
ValueError: I/O operation on closed file
```

even though the file is (of course) not closed.

I work around it by re-saving the file with pdftk first, like this:

```python
import subprocess

from pyPdf import PdfFileWriter, PdfFileReader

# Re-save the PDF with pdftk so pyPdf reads pdftk's rewritten copy;
# check_call raises CalledProcessError if pdftk fails
subprocess.check_call(["pdftk", "infile.pdf", "cat", "output", "fixed_infile.pdf"])

output = PdfFileWriter()

with open("fixed_infile.pdf", "rb") as infile:
    input1 = PdfFileReader(infile)
    output.addPage(input1.getPage(0))

    with open("outfile.pdf", "wb") as outputStream:
        output.write(outputStream)
```

This only works with the latest pdftk (1.44; with 1.41 the output is a blank PDF). I guess this is the fix from the pdftk changelog:

> 1.43 - September 30, 2010
>
> Fixed a stream parsing bug that was causing page content to disappear after merge of PDFs generated by Microsoft Reporting Services PDF Rendering Extension 10.0.0.0.

Unfortunately I can't provide the broken file, as its contents are confidential.

Hope this helps :)

georgi

ghost · 2011-02-28T10:59:33Z

I don't know why the formatting broke; I copy-pasted plain text :( I can also provide the full traceback if needed.

johnwhitington · 2014-03-25T19:54:18Z

I just put a workaround into CamlPDF to fix the same problem.

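The malformation is that files produced by Microsoft Reporting Services put a space character immediately after the `stream` keyword (before the CR/LF). The PDF specification does not allow this: the keyword must be followed directly by an end-of-line marker, either CRLF or LF alone.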
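To check whether a given file is affected, a quick byte scan for this pattern may be enough. The following is a minimal Python 3 sketch, not part of pyPdf or CamlPDF; `MALFORMED_STREAM`, `count_malformed_streams` and `file_has_malformed_streams` are hypothetical names. It assumes a plain regex scan is acceptable, even though it can report false positives when the same bytes happen to occur inside uncompressed stream data.

```python
import re

# "stream" keyword (not the tail of "endstream" or another word), then one or
# more spaces/tabs, then an end-of-line marker: the malformed pattern.
# Assumption: a raw byte scan; it may also match bytes inside stream data.
MALFORMED_STREAM = re.compile(rb"(?<![A-Za-z])stream[ \t]+(?:\r\n|\n|\r)")


def count_malformed_streams(data: bytes) -> int:
    """Return how many 'stream' keywords are followed by stray blanks before the EOL."""
    return len(MALFORMED_STREAM.findall(data))


def file_has_malformed_streams(path: str) -> bool:
    """Hypothetical helper: True if the PDF at `path` contains the pattern."""
    with open(path, "rb") as f:
        return count_malformed_streams(f.read()) > 0


if __name__ == "__main__":
    sample = b">>\nstream \r\nabc\nendstream"
    print(count_malformed_streams(sample))  # prints 1
```

The negative lookbehind keeps `endstream ` followed by a newline from being counted, since whitespace after `endstream` is legal.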

The fix is, after reading the `stream` keyword, to skip any whitespace characters other than CR or LF before looking for the end-of-line marker as usual.
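The skip-then-read-EOL rule might look like the following. This is an illustrative Python 3 function, not CamlPDF's or pyPdf's actual code; `stream_data_start` is a hypothetical name, and it assumes the caller has already positioned `pos` on the byte just after the `stream` keyword. It treats space, tab, form feed and NUL as the whitespace to skip (the PDF white-space characters minus CR and LF), then accepts CRLF or LF as the specification requires.

```python
# PDF white-space characters other than the EOL bytes CR (0x0D) and LF (0x0A)
NON_EOL_WHITESPACE = b" \t\x0c\x00"


def stream_data_start(buf: bytes, pos: int) -> int:
    """Given `pos` just after the 'stream' keyword, return the offset of the first data byte."""
    # 1. Tolerate the Reporting Services quirk: skip blanks before the EOL.
    while pos < len(buf) and buf[pos] in NON_EOL_WHITESPACE:
        pos += 1
    # 2. Read the end-of-line marker as usual: CRLF, or LF alone.
    if buf.startswith(b"\r\n", pos):
        return pos + 2
    if buf.startswith(b"\n", pos):
        return pos + 1
    raise ValueError("expected end-of-line after 'stream' keyword at offset %d" % pos)


if __name__ == "__main__":
    data = b"<< /Length 3 >>\nstream \r\nabc\nendstream"
    start = stream_data_start(data, data.index(b"stream") + len(b"stream"))
    print(data[start:start + 3])  # prints b'abc'
```

A parser that currently expects the EOL byte immediately after `stream` could call something like this instead. This sketch still rejects a lone CR, as the specification does; accepting it too would be a further leniency choice.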
