pyPdf.utils.PdfReadError: multiple definitions in dictionary #13

Open
willfill opened this issue Jan 11, 2011 · 2 comments

@willfill

I have some code:

import pyPdf

def getPDFContent():
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(pathToPdf, 'rb'))
    print pdf.documentInfo
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + " \n"
    # Collapse whitespace
    content = u" ".join(content.replace(u"\xa0", u" ").strip().split())
    return content

f = open(pathToTxt, 'w+')
f.write(getPDFContent())
f.close()

where pathToPdf and pathToTxt are absolute paths to the files.
But I got this error:
Traceback (most recent call last):
  File "C:/Users/will/Desktop/coding/mytest.py", line 21, in
    print pdf.getPage(14)
  File "C:\Python\lib\site-packages\pyPdf\pdf.py", line 450, in getPage
    self._flatten()
  File "C:\Python\lib\site-packages\pyPdf\pdf.py", line 607, in _flatten
    self._flatten(page.getObject(), inherit, **addt)
  File "C:\Python\lib\site-packages\pyPdf\generic.py", line 165, in getObject
    return self.pdf.getObject(self).getObject()
  File "C:\Python\lib\site-packages\pyPdf\pdf.py", line 649, in getObject
    retval = readObject(self.stream, self)
  File "C:\Python\lib\site-packages\pyPdf\generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python\lib\site-packages\pyPdf\generic.py", line 531, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Python\lib\site-packages\pyPdf\generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python\lib\site-packages\pyPdf\generic.py", line 531, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Python\lib\site-packages\pyPdf\generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python\lib\site-packages\pyPdf\generic.py", line 534, in readFromStream
    raise utils.PdfReadError, "multiple definitions in dictionary"
pyPdf.utils.PdfReadError: multiple definitions in dictionary

@sblzk

sblzk commented Sep 29, 2011

@mlavin

mlavin commented Jan 31, 2012

It isn't clear from the PDF spec whether duplicate keys should be allowed; see http://pdf.editme.com/pdfua-docinfodictionary and http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf (Section 10.2.1). The terminology (dictionary, key/value) seems to imply unique keys. What is clear is that some programs create documents with duplicate keys, and pyPdf cannot read those documents because it raises this error.
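
To make the trade-off concrete, here is a minimal sketch of a strict versus lenient policy for duplicate keys. It is not pyPdf's code: `build_dictionary`, `DuplicateKeyError`, and the `strict` flag are hypothetical names used for illustration only, and keeping the first definition in lenient mode is just one possible choice.

```python
import warnings


class DuplicateKeyError(ValueError):
    """Raised in strict mode when a key is defined more than once."""


def build_dictionary(pairs, strict=True):
    """Build a dict from (key, value) pairs read out of a PDF dictionary.

    Hypothetical helper, not part of pyPdf. In strict mode a repeated key
    raises an error; otherwise the first definition wins and a warning is
    emitted for each later duplicate.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            if strict:
                raise DuplicateKeyError(
                    "multiple definitions in dictionary: %r" % (key,)
                )
            warnings.warn("ignoring duplicate definition of %r" % (key,))
            continue
        result[key] = value
    return result


# Example input with "/Title" defined twice (made-up values).
pairs = [("/Title", "Report"), ("/Author", "A"), ("/Title", "Report (copy)")]
lenient = build_dictionary(pairs, strict=False)
```

With `strict=False` the document would still load and the duplicate is only reported as a warning; with `strict=True` the behavior matches the error reported in this issue.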
