Describe the bug
An error occurred when I was trying to optimize a PDF with OCRmyPDF.
Steps to reproduce
1. Run ocrmypdf -v1 --output-type pdf --tesseract-timeout 0 --optimize 2 --skip-text input.pdf output.pdf
2. See error.
Files
I will share the file if you need it but I think that the log output should be sufficient in this case.
How did you download and install the software?
PyPI (pip, poetry, pipx, etc.)
OCRmyPDF version
16.5.0
Relevant log output
ocrmypdf 16.5.0 __main__.py:59
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version'] __init__.py:133
Found tesseract 5.3.4.20240503 __init__.py:343
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version'] __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version'] __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version'] __init__.py:133
Found pngquant 2.17.0 __init__.py:343
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version'] __init__.py:133
Found jbig2 0.29 __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version'] __init__.py:133
Found gs 10.3.1 __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version'] __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--list-langs'] __init__.py:133
stdout/stderr = List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2): __init__.py:73
eng
osd
No language specified; assuming --language eng _validation.py:54
pikepdf mmap enabled helpers.py:328
Gathering info with 1 thread workers info.py:804
pikepdf mmap enabled helpers.py:328
Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2/2 0:00:00
Using Tesseract OpenMP thread limit 2 tesseract_ocr.py:199
Start processing 2 pages concurrently ocr.py:96
pikepdf mmap enabled helpers.py:328
pikepdf mmap enabled helpers.py:328
1 skipping all processing on this page _pipeline.py:330
2 skipping all processing on this page _pipeline.py:330
1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0 _graft.py:140
1 Page rotation: (content, auto) -> page = (0, 0) -> 0 _graft.py:165
2 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0 _graft.py:140
2 Page rotation: (content, auto) -> page = (0, 0) -> 0 _graft.py:165
Image processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2/2 0:00:00
Postprocessing... ocr.py:144
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version'] __init__.py:133
Linearizing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
xref 152: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R42 in page 0 optimize.py:265
xref 158: treating as an optimization candidate optimize.py:282
xref 154: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R28 in page 0 optimize.py:265
xref 150: treating as an optimization candidate optimize.py:282
xref 150: skipping image because it is an SMask optimize.py:280
xref 151: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R33 in page 0 optimize.py:265
xref 156: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R27 in page 0 optimize.py:265
xref 146: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R44 in page 0 optimize.py:265
Recursing into Form XObject /R37 in page 0 optimize.py:265
Recursing into Form XObject /R48 in page 0 optimize.py:265
xref 160: treating as an optimization candidate optimize.py:282
xref 148: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R16 in page 0 optimize.py:265
xref 146: skipping image because it is an SMask optimize.py:280
xref 147: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R23 in page 0 optimize.py:265
Recursing into Form XObject /R32 in page 0 optimize.py:265
Recursing into Form XObject /R38 in page 0 optimize.py:265
Recursing into Form XObject /R20 in page 0 optimize.py:265
XrefExt(xref=160, ext='.jpg') optimize.py:347
XrefExt(xref=147, ext='.png') optimize.py:347
XrefExt(xref=148, ext='.jpg') optimize.py:347
XrefExt(xref=151, ext='.png') optimize.py:347
XrefExt(xref=152, ext='.jpg') optimize.py:347
XrefExt(xref=154, ext='.jpg') optimize.py:347
XrefExt(xref=156, ext='.jpg') optimize.py:347
XrefExt(xref=158, ext='.jpg') optimize.py:347
Optimizable images: JPEGs: 6 PNGs: 2 optimize.py:352
Recompressing JPEGs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
xref 152: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R42 in page 0 optimize.py:265
xref 158: treating as an optimization candidate optimize.py:282
xref 154: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R28 in page 0 optimize.py:265
xref 150: treating as an optimization candidate optimize.py:282
xref 150: skipping image because it is an SMask optimize.py:280
xref 151: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R33 in page 0 optimize.py:265
xref 156: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R27 in page 0 optimize.py:265
xref 146: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R44 in page 0 optimize.py:265
Recursing into Form XObject /R37 in page 0 optimize.py:265
Recursing into Form XObject /R48 in page 0 optimize.py:265
xref 160: treating as an optimization candidate optimize.py:282
xref 148: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R16 in page 0 optimize.py:265
xref 146: skipping image because it is an SMask optimize.py:280
xref 147: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R23 in page 0 optimize.py:265
Recursing into Form XObject /R32 in page 0 optimize.py:265
Recursing into Form XObject /R38 in page 0 optimize.py:265
Recursing into Form XObject /R20 in page 0 optimize.py:265
xref 160: marking this JPEG as deflatable optimize.py:547
xref 148: marking this JPEG as deflatable optimize.py:547
xref 152: marking this JPEG as deflatable optimize.py:547
xref 154: marking this JPEG as deflatable optimize.py:547
xref 156: marking this JPEG as deflatable optimize.py:547
xref 158: marking this JPEG as deflatable optimize.py:547
Deflating JPEGs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.si0svy5x\images\00000147.png optimize.py:641
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.si0svy5x\images\00000151.png optimize.py:641
Running: ['C:\\pngquant\\pngquant.EXE', '--force', '--skip-if-larger', '--quality', '60-80', '--', '-'] __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--force', '--skip-if-larger', '--quality', '60-80', '--', '-'] __init__.py:133
PNGs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2/2 0:00:00
PIL format = PNG img2pdf.py:1834
imgformat = PNG img2pdf.py:1852
input dpi = 96 x 96 img2pdf.py:1371
rotation = 0° img2pdf.py:1421
input colorspace = P img2pdf.py:1455
width x height = 1125px x 1055px img2pdf.py:1508
read_images() embeds a PNG img2pdf.py:2050
PIL format = PNG img2pdf.py:1834
imgformat = PNG img2pdf.py:1852
input dpi = 96 x 96 img2pdf.py:1371
rotation = 0° img2pdf.py:1421
input colorspace = P img2pdf.py:1455
width x height = 2048px x 1536px img2pdf.py:1508
read_images() embeds a PNG img2pdf.py:2050
xref 152: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R42 in page 0 optimize.py:265
xref 158: treating as an optimization candidate optimize.py:282
xref 154: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R28 in page 0 optimize.py:265
xref 150: treating as an optimization candidate optimize.py:282
xref 150: skipping image because it is an SMask optimize.py:280
xref 151: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R33 in page 0 optimize.py:265
xref 156: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R27 in page 0 optimize.py:265
xref 146: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R44 in page 0 optimize.py:265
Recursing into Form XObject /R37 in page 0 optimize.py:265
Recursing into Form XObject /R48 in page 0 optimize.py:265
xref 160: treating as an optimization candidate optimize.py:282
xref 148: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R16 in page 0 optimize.py:265
xref 146: skipping image because it is an SMask optimize.py:280
xref 147: treating as an optimization candidate optimize.py:282
Recursing into Form XObject /R23 in page 0 optimize.py:265
Recursing into Form XObject /R32 in page 0 optimize.py:265
Recursing into Form XObject /R38 in page 0 optimize.py:265
Recursing into Form XObject /R20 in page 0 optimize.py:265
xref 160: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization optimize.py:98
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version'] __init__.py:133
xref 147: While extracting this image, an error occurred optimize.py:330
Traceback (most recent call last):
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 326, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 157, in extract_image_jbig2
    imgname.rename(imgname.with_suffix(ext))
  File "C:\Program Files\Python312\Lib\pathlib.py", line 1363, in rename
    os.rename(self, target)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000147' -> 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000147.png'
xref 148: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization optimize.py:98
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version'] __init__.py:133
xref 151: While extracting this image, an error occurred optimize.py:330
Traceback (most recent call last):
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 326, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 157, in extract_image_jbig2
    imgname.rename(imgname.with_suffix(ext))
  File "C:\Program Files\Python312\Lib\pathlib.py", line 1363, in rename
    os.rename(self, target)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000151' -> 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000151.png'
xref 152: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization optimize.py:98
xref 154: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization optimize.py:98
xref 156: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization optimize.py:98
xref 158: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization optimize.py:98
Optimizable images: JBIG2 groups: 0 optimize.py:363
JBIG2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/0 -:--:--
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version'] __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version'] __init__.py:133
Image optimization ratio: 2.05 savings: 51.1% _pipeline.py:989
Total file size ratio: 2.05 savings: 51.1% _pipeline.py:992
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.si0svy5x\optimize.pdf -> output.pdf _pipeline.py:1064
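Both tracebacks in the log end in `imgname.rename(imgname.with_suffix(ext))`, and the targets (`00000147.png`, `00000151.png`) are the same files that were written earlier during the PNG pass. On Windows, `Path.rename()` raises `FileExistsError` when the target already exists, whereas on POSIX it silently overwrites. The sketch below shows one possible workaround, assuming that overwriting the leftover file is acceptable: `Path.replace()` overwrites on both platforms. The names `rename_with_suffix` and `demo` are hypothetical and are not part of OCRmyPDF; this is not necessarily how the project would choose to fix it.

```python
import tempfile
from pathlib import Path


def rename_with_suffix(imgname: Path, ext: str) -> Path:
    """Give imgname the suffix ext, overwriting any existing target.

    Hypothetical helper, not OCRmyPDF's actual code. Path.replace()
    overwrites an existing file on both Windows and POSIX, whereas
    Path.rename() raises FileExistsError on Windows in that case.
    """
    target = imgname.with_suffix(ext)
    imgname.replace(target)
    return target


def demo() -> str:
    """Recreate the situation from the log in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        images = Path(tmp)
        # Leftover from an earlier pass, like 00000147.png in the log.
        (images / "00000147.png").write_bytes(b"old")
        # Freshly extracted image that has no suffix yet.
        fresh = images / "00000147"
        fresh.write_bytes(b"new")
        target = rename_with_suffix(fresh, ".png")
        # The suffix-less source must be gone after the move.
        assert not fresh.exists()
        return target.read_bytes().decode()
```

Calling `demo()` exercises the same "source without suffix, target already present" case as xref 147 and xref 151. Whether the existing `.png` should be overwritten or kept is a design decision for the maintainers; the sketch only illustrates the platform difference.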