[Bug]: Cannot create a file when that file already exists #1396

user1823 · 2024-09-18T17:33:31Z

Describe the bug

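An error occurred while I was optimizing a PDF with OCRmyPDF on Windows. During image extraction, renaming a temporary image file fails with FileExistsError ([WinError 183] Cannot create a file when that file already exists). The run still finishes and writes output.pdf.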

Steps to reproduce

1. Run ocrmypdf -v1 --output-type pdf --tesseract-timeout 0 --optimize 2 --skip-text input.pdf output.pdf
2. See error.

Files

I can share the file if you need it, but I think the log output should be sufficient in this case.

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

16.5.0

Relevant log output

ocrmypdf 16.5.0                                                                                           __main__.py:59
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Found tesseract 5.3.4.20240503                                                                           __init__.py:343
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version']                                                     __init__.py:133
Found pngquant 2.17.0                                                                                    __init__.py:343
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
Found jbig2 0.29                                                                                         __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Found gs 10.3.1                                                                                          __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--list-langs']                             __init__.py:133
stdout/stderr = List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2):            __init__.py:73
eng
osd

No language specified; assuming --language eng                                                         _validation.py:54
pikepdf mmap enabled                                                                                      helpers.py:328
Gathering info with 1 thread workers                                                                         info.py:804
pikepdf mmap enabled                                                                                      helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2/2 0:00:00
Using Tesseract OpenMP thread limit 2                                                               tesseract_ocr.py:199
Start processing 2 pages concurrently                                                                          ocr.py:96
pikepdf mmap enabled                                                                                      helpers.py:328
pikepdf mmap enabled                                                                                      helpers.py:328
    1 skipping all processing on this page                                                              _pipeline.py:330
    2 skipping all processing on this page                                                              _pipeline.py:330
    1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0                     _graft.py:140
    1 Page rotation: (content, auto) -> page = (0, 0) -> 0                                                 _graft.py:165
    2 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0                     _graft.py:140
    2 Page rotation: (content, auto) -> page = (0, 0) -> 0                                                 _graft.py:165
Image processing      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2/2 0:00:00
Postprocessing...                                                                                             ocr.py:144
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
xref 152: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R42 in page 0                                                               optimize.py:265
xref 158: treating as an optimization candidate                                                          optimize.py:282
xref 154: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R28 in page 0                                                               optimize.py:265
xref 150: treating as an optimization candidate                                                          optimize.py:282
xref 150: skipping image because it is an SMask                                                          optimize.py:280
xref 151: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R33 in page 0                                                               optimize.py:265
xref 156: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R27 in page 0                                                               optimize.py:265
xref 146: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R44 in page 0                                                               optimize.py:265
Recursing into Form XObject /R37 in page 0                                                               optimize.py:265
Recursing into Form XObject /R48 in page 0                                                               optimize.py:265
xref 160: treating as an optimization candidate                                                          optimize.py:282
xref 148: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R16 in page 0                                                               optimize.py:265
xref 146: skipping image because it is an SMask                                                          optimize.py:280
xref 147: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R23 in page 0                                                               optimize.py:265
Recursing into Form XObject /R32 in page 0                                                               optimize.py:265
Recursing into Form XObject /R38 in page 0                                                               optimize.py:265
Recursing into Form XObject /R20 in page 0                                                               optimize.py:265
XrefExt(xref=160, ext='.jpg')                                                                            optimize.py:347
XrefExt(xref=147, ext='.png')                                                                            optimize.py:347
XrefExt(xref=148, ext='.jpg')                                                                            optimize.py:347
XrefExt(xref=151, ext='.png')                                                                            optimize.py:347
XrefExt(xref=152, ext='.jpg')                                                                            optimize.py:347
XrefExt(xref=154, ext='.jpg')                                                                            optimize.py:347
XrefExt(xref=156, ext='.jpg')                                                                            optimize.py:347
XrefExt(xref=158, ext='.jpg')                                                                            optimize.py:347
Optimizable images: JPEGs: 6 PNGs: 2                                                                     optimize.py:352
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
xref 152: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R42 in page 0                                                               optimize.py:265
xref 158: treating as an optimization candidate                                                          optimize.py:282
xref 154: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R28 in page 0                                                               optimize.py:265
xref 150: treating as an optimization candidate                                                          optimize.py:282
xref 150: skipping image because it is an SMask                                                          optimize.py:280
xref 151: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R33 in page 0                                                               optimize.py:265
xref 156: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R27 in page 0                                                               optimize.py:265
xref 146: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R44 in page 0                                                               optimize.py:265
Recursing into Form XObject /R37 in page 0                                                               optimize.py:265
Recursing into Form XObject /R48 in page 0                                                               optimize.py:265
xref 160: treating as an optimization candidate                                                          optimize.py:282
xref 148: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R16 in page 0                                                               optimize.py:265
xref 146: skipping image because it is an SMask                                                          optimize.py:280
xref 147: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R23 in page 0                                                               optimize.py:265
Recursing into Form XObject /R32 in page 0                                                               optimize.py:265
Recursing into Form XObject /R38 in page 0                                                               optimize.py:265
Recursing into Form XObject /R20 in page 0                                                               optimize.py:265
xref 160: marking this JPEG as deflatable                                                                optimize.py:547
xref 148: marking this JPEG as deflatable                                                                optimize.py:547
xref 152: marking this JPEG as deflatable                                                                optimize.py:547
xref 154: marking this JPEG as deflatable                                                                optimize.py:547
xref 156: marking this JPEG as deflatable                                                                optimize.py:547
xref 158: marking this JPEG as deflatable                                                                optimize.py:547
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:00:00
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.si0svy5x\images\00000147.png                               optimize.py:641
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.si0svy5x\images\00000151.png                               optimize.py:641
Running: ['C:\\pngquant\\pngquant.EXE', '--force', '--skip-if-larger', '--quality', '60-80', '--', '-']  __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--force', '--skip-if-larger', '--quality', '60-80', '--', '-']  __init__.py:133
PNGs                  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2/2 0:00:00
PIL format = PNG                                                                                         img2pdf.py:1834
imgformat = PNG                                                                                          img2pdf.py:1852
input dpi = 96 x 96                                                                                      img2pdf.py:1371
rotation = 0°                                                                                            img2pdf.py:1421
input colorspace = P                                                                                     img2pdf.py:1455
width x height = 1125px x 1055px                                                                         img2pdf.py:1508
read_images() embeds a PNG                                                                               img2pdf.py:2050
PIL format = PNG                                                                                         img2pdf.py:1834
imgformat = PNG                                                                                          img2pdf.py:1852
input dpi = 96 x 96                                                                                      img2pdf.py:1371
rotation = 0°                                                                                            img2pdf.py:1421
input colorspace = P                                                                                     img2pdf.py:1455
width x height = 2048px x 1536px                                                                         img2pdf.py:1508
read_images() embeds a PNG                                                                               img2pdf.py:2050
xref 152: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R42 in page 0                                                               optimize.py:265
xref 158: treating as an optimization candidate                                                          optimize.py:282
xref 154: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R28 in page 0                                                               optimize.py:265
xref 150: treating as an optimization candidate                                                          optimize.py:282
xref 150: skipping image because it is an SMask                                                          optimize.py:280
xref 151: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R33 in page 0                                                               optimize.py:265
xref 156: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R27 in page 0                                                               optimize.py:265
xref 146: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R44 in page 0                                                               optimize.py:265
Recursing into Form XObject /R37 in page 0                                                               optimize.py:265
Recursing into Form XObject /R48 in page 0                                                               optimize.py:265
xref 160: treating as an optimization candidate                                                          optimize.py:282
xref 148: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R16 in page 0                                                               optimize.py:265
xref 146: skipping image because it is an SMask                                                          optimize.py:280
xref 147: treating as an optimization candidate                                                          optimize.py:282
Recursing into Form XObject /R23 in page 0                                                               optimize.py:265
Recursing into Form XObject /R32 in page 0                                                               optimize.py:265
Recursing into Form XObject /R38 in page 0                                                               optimize.py:265
Recursing into Form XObject /R20 in page 0                                                               optimize.py:265
xref 160: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                 optimize.py:98
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
xref 147: While extracting this image, an error occurred                                                 optimize.py:330
Traceback (most recent call last):
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 326, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 157, in extract_image_jbig2
    imgname.rename(imgname.with_suffix(ext))
  File "C:\Program Files\Python312\Lib\pathlib.py", line 1363, in rename
    os.rename(self, target)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000147' -> 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000147.png'
xref 148: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                 optimize.py:98
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
xref 151: While extracting this image, an error occurred                                                 optimize.py:330
Traceback (most recent call last):
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 326, in extract_images
    result = extract_fn(
             ^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\ocrmypdf\optimize.py", line 157, in extract_image_jbig2
    imgname.rename(imgname.with_suffix(ext))
  File "C:\Program Files\Python312\Lib\pathlib.py", line 1363, in rename
    os.rename(self, target)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000151' -> 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.si0svy5x\\images\\00000151.png'
xref 152: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                 optimize.py:98
xref 154: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                 optimize.py:98
xref 156: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                 optimize.py:98
xref 158: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                 optimize.py:98
Optimizable images: JBIG2 groups: 0                                                                      optimize.py:363
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version']                                                     __init__.py:133
Image optimization ratio: 2.05 savings: 51.1%                                                           _pipeline.py:989
Total file size ratio: 2.05 savings: 51.1%                                                              _pipeline.py:992
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.si0svy5x\optimize.pdf -> output.pdf                      _pipeline.py:1064
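
Both tracebacks in the log end at `imgname.rename(imgname.with_suffix(ext))`, and the earlier PNG pass had already logged `00000147.png` and `00000151.png` in the same `images` folder, so the rename target appears to exist by the time the JBIG2 extraction runs. On Windows, `Path.rename()` raises `FileExistsError` when the destination exists, while `Path.replace()` overwrites it. The snippet below is only a sketch of that difference, not the actual OCRmyPDF code or its fix; the helper name `rename_overwriting` and the file contents are made up for illustration.

```python
import tempfile
from pathlib import Path


def rename_overwriting(src: Path, ext: str) -> Path:
    """Hypothetical helper: move src to src.with_suffix(ext), overwriting any existing file."""
    target = src.with_suffix(ext)
    # Path.replace() calls os.replace(), which overwrites an existing
    # destination file on Windows as well as on POSIX. Path.rename() would
    # raise FileExistsError on Windows in this situation.
    src.replace(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp)

    # Simulate a leftover file from an earlier pass (assumed scenario).
    stale = images / "00000147.png"
    stale.write_bytes(b"stale")

    # Simulate the freshly extracted, extensionless image.
    fresh = images / "00000147"
    fresh.write_bytes(b"fresh")

    result = rename_overwriting(fresh, ".png")
    content = result.read_bytes()      # contents of the overwritten target
    source_still_exists = fresh.exists()
```

Whether overwriting is the right behaviour here, or whether the second extraction should be skipped when the file already exists, is something only the maintainers can decide.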

user1823 added the "triage" label (Issue needs triage) on Sep 18, 2024

user1823 assigned jbarlow83 Sep 18, 2024
