[Bug]: Output PDF is too large #1366

Open
user1823 opened this issue Aug 2, 2024 · 0 comments
Labels
triage Issue needs triage

Describe the bug

Running OCR with --force-ocr produces an output file that is 5.44× the size of the input file.

If the input file is first re-written with Ghostscript (GS) and OCR is then run on the re-written file, the OCRed file is only slightly larger than the original input file.

Input file (in.pdf) = 686 KB
OCRed (ocr.pdf) = 3732 KB

Re-written with GS (gs.pdf) = 699 KB
OCRed (gs_ocr.pdf) = 804 KB
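
For reference, the ratios behind these figures can be checked with a few lines of Python. This is only a sketch of the arithmetic: the sizes are the rounded KB values quoted above, and the file names are taken from the reproduction steps.

```python
# Sizes in KB as reported in this issue (rounded values, not exact byte counts).
sizes_kb = {
    "in.pdf": 686,       # original input
    "ocr.pdf": 3732,     # OCR run directly on the original
    "gs.pdf": 699,       # original re-written with Ghostscript
    "gs_ocr.pdf": 804,   # OCR run on the Ghostscript output
}


def size_ratio(output_kb: float, input_kb: float) -> float:
    """Return how many times larger the output is than the input."""
    return output_kb / input_kb


direct = size_ratio(sizes_kb["ocr.pdf"], sizes_kb["in.pdf"])
via_gs = size_ratio(sizes_kb["gs_ocr.pdf"], sizes_kb["gs.pdf"])
via_gs_vs_original = size_ratio(sizes_kb["gs_ocr.pdf"], sizes_kb["in.pdf"])

print(f"OCR on original:        {direct:.2f}x")
print(f"OCR on GS output:       {via_gs:.2f}x")
print(f"GS + OCR vs. original:  {via_gs_vs_original:.2f}x")
```

With the rounded sizes, the first ratio comes out at about 5.44, which matches the warning printed by OCRmyPDF, while both ratios for the Ghostscript route stay below 1.2.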

Steps to reproduce

1. Run ocrmypdf -v1 --output-type pdf --max-image-mpixels 1000 --tesseract-downsample-above 3508 --force-ocr in.pdf ocr.pdf
2. See that the output file is 5.44 times larger than the input file.
3. Run gswin64.exe -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile=gs.pdf in.pdf
4. Run ocrmypdf -v1 --output-type pdf --max-image-mpixels 1000 --tesseract-downsample-above 3508 --force-ocr gs.pdf gs_ocr.pdf
5. See that the OCRed file (gs_ocr.pdf) is now only slightly larger than in.pdf.
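
The steps above could be scripted roughly as follows. This is a minimal sketch, not a tested harness: it assumes ocrmypdf and gswin64.exe are on PATH and that in.pdf is in the current directory. The helper names (ocr_command, gs_rewrite_command, reproduce) are made up for illustration; the flags are copied from the steps.

```python
import subprocess
from pathlib import Path

# Flags copied verbatim from steps 1 and 4.
OCR_FLAGS = [
    "-v1",
    "--output-type", "pdf",
    "--max-image-mpixels", "1000",
    "--tesseract-downsample-above", "3508",
    "--force-ocr",
]


def ocr_command(src: str, dst: str) -> list[str]:
    """Build the ocrmypdf command line used in steps 1 and 4."""
    return ["ocrmypdf", *OCR_FLAGS, src, dst]


def gs_rewrite_command(src: str, dst: str, gs_exe: str = "gswin64.exe") -> list[str]:
    """Build the Ghostscript pdfwrite command line used in step 3."""
    return [gs_exe, "-sDEVICE=pdfwrite", "-dBATCH", "-dNOPAUSE",
            f"-sOutputFile={dst}", src]


def reproduce(input_pdf: str = "in.pdf") -> dict[str, int]:
    """Run all steps and return the resulting file sizes in bytes."""
    subprocess.run(ocr_command(input_pdf, "ocr.pdf"), check=True)
    subprocess.run(gs_rewrite_command(input_pdf, "gs.pdf"), check=True)
    subprocess.run(ocr_command("gs.pdf", "gs_ocr.pdf"), check=True)
    names = [input_pdf, "ocr.pdf", "gs.pdf", "gs_ocr.pdf"]
    return {name: Path(name).stat().st_size for name in names}
```

Calling reproduce() in the directory that contains in.pdf should leave ocr.pdf, gs.pdf and gs_ocr.pdf next to it and return their sizes for comparison.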

Files

in.pdf (same file as in #1361)

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

16.4.3

Relevant log output

When run on original file:

ocrmypdf 16.4.3                                                                                           __main__.py:59
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Found tesseract 5.3.4.20240503                                                                           __init__.py:343
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Found gs 10.3.1                                                                                          __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--list-langs']                             __init__.py:133
stdout/stderr = List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2):            __init__.py:73
eng
osd

No language specified; assuming --language eng                                                         _validation.py:54
pikepdf mmap enabled                                                                                      helpers.py:328
Gathering info with 1 thread workers                                                                         info.py:800
pikepdf mmap enabled                                                                                      helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Using Tesseract OpenMP thread limit 3                                                               tesseract_ocr.py:199
pikepdf mmap enabled                                                                                      helpers.py:328
    1 page already has text! - rasterizing text and running OCR anyway                                  _pipeline.py:318
    1 Rasterize with png16m, rotation 0                                                                 _pipeline.py:539
    1 Weighted average image DPI is 175.4, max DPI is 600.0. The discrepancy may indicate a high detail _pipeline.py:477
region on this page, but could also indicate a problem with the input PDF file. Page image will be
rendered at 400.0 DPI.
    1 Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH',  __init__.py:133
'-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1',
'-r400.000000x400.000000', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None',
'-f', 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.2083b0_b\\origin.pdf']
    1 Rotating output by 0                                                                            ghostscript.py:149
    1 resolution (399.9992, 399.9992)                                                                   _pipeline.py:618
    1 Resizing image to fit image dimensions limit                                                        imageops.py:56
    1 Rescaled image to (2479, 3508) pixels and (300, 300) dpi                                           imageops.py:151
    1 convert                                                                                           _pipeline.py:735
    1 PIL format = PNG                                                                                   img2pdf.py:1834
    1 imgformat = PNG                                                                                    img2pdf.py:1852
    1 input dpi = 400 x 400                                                                              img2pdf.py:1371
    1 rotation = 0°                                                                                      img2pdf.py:1421
    1 input colorspace = RGB                                                                             img2pdf.py:1455
    1 width x height = 3307px x 4678px                                                                   img2pdf.py:1508
    1 read_images() embeds a PNG                                                                         img2pdf.py:2050
    1 convert done                                                                                      _pipeline.py:745
    1 Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '-l', 'eng',                          __init__.py:133
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.2083b0_b\\000001_ocr.png',
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.2083b0_b\\000001_ocr_hocr', 'hocr', 'txt']
    1 pikepdf.Matrix(0.18, 0, 0, -0.18, 0, 631.44)                                                          _hocr.py:203
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 824, 179)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 373)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1954, 386)                                                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 42, 530)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 656)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 41, 734)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, -0.00699983, 0.00699983, 0.999976, 40, 940)                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 39, 1019)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 34, 1099)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1155)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, -0.0109993, 0.0109993, 0.99994, 33, 1342)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1420)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1500)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1298, 1613)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1695)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1905)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1985)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999982, 0.00599989, -0.00599989, 0.999982, 35, 2065)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2160)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2242)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 2471)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 37, 2553)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, 0.00699983, -0.00699983, 0.999976, 37, 2640)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, 0.0109993, -0.0109993, 0.99994, 34, 2722)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999916, 0.0129989, -0.0129989, 0.999916, 40, 3033)                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999988, 0.00499994, -0.00499994, 0.999988, 42, 3113)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3191)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3225)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 58, 3256)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 60, 3288)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 49, 3317)                                                                  _hocr.py:323
    1 Emplacement update                                                                                   _graft.py:123
    1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0                     _graft.py:140
    1 Grafting                                                                                             _graft.py:251
    1 Grafting with ctm pikepdf.Matrix(1.33414, 0, 0, 1.33352, 0, -5.68434e-14)                            _graft.py:294
    1 Page rotation: (content, auto) -> page = (0, 0) -> 0                                                 _graft.py:165
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Postprocessing...                                                                                             ocr.py:144
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
xref 11: treating as an optimization candidate                                                           optimize.py:282
Recursing into Form XObject /OCR-MguA7ICzwpsDknNobMnZig in page 0                                        optimize.py:265
XrefExt(xref=11, ext='.png')                                                                             optimize.py:347
Optimizable images: JPEGs: 0 PNGs: 1                                                                     optimize.py:352
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
xref 11: treating as an optimization candidate                                                           optimize.py:282
Recursing into Form XObject /OCR-MguA7ICzwpsDknNobMnZig in page 0                                        optimize.py:265
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
xref 11: treating as an optimization candidate                                                           optimize.py:282
Recursing into Form XObject /OCR-MguA7ICzwpsDknNobMnZig in page 0                                        optimize.py:265
Optimizable images: JBIG2 groups: 0                                                                      optimize.py:363
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Image optimization did not improve the file - optimizations will not be used                             optimize.py:720
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version']                                                     __init__.py:133
Image optimization ratio: 1.00 savings: -0.0%                                                           _pipeline.py:989
Total file size ratio: 0.18 savings: -444.1%                                                            _pipeline.py:992
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.2083b0_b\optimize.pdf -> ocr.pdf                         _pipeline.py:1064
The output file size is 5.44× larger than the input file.                                             _validation.py:364
Possible reasons for this include:
--force-ocr was issued, causing transcoding.

When run on file re-written using GS:

ocrmypdf 16.4.3                                                                                           __main__.py:59
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Found tesseract 5.3.4.20240503                                                                           __init__.py:343
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Found gs 10.3.1                                                                                          __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--list-langs']                             __init__.py:133
stdout/stderr = List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2):            __init__.py:73
eng
osd

No language specified; assuming --language eng                                                         _validation.py:54
pikepdf mmap enabled                                                                                      helpers.py:328
Gathering info with 1 thread workers                                                                         info.py:800
pikepdf mmap enabled                                                                                      helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Using Tesseract OpenMP thread limit 3                                                               tesseract_ocr.py:199
pikepdf mmap enabled                                                                                      helpers.py:328
    1 page already has text! - rasterizing text and running OCR anyway                                  _pipeline.py:318
    1 Rasterize with png16m, rotation 0                                                                 _pipeline.py:539
    1 Weighted average image DPI is 175.4, max DPI is 600.0. The discrepancy may indicate a high detail _pipeline.py:477
region on this page, but could also indicate a problem with the input PDF file. Page image will be
rendered at 400.0 DPI.
    1 Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH',  __init__.py:133
'-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1',
'-r400.000000x400.000000', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None',
'-f', 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.dq3qh7v_\\origin.pdf']
    1 Rotating output by 0                                                                            ghostscript.py:149
    1 resolution (399.9992, 399.9992)                                                                   _pipeline.py:618
    1 Resizing image to fit image dimensions limit                                                        imageops.py:56
    1 Rescaled image to (2479, 3508) pixels and (300, 300) dpi                                           imageops.py:151
    1 convert                                                                                           _pipeline.py:735
    1 PIL format = JPEG                                                                                  img2pdf.py:1834
    1 imgformat = JPEG                                                                                   img2pdf.py:1852
    1 input dpi = 400 x 400                                                                              img2pdf.py:1371
    1 rotation = 0°                                                                                      img2pdf.py:1421
    1 input colorspace = RGB                                                                             img2pdf.py:1455
    1 width x height = 3307px x 4678px                                                                   img2pdf.py:1508
    1 read_images() embeds a JPEG                                                                        img2pdf.py:1868
    1 convert done                                                                                      _pipeline.py:745
    1 Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '-l', 'eng',                          __init__.py:133
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.dq3qh7v_\\000001_ocr.png',
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.dq3qh7v_\\000001_ocr_hocr', 'hocr', 'txt']
    1 pikepdf.Matrix(0.18, 0, 0, -0.18, 0, 631.44)                                                          _hocr.py:203
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 824, 179)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 373)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1954, 386)                                                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 42, 530)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 656)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 41, 734)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, -0.00699983, 0.00699983, 0.999976, 40, 940)                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 39, 1019)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 34, 1099)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1155)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, -0.0109993, 0.0109993, 0.99994, 33, 1342)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1420)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1500)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1298, 1613)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1695)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1905)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1985)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999982, 0.00599989, -0.00599989, 0.999982, 35, 2065)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2160)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2242)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 2471)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 37, 2553)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, 0.00699983, -0.00699983, 0.999976, 37, 2640)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, 0.0109993, -0.0109993, 0.99994, 34, 2722)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999916, 0.0129989, -0.0129989, 0.999916, 40, 3033)                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999988, 0.00499994, -0.00499994, 0.999988, 42, 3113)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3191)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3225)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 58, 3256)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 60, 3288)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 49, 3317)                                                                  _hocr.py:323
    1 Emplacement update                                                                                   _graft.py:123
    1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0                     _graft.py:140
    1 Grafting                                                                                             _graft.py:251
    1 Grafting with ctm pikepdf.Matrix(1.33414, 0, 0, 1.33352, 0, -5.68434e-14)                            _graft.py:294
    1 Page rotation: (content, auto) -> page = (0, 0) -> 0                                                 _graft.py:165
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Postprocessing...                                                                                             ocr.py:144
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
Recursing into Form XObject /OCR-YmcWPT_SVQ8ykR5dYENc2w in page 0                                        optimize.py:265
xref 11: treating as an optimization candidate                                                           optimize.py:282
XrefExt(xref=11, ext='.png')                                                                             optimize.py:347
Optimizable images: JPEGs: 0 PNGs: 1                                                                     optimize.py:352
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Recursing into Form XObject /OCR-YmcWPT_SVQ8ykR5dYENc2w in page 0                                        optimize.py:265
xref 11: treating as an optimization candidate                                                           optimize.py:282
xref 11: marking this JPEG as deflatable                                                                 optimize.py:547
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Recursing into Form XObject /OCR-YmcWPT_SVQ8ykR5dYENc2w in page 0                                        optimize.py:265
xref 11: treating as an optimization candidate                                                           optimize.py:282
xref 11: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                  optimize.py:98
Optimizable images: JBIG2 groups: 0                                                                      optimize.py:363
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version']                                                     __init__.py:133
Image optimization ratio: 1.21 savings: 17.4%                                                           _pipeline.py:989
Total file size ratio: 0.87 savings: -15.1%                                                             _pipeline.py:992
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.dq3qh7v_\optimize.pdf -> gs_ocr.pdf                      _pipeline.py:1064
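
Since the two logs are almost identical, a quick way to spot where they diverge is to diff them after stripping the trailing source-location column. The sketch below does this with Python's difflib on the img2pdf lines quoted from the two logs above (padding shortened); the helper names are made up for illustration, and the same approach should work on the full logs saved to files.

```python
import difflib
import re


def normalize(line: str) -> str:
    """Drop the trailing 'module.py:NNN' column and collapse whitespace."""
    line = re.sub(r"\s+\S+\.py:\d+\s*$", "", line)
    return " ".join(line.split())


def changed_lines(run_a: list[str], run_b: list[str]) -> list[str]:
    """Return only the lines that differ between two runs ('- ' = a, '+ ' = b)."""
    a = [normalize(x) for x in run_a]
    b = [normalize(x) for x in run_b]
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]


# Lines quoted from the log of the run on the original file.
original_run = [
    "    1 PIL format = PNG               img2pdf.py:1834",
    "    1 imgformat = PNG                img2pdf.py:1852",
    "    1 input colorspace = RGB         img2pdf.py:1455",
    "    1 read_images() embeds a PNG     img2pdf.py:2050",
]

# The corresponding lines from the run on the GS re-written file.
gs_run = [
    "    1 PIL format = JPEG              img2pdf.py:1834",
    "    1 imgformat = JPEG               img2pdf.py:1852",
    "    1 input colorspace = RGB         img2pdf.py:1455",
    "    1 read_images() embeds a JPEG    img2pdf.py:1868",
]

for entry in changed_lines(original_run, gs_run):
    print(entry)
```

On these excerpts, the only lines reported are the ones where the first run embeds a PNG and the second run embeds a JPEG.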
@user1823 user1823 added the triage Issue needs triage label Aug 2, 2024