[Bug]: OCR remove signatures before proceeding #1493

Closed
1 task done
topoldo opened this issue Jun 20, 2024 · 2 comments
@topoldo

topoldo commented Jun 20, 2024

The Problem

I submitted the attached test file (test.pdf) to the Stirling-PDF Cleanup Scans/OCR function, and the request failed with the error shown under Relevant Log Output.
The options I selected are shown in the attached screenshot (option.png).

Version of Stirling-PDF

0.26.1 - Docker on a Synology device

Last Working Version of Stirling-PDF

None tested before

Page Where the Problem Occurred

http://192.168.1.50:8080/ocr-pdf?lang=en_US (shown only to indicate the internal URL I used)

Docker Configuration

No response

Relevant Log Output

ERROR
---------
Internal Server Error: java.io.IOException: Command process failed with exit code 2.

STACK TRACE
------------
java.io.IOException: Command process failed with exit code 2. Error message:   DEBUG ocrmypdf - ocrmypdf 16.1.1
  DEBUG ocrmypdf.subprocess - Running: ['unpaper', '--version']
  DEBUG ocrmypdf.subprocess - Found unpaper 7.0.0
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Found gs 10.2.1
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
  DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 1.146822
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (3):
eng
ita
lat

  DEBUG ocrmypdf.helpers - pikepdf mmap enabled
  DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_1371985821737702192.pdf, /tmp/ocrmypdf.io.c_b14rry/origin)
  DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.c_b14rry/origin, /tmp/ocrmypdf.io.c_b14rry/origin.pdf)
  DEBUG root - Gathering info with 1 thread workers
  DEBUG ocrmypdf.helpers - pikepdf mmap enabled

  ERROR ocrmypdf._pipelines._common - ExitCodeException
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
    return fn(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 188, in _run_pipeline
    validate_pdfinfo_options(context)
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 204, in validate_pdfinfo_options
    raise DigitalSignatureError()
ocrmypdf.exceptions.DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document,
invalidating the signature.

	at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:190)
	at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:85)
	at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:148)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:925)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:830)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
	at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
	at org.eclipse.jetty.ee10.websocket.servlet.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:195)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:61)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:814)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
	at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:571)
	at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:703)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:765)
	at org.eclipse.jetty.server.Server.handle(Server.java:179)
	at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
	at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Additional Information

No response

Browsers Affected

No response

No Duplicate of the Issue

  • I have verified that there are no existing issues raised related to my problem.
@Frooodle
Member

The issue is:

Input PDF has a digital signature. OCR would alter the document,
invalidating the signature.

This looks like a duplicate of the issue resolved for PDF/A conversion in #1360.

I will add the same solution here. Thanks for raising this.
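
For readers who call ocrmypdf directly, the failure above can be recognised from the process output and, if the user agrees to lose the signature, retried with ocrmypdf's `--invalidate-digital-signatures` option. The following is only a minimal sketch under stated assumptions, not the fix that was applied in Stirling-PDF: the function names (`is_signature_failure`, `build_ocr_command`, `run_ocr`) and the `invalidate_on_signature` parameter are hypothetical, and matching on the exception name in the output is an assumption based on the log in this report.

```python
import subprocess

# Assumption: the exception name printed in the log above is a stable marker.
SIGNATURE_MARKER = "DigitalSignatureError"


def is_signature_failure(returncode: int, output: str) -> bool:
    """Return True when ocrmypdf refused the input because it is signed."""
    return returncode != 0 and SIGNATURE_MARKER in output


def build_ocr_command(src: str, dst: str, invalidate: bool = False) -> list[str]:
    """Build the ocrmypdf argument list (hypothetical helper)."""
    cmd = ["ocrmypdf"]
    if invalidate:
        # Lets ocrmypdf process a signed PDF; the signature becomes invalid.
        cmd.append("--invalidate-digital-signatures")
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, invalidate_on_signature=False, runner=subprocess.run):
    """Run OCR once; retry with signature invalidation only if allowed."""
    first = runner(build_ocr_command(src, dst), capture_output=True, text=True)
    output = (first.stdout or "") + (first.stderr or "")
    if invalidate_on_signature and is_signature_failure(first.returncode, output):
        return runner(
            build_ocr_command(src, dst, invalidate=True),
            capture_output=True,
            text=True,
        )
    return first
```

The retry is opt-in because invalidating a signature changes the legal meaning of the document, so a caller would probably want explicit user consent before setting `invalidate_on_signature=True`.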

Frooodle self-assigned this Aug 3, 2024
Frooodle added the enhancement (New feature or request) label Aug 3, 2024
Frooodle changed the title from "[Bug]: Problem using Cleanup Scans/OCR" to "[Bug]: OCR remove signatures before proceeding" Aug 3, 2024
@Frooodle
Member

Frooodle commented Jan 9, 2025

We no longer use ocrmypdf

Frooodle closed this as completed Jan 9, 2025