[Bug]: PDF to Word produces single page file with mangled contents #2944

Closed
1 task done
arimmington opened this issue Feb 13, 2025 · 7 comments

Comments

@arimmington

Installation Method

Local Installation

The Problem

Whenever I attempt to produce a DOC/DOCX/ODT file from a PDF, it fails in some manner.
The output file is always a single page, and...

  • sometimes blank
  • sometimes with a single page of the input file converted perfectly but everything else missing
  • sometimes with all of the contents of the input file jumbled up

I've tried with different source PDFs and made sure I have the correct LibreOffice packages installed.
Logs seem to indicate that the operation is succeeding, and there are no other errors during the operation.
Copying the command from the logs and running it manually from terminal produces the expected results...

Environment is a Debian 12 (bookworm) LXC on Proxmox with 4 vCPUs and 4 GB RAM.

Version of Stirling-PDF

0.41.0

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

http://localhost:8080/pdf-to-word

Docker Configuration

Relevant Log Output

Feb 13 14:24:35 stirling-pdf java[4282]: 14:24:35.495 [qtp1708723071-20] INFO  s.s.SPDF.utils.ProcessExecutor - Running command: soffice --headless --nologo --infilter=writer_pdf_import --convert-to docx --outdir /tmp/output_6506255918199947198 /tmp/input_2034791900830214237.pdf
Feb 13 14:24:51 stirling-pdf java[4282]: 14:24:51.898 [Thread-23] INFO  s.s.SPDF.utils.ProcessExecutor - convert /tmp/input_2034791900830214237.pdf -> /tmp/output_6506255918199947198/input_2034791900830214237.docx using filter : MS Word 2007 XML
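
For anyone who wants to repeat the manual run described above, here is a minimal sketch that rebuilds the same `soffice` command line shown in the log and runs it against a fresh output directory. The helper names (`build_soffice_command`, `convert`) and the `manual_output_` directory prefix are made up for illustration and are not part of Stirling-PDF; only the flags are taken from the log entry.

```python
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path


def build_soffice_command(input_pdf: str, outdir: str, fmt: str = "docx") -> list[str]:
    """Rebuild the command line seen in the ProcessExecutor log entry."""
    return [
        "soffice", "--headless", "--nologo",
        "--infilter=writer_pdf_import",
        "--convert-to", fmt,
        "--outdir", outdir,
        input_pdf,
    ]


def convert(input_pdf: str, fmt: str = "docx") -> Path:
    """Run the conversion in a fresh directory and return the expected output path."""
    outdir = tempfile.mkdtemp(prefix="manual_output_")  # hypothetical prefix
    cmd = build_soffice_command(input_pdf, outdir, fmt)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises if soffice exits non-zero
    # soffice names the output after the input file, as the log line shows
    return Path(outdir) / (Path(input_pdf).stem + "." + fmt)


if __name__ == "__main__" and len(sys.argv) > 1:
    print("Output written to:", convert(sys.argv[1]))
```

Comparing the file this produces with the one downloaded from the web UI shows whether the two runs really behave differently for the same input.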

Additional Information

No response

Browsers Affected

No response

No Duplicate of the Issue

  • I have verified that there are no existing issues raised related to my problem.

dosubot bot commented Feb 13, 2025

Hey @arimmington! I'm a bot here to help you with bugs, questions, and contributions. Let's get this issue sorted out together.

A similar issue was reported in a closed thread, where users experienced errors when converting PDFs to DOC, DOCX, or ODT formats. Unfortunately, no solution was found in the comments of that thread [1].

Given that running the command manually from the terminal produces the expected results, it might be worth checking if there are any discrepancies between the environment variables or permissions when the command is executed by Stirling-PDF versus manually. Additionally, ensure that the version of LibreOffice used by Stirling-PDF is the same as the one used in your manual tests.
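
One way to check for such discrepancies on Linux is to compare the environment the running service was started with against the environment of the interactive shell. The sketch below is only an illustration under that assumption: the function names are hypothetical, and `/proc/<pid>/environ` reflects the environment at process start and is readable only by the same user or root.

```python
import os
import sys


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue
        key, _, value = entry.decode(errors="replace").partition("=")
        env[key] = value
    return env


def diff_env(service: dict[str, str], shell: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (service_value, shell_value)} for variables that differ."""
    names = sorted(set(service) | set(shell))
    return {
        name: (service.get(name), shell.get(name))
        for name in names
        if service.get(name) != shell.get(name)
    }


def read_process_env(pid: int) -> dict[str, str]:
    """Read the startup environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the PID of the Java process, e.g. the number in "java[...]" in the journal.
    differences = diff_env(read_process_env(int(sys.argv[1])), dict(os.environ))
    for name, (service_value, shell_value) in differences.items():
        print(f"{name}: service={service_value!r} shell={shell_value!r}")
```

Variables such as `HOME` and `PATH` are the usual suspects, since a service manager typically starts processes with a much smaller environment than a login shell.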


@arimmington
Author

@dosu

The Java process runs as root and I ran the command as root as well (I know I shouldn't, but it's just for testing), so I think there shouldn't be any differences in the environment variables. Similarly, since I'm doing everything as root and simply copying the command from the logs, wouldn't that ensure the version of LibreOffice being used is the same?

I also want to clarify that there are no errors shown in the web UI nor in the logs. Is there a way to increase log verbosity for Stirling-PDF?


dosubot bot commented Feb 13, 2025

Could you provide more details about the specific version of LibreOffice installed and any additional configurations you might have set up for Stirling-PDF? Also, have you tried running Stirling-PDF with increased resources to see if that affects the output?


@arimmington
Author

@dosu

LibreOffice version: LibreOffice 7.4.7.2 40(Build:2)
Stirling-PDF has not been customized yet, nor has anything else really.
I am currently running it as a service with systemd and added '-Xmx2G' to the execution string, where previously there was no heap size declaration at all. The LXC has 4 vCPUs and 4 GB RAM assigned.

All of this was the state before I submitted this report too.


dosubot bot commented Feb 14, 2025

Hey @arimmington! I'm here to help with bugs, technical questions, and engineering knowledge, but it looks like your question is a bit outside my wheelhouse. If you think I should have been able to help with this, feel free to reach out to the Dosu Team. They'll be happy to hear your feedback!


@stinger679

I had the same problem with this version of LibreOffice.
Look into updating to the newest version available for your OS.
I did that on Debian 12 and got better results.
My current version of LibreOffice is LibreOffice 24.8.4.2 480(Build:2)

@arimmington
Author

@stinger679 Thank you! That solved it for me.
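
For anyone landing here later, a quick way to see which LibreOffice version is installed is to parse the output of `soffice --version` and compare it with the two versions mentioned in this thread. This is a rough sketch, not an official check: the function names are hypothetical, it assumes the output looks like the version strings quoted above, and the thread does not establish which intermediate release fixed the behavior.

```python
import re
import subprocess


def parse_libreoffice_version(text: str) -> tuple[int, ...]:
    """Extract the dotted version from a string like 'LibreOffice 7.4.7.2 40(Build:2)'."""
    match = re.search(r"LibreOffice\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no LibreOffice version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_version(binary: str = "soffice") -> tuple[int, ...]:
    """Ask the installed binary for its version string and parse it."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return parse_libreoffice_version(result.stdout)


# Versions reported in this thread: the first gave mangled output,
# the second gave good results on Debian 12.
REPORTED_BROKEN = (7, 4, 7, 2)
REPORTED_WORKING = (24, 8, 4, 2)

if __name__ == "__main__":
    version = installed_version()
    print("Installed:", ".".join(map(str, version)))
    if version <= REPORTED_BROKEN:
        print("This is at or below the version that produced mangled output here.")
    elif version >= REPORTED_WORKING:
        print("This is at or above the version reported to work here.")
```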
