When converting to text, the first line doesn't follow the layout when enabled #612

Open
3 tasks done
wendy0402 opened this issue Dec 20, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@wendy0402

Prerequisites

  • I have written a descriptive issue title

  • I have searched existing issues to ensure it has not already been reported

  • I agree to follow the Code of Conduct that this project adheres to

API/app/plugin version

7.2.2

Node.js version

20.14.0

Operating system

macOS

Operating system version (i.e. 20.04, 11.3, 10)

Sonoma (14.6.1)

Description

First of all thank you for this awesome library!

When I convert a PDF to text with maintainLayout enabled, the first line of the page does not respect the layout. This seems to be because the Poppler output is trimmed after it is received from Poppler: https://github.com/Fdawgs/node-poppler/blob/main/src/index.js#L1533
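
To illustrate the suspected cause, here is a minimal sketch. It assumes the wrapper post-processes stdout with String.prototype.trim(); that is my reading of the linked line, not something I have verified. The sample string is taken from the first lines of the command-line output in this issue, and trimEnd() is shown only as one hypothetical alternative, not as the library's actual or planned behaviour.

```javascript
// Sample stdout, modelled on the first lines of the CLI output in this issue.
const stdout = "         WALDEN\n\n        BY\nHENRY DAVID THOREAU\n";

// Assumption: the wrapper calls trim() on the whole output.
// trim() strips leading AND trailing whitespace, so the indentation
// of the very first line is lost; later lines are unaffected.
const trimmed = stdout.trim();

// Hypothetical alternative: strip only trailing whitespace,
// which leaves the first line's indentation intact.
const trimmedEnd = stdout.trimEnd();

// Helper (illustrative only): return the first line of a text block.
const firstLine = (text) => text.split("\n")[0];

console.log(JSON.stringify(firstLine(trimmed)));    // "WALDEN"
console.log(JSON.stringify(firstLine(trimmedEnd))); // "         WALDEN"
```

Only the first line changes, which matches what I see: every other line keeps its indentation because trim() only touches the start and end of the whole string.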

Steps to Reproduce

TextAlignCenter.pdf
The result of parsing the PDF using Poppler directly on the command line:

         WALDEN

        BY
HENRY DAVID THOREAU




              Here we have
         some centered text lines
          with background color
 "fillc:#3277d3, bgcol:#beded9, rot:0"

/// truncated because too long

The same PDF parsed with node-poppler:

const { Poppler } = require("node-poppler");
const poppler = new Poppler();
// `file` is the path to TextAlignCenter.pdf
const output = await poppler.pdfToText(file, undefined, { maintainLayout: true });

Output:

WALDEN

        BY
HENRY DAVID THOREAU




              Here we have
         some centered text lines
          with background color
 "fillc:#3277d3, bgcol:#beded9, rot:0"




                 1854
                                      94
// truncated because too long

Expected Behaviour

The expected result is:
     WALDEN

    BY

HENRY DAVID THOREAU

          Here we have
     some centered text lines
      with background color

"fillc:#3277d3, bgcol:#beded9, rot:0"

/// truncated because too long

@wendy0402 wendy0402 added the bug Something isn't working label Dec 20, 2024
@Fdawgs
Owner

Fdawgs commented Dec 23, 2024

Good spot @wendy0402, thanks for raising this! I'll take a look after the holidays.
