feat: reading multiple pdf files with a single PDFParser object #371

Open
wants to merge 6 commits into
base: master

nicolabaesso

Elements changed:

  1. Added a new test case in a separate file
  2. Added the two example PDFs
  3. Added a reset of the pages array when the data variable is null

I've added these changes because we use this library at my corporate job, and I'm not a fan of recreating the PDFParser object every time.
The other test cases still pass, so there are no regressions.
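
To illustrate the intended usage, here is a minimal sketch of parsing two files in sequence with one parser object. `ReusableParser`, its `parse` method, and the `dataReady` event are hypothetical stand-ins built on Node's `EventEmitter`; they are not the library's actual API, and the page arrays stand in for real PDF contents.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical stand-in for a reusable parser; not the pdf2json implementation.
class ReusableParser extends EventEmitter {
  constructor() {
    super();
    this.pages = [];
  }

  // Pretend to parse: `pageTexts` stands in for the contents of a PDF file.
  parse(name, pageTexts) {
    this.pages = []; // drop state left over from the previous file
    for (const text of pageTexts) {
      this.pages.push({ text });
    }
    // Emit a copy so later parses cannot mutate earlier results.
    this.emit("dataReady", { name, pages: this.pages.slice() });
  }
}

const parser = new ReusableParser();
const results = [];
parser.on("dataReady", (data) => results.push(data));

// One parser object, two files, parsed one after the other.
parser.parse("first.pdf", ["a", "b"]);
parser.parse("second.pdf", ["c"]);

console.log(results.map((r) => `${r.name}: ${r.pages.length} page(s)`));
```

Without the reset at the top of `parse`, the second result would also contain the pages of the first file, which is the kind of leak the new test case is meant to catch.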

Owner

@modesty left a comment

Thanks for adding more tests. A few thoughts on making the PDFJSClass instance reusable:

  1. The pdfparser instance (or the client that instantiates PDFJSClass) needs to be reset/reusable whenever PDFParser is created (line 107 of pdfparser.js).
  2. lib/pdf.js: setting this.pages=[] is not sufficient to dispose of the object; pdfDocument and rawTextContents need to be reset too. I recommend calling the existing destroy method.
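
The second point can be sketched as follows. `PdfJsStandIn` and its `load` method are hypothetical; only the field names (`pages`, `pdfDocument`, `rawTextContents`) and the idea of a `destroy` method come from the comment above, and the bodies are assumptions made for illustration rather than the code in lib/pdf.js.

```javascript
// Hypothetical stand-in for the object wrapped by the parser. Field names
// follow the review comment; the behaviour is assumed for illustration.
class PdfJsStandIn {
  constructor() {
    this.pages = [];
    this.rawTextContents = [];
    this.pdfDocument = null;
  }

  // Pretend to load a document: one page entry and one text entry per page.
  load(name, pageTexts) {
    this.pdfDocument = { name };
    for (const text of pageTexts) {
      this.pages.push({ index: this.pages.length });
      this.rawTextContents.push(text);
    }
  }

  // Reset every piece of per-document state, not just `pages`.
  destroy() {
    this.pdfDocument = null;
    this.pages = [];
    this.rawTextContents = [];
  }
}

const pdfjs = new PdfJsStandIn();
pdfjs.load("first.pdf", ["a", "b"]);

// Partial reset: only `pages` is cleared, so text from first.pdf survives.
pdfjs.pages = [];
pdfjs.load("second.pdf", ["c"]);
const textsAfterPartialReset = pdfjs.rawTextContents.length;

// Full reset through destroy(): nothing from the earlier loads remains.
pdfjs.destroy();
pdfjs.load("second.pdf", ["c"]);
const textsAfterDestroy = pdfjs.rawTextContents.length;

console.log({ textsAfterPartialReset, textsAfterDestroy });
```

In this sketch the partial reset leaves stale text entries behind while the page list looks clean, which is why routing the reset through a single `destroy` method is the safer choice.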

@nicolabaesso
Author

Hi @modesty,
thank you for your review. As suggested, I removed the this.pages=[] line and call the existing destroy() function instead.
I've also added a function to reset the PDFJS object. Is this what you had in mind? If not, let me know.
Finally, I removed the if statement on line 120 of pdfparser.js; it was a leftover from a test I was running to understand the code.

@nicolabaesso
Author

Hi @modesty,
sorry for the pressure, but could you give me some feedback on the code? I can make changes over the next few days if something is still wrong (I feel the function for resetting the PDFJS object could use some more work, but I would like to have your opinion).

Thank you!
