Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix pdf text box mapping #328

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fukudasjp
Copy link
Contributor

@fukudasjp fukudasjp commented Mar 28, 2022

What do these changes do/fix?

This PR improve PDF highlighting.

  1. 9b82647 In some PDF, highlight location was wrong like following. In the image, the highlighted 10h00 should be right below the 18h00, but it isn't. This is because I improved highlighting by giving priority to longer matches in fix: fix and improve PDF highlighting #313.

screenshot 2022-03-28 22 28 49

The text box mapping logic manages text positions where text fragments are mapped recently. The positions are used to generate longer chunk of matches. But with the improvement, the mapping process is repeated. So, the logic tried to generate longer chunk based on the position information form the previous iteration. It made the wrong highlight location.

So, I fixed the logic to reset the position information after each iteration.
screenshot 2022-03-28 22 30 46

  1. a0a3324 Sort TextContentItems from PDF by the order of bboxes in HTML.

In the mapping, longer match between field text and text form PDF makes highlight more accurate. But in some PDF, its text content items may not be ordered by text order. It produces many small mappings instead of a long mapping. So, this is to reorder the text content items using the text order in HTML field.

How do you test/verify these changes?

  • Test case added for 1
  • Use Storybook to check if the change doesn't break the PDF highlighting

Have you documented your changes (if necessary)?

Are there any breaking changes included in this pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant