Acronyms synchronisation with LayoutTokens #69

Closed
lfoppiano opened this issue Mar 9, 2018 · 5 comments

@lfoppiano
Collaborator

There are a few glitches with acronyms when they are propagated through the whole text:

  • they are not synchronised with the LayoutTokens (the propagation is done on the reconstructed string, so the layout token matching is lost)
  • they have no coordinate position
  • in some cases, the offsets do not match the correct position when the stream of text comes from a sub-section of a recognised part of the PDF (the same issue affects Wikipedia matches, see "Incorrect positions on disambiguation result of pdf files" #67)
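
To illustrate the first and last points, here is a minimal Python sketch of one way to keep matches tied to tokens. It is not the project's code: `Token`, `rebuild` and `propagate` are hypothetical names, `Token` is only a stand-in for a LayoutToken, and plain substring matching is assumed for simplicity. The idea is to remember which token owns each character of the reconstructed string, so that a match found on the string can be mapped back to its tokens and to the token's own document offset, even when the text is a sub-section that does not start at offset 0.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a LayoutToken: text plus its offset in the full document."""
    text: str
    offset: int


def rebuild(tokens: List[Token]) -> Tuple[str, List[int], List[int]]:
    """Rebuild the string; record the owning token of each character
    and the position in the rebuilt string where each token starts."""
    parts, owner, starts = [], [], []
    position = 0
    for index, token in enumerate(tokens):
        starts.append(position)
        parts.append(token.text)
        owner.extend([index] * len(token.text))
        position += len(token.text)
    return "".join(parts), owner, starts


def propagate(tokens: List[Token], acronym: str) -> List[Tuple[int, int, int]]:
    """Return (first_token, last_token, document_offset) for each occurrence.

    Assumption: simple substring search, no word-boundary check.
    """
    text, owner, starts = rebuild(tokens)
    matches = []
    start = text.find(acronym)
    while start != -1:
        end = start + len(acronym) - 1
        first, last = owner[start], owner[end]
        # Use the token's own document offset, not the position in the
        # rebuilt string, so sub-sections keep correct offsets.
        document_offset = tokens[first].offset + (start - starts[first])
        matches.append((first, last, document_offset))
        start = text.find(acronym, start + 1)
    return matches
```

With tokens taken from a sub-section that starts at document offset 100, the returned offsets stay relative to the document, and the token indices can be used to look up coordinates if the real tokens carry them.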
lfoppiano added the bug label Mar 9, 2018
lfoppiano added this to the 0.0.3 milestone Mar 9, 2018
lfoppiano changed the title from "Acronyms de-synchronisation with LayoutTokens" to "Acronyms synchronisation with LayoutTokens" Mar 9, 2018
@lfoppiano
Collaborator Author

lfoppiano commented Mar 13, 2018

I'm adding the document used for testing:
su2015relationship.pdf

This article looks like a complicated example... currently it doesn't seem to work properly

@tantikristanti
Collaborator

Another file to be tested:
TestNerd3.pdf

@lfoppiano
Collaborator Author

lfoppiano commented Mar 14, 2018

Article about SVM: 1-s2.0-S0031320307001409-main.pdf

@tantikristanti
Collaborator

Other articles for testing this issue:
ADSLSystemEnglish.pdf
SEOGoogleFrancais.pdf

@tantikristanti
Collaborator

This issue is closed after correcting some bugs in the propagateAcronym method and passing several unit tests.
