Upgrade document extraction #187

Closed
wants to merge 16 commits into from

Commits on Apr 29, 2024

  1. feat(req): Add lxml_html_clean

    Because of changes to LXML
    flooie committed Apr 29, 2024
    12bfaa3
  2. feat(pdf): Add strip margin flag for PDF extraction

    Add pdfplumber as main tool for extracting text
    from a PDF - and add a strip margin flag to
    enable cropping out text in the margins
    and removing skewed text
    flooie committed Apr 29, 2024
    a91bc95
  3. tests(extraction): Add and fix tests

    Added and fixed tests
    Modified one test pdf to better reflect the test
    flooie committed Apr 29, 2024
    0260927
  4. chore(lint): Fix lint

    flooie committed Apr 29, 2024
    5cb3d1c
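
The strip margin flag described in commit a91bc95 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `margin_bbox`, `is_upright`, `extract_page_text`, and the 5% default margin are all hypothetical. It assumes a pdfplumber-style page object that exposes `width`, `height`, `crop()`, `filter()`, and `extract_text()`.

```python
from typing import Any, Dict, Tuple

BBox = Tuple[float, float, float, float]


def margin_bbox(width: float, height: float, margin_ratio: float = 0.05) -> BBox:
    """Return (x0, top, x1, bottom) for the page area inside the margins.

    margin_ratio is a hypothetical default; the PR may use a different value.
    """
    if not 0 <= margin_ratio < 0.5:
        raise ValueError("margin_ratio must be in [0, 0.5)")
    dx = width * margin_ratio
    dy = height * margin_ratio
    return (dx, dy, width - dx, height - dy)


def is_upright(obj: Dict[str, Any]) -> bool:
    """Keep non-character objects and characters that are not rotated or skewed."""
    return obj.get("object_type") != "char" or bool(obj.get("upright", True))


def extract_page_text(page: Any, strip_margin: bool = False) -> str:
    """Extract text from one page, optionally dropping margins and skewed text."""
    if strip_margin:
        # Crop to the inner area first, then drop characters that are not upright.
        page = page.crop(margin_bbox(page.width, page.height)).filter(is_upright)
    return page.extract_text() or ""
```

With pdfplumber installed, a page obtained from `pdfplumber.open(path).pages` could be passed to `extract_page_text(page, strip_margin=True)`. Cropping before filtering keeps the filter from examining characters that the crop would discard anyway.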

Commits on May 14, 2024

  1. 8c87680
  2. feat(tasks): Update extract from pdf

    Change extract from pdf to drop ocr available flag
    flooie committed May 14, 2024
    5ee15b5
  3. 9d52634
  4. 233a615
  5. feat(tests): Update tests

    flooie committed May 14, 2024
    c070cb2
  6. e3855f0
  7. 6dd78f1

Commits on May 15, 2024

  1. 6c0fef0
  2. 1e6a0c1

Commits on May 16, 2024

  1. 3a7666d
  2. ad55b20
  3. feat(tasks): Rename imports

    flooie committed May 16, 2024
    06d26d0