feat: Update html and pdf converters to handle ByteStream(s) #5999

Closed
vblagoje wants to merge 6 commits from update_converters

Conversation

vblagoje
Member

@vblagoje vblagoje commented Oct 8, 2023

Why:

Extend HTMLToDocument and PyPDFToDocument to accept ByteStream inputs, so both converters can process content that is already in memory.

What:

  • Updated HTMLToDocument to handle ByteStream input.
  • Updated PyPDFToDocument to handle ByteStream input.
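
The kind of input handling described above could be sketched as follows. This is a minimal illustration and not the code in this PR: the `ByteStream` dataclass is a local stand-in for the Haystack class (assumed here to carry raw bytes plus metadata), and `read_source` / `read_sources` are hypothetical helper names.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


@dataclass
class ByteStream:
    """Local stand-in (assumption): raw bytes plus optional metadata."""

    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict)


Source = Union[str, Path, ByteStream]


def read_source(source: Source) -> bytes:
    """Return the raw bytes for a file path or an in-memory ByteStream."""
    if isinstance(source, ByteStream):
        # Already in memory: use the payload directly.
        return source.data
    if isinstance(source, (str, Path)):
        # Treat strings and Path objects as file paths.
        return Path(source).read_bytes()
    raise TypeError(f"Unsupported source type: {type(source).__name__}")


def read_sources(sources: List[Source]) -> List[bytes]:
    """Normalize a mixed list of paths and ByteStreams to raw bytes."""
    return [read_source(source) for source in sources]
```

A converter built this way would normalize every source to bytes first and then run the same HTML or PDF parsing step regardless of where the bytes came from.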

How Did You Test It:

Unit and integration tests validate the updated input handling. Manual testing was also done to confirm that ByteStream inputs are processed correctly.

Notes for Reviewer:

  • Would appreciate a review of the updated input handling logic to ensure it aligns with project standards and handles ByteStream inputs efficiently and correctly.

@vblagoje vblagoje requested a review from a team as a code owner October 8, 2023 10:42
@vblagoje vblagoje requested review from julian-risch and removed request for a team October 8, 2023 10:42
@vblagoje vblagoje added the ignore-for-release-notes and 2.x labels Oct 8, 2023
@github-actions github-actions bot added the type:documentation label Oct 8, 2023
@vblagoje
Member Author

vblagoje commented Oct 9, 2023

@ZanSara any idea why the pypdf import fails? It worked before...

@vblagoje vblagoje requested a review from ZanSara October 10, 2023 13:47
@vblagoje
Member Author

@ZanSara is volunteering for this one, as you @julian-risch are already working on LCF

@vblagoje
Member Author

Closing, superseded by #6020 and #6021

@vblagoje vblagoje closed this Oct 10, 2023
@masci masci deleted the update_converters branch February 5, 2024 17:08
Labels
2.x (Related to Haystack v2.0), ignore-for-release-notes (PRs with this flag won't be included in the release notes), topic:tests, type:documentation (Improvements on the docs)
1 participant