Merge recent HTTPX-related fixes into master #114

Merged: 6 commits into master on Sep 26, 2024

@Criamos Criamos commented Sep 26, 2024

This PR is a hotfix for httpx.ReadErrors which occurred in es_connector.
For details, please check PR #113 and its commit messages.

- to mitigate "httpx.ReadError"s upon dropped or reset HTTP connections (to / from the edu-sharing repository), the edu-sharing connector will use a shared "requests.Session"-object from now on

Background:
- "httpx.ReadError"s were observed for HTTP Requests that contained (potentially huge) payloads, especially during Thumbnail uploads (via set_node_preview()) and fulltext uploads (via set_node_text())
  - since we cannot reasonably limit the size of the uploaded data, switching back to "requests" for these requests (which hopefully handles them more gracefully than httpx) should fix these HTTP connection pool issues
  - the httpx discussion at encode/httpx#3067 pointed towards similar errors which users observed for payloads above 1 MiB

PS: Thank you, Constantin (@bergatco) and Paul, for the collective debugging session!
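
A minimal sketch of the shared-Session idea described above, not the actual `es_connector` code: the class name `ConnectorSketch`, its methods and the upload URL are hypothetical and only illustrate how a single `requests.Session` can be reused by every connector instance.

```python
from typing import Optional

import requests


class ConnectorSketch:
    """Hypothetical stand-in for the edu-sharing connector, not the real class."""

    _session: Optional[requests.Session] = None  # shared by all instances

    @classmethod
    def get_session(cls) -> requests.Session:
        # Create the Session lazily, then hand out the same object every time
        # so that its connection pool is reused across requests.
        if cls._session is None:
            cls._session = requests.Session()
        return cls._session

    def prepare_preview_upload(self, url: str, image: bytes) -> requests.PreparedRequest:
        # Build (but do not send) a multipart upload through the shared Session.
        # The field name "image" and the file name are assumptions.
        request = requests.Request(
            "POST", url, files={"image": ("preview.png", image, "image/png")}
        )
        return self.get_session().prepare_request(request)
```

The prepared request could then be sent with `ConnectorSketch.get_session().send(prepared, timeout=...)`; the sketch stops before that step so it can be read and run without a repository at hand.
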
- when using the 'resetVersion=true' Spider Argument, logging messages did not correctly reflect what was happening during the hash check in the EduSharingCheckPipeline
- LomBase now stores a custom_setting key ("EDU_SHARING_FORCE_UPDATE"), which can be accessed via "spider.custom_settings" for later readouts
- if an active 'resetVersion' or 'forceUpdate' setting is detected, the pipeline's debug message now states more clearly that the item will be updated even though its hash hasn't changed
- after refactoring LomBase, some method calls in merlin_spider were missing awaits and async declarations
- fix ValidationError for getHash() method:
   - getHash used to submit an 'int'-value to the edu-sharing API, but the API actually expects a string value
- optimized imports

PS: thanks Constantin (@bergatco) for the debug logs!
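
The hash check with a forced update, as described in the list above, might look roughly like the following sketch. Only the "EDU_SHARING_FORCE_UPDATE" key comes from the commit message; the function name `should_update`, the logger name, the message texts and the assumption that unchanged items are otherwise skipped are illustrative and not taken from EduSharingCheckPipeline.

```python
import logging

log = logging.getLogger("pipeline_sketch")  # hypothetical logger name


def should_update(stored_hash: str, new_hash: str, custom_settings: dict) -> bool:
    """Decide whether an item gets updated and log the reason for the decision."""
    # Assumption: the flag is stored as a truthy value in spider.custom_settings.
    force = bool(custom_settings.get("EDU_SHARING_FORCE_UPDATE", False))
    if stored_hash != new_hash:
        log.debug("Hash has changed: item will be updated.")
        return True
    if force:
        log.debug(
            "Hash has not changed, but EDU_SHARING_FORCE_UPDATE is active: "
            "item will be updated nonetheless."
        )
        return True
    log.debug("Hash has not changed: item will be skipped.")
    return False
```

In a spider this would be called with something like `should_update(old, new, spider.custom_settings or {})`, so a missing settings dict behaves like an inactive flag.
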
Fix "httpx"-related ReadErrors in `es_connector`
@Criamos Criamos added the bug Something isn't working label Sep 26, 2024
@Criamos Criamos self-assigned this Sep 26, 2024
@Criamos Criamos merged commit 8e43c33 into master Sep 26, 2024
3 checks passed