Improved Exception Handling during website-screenshot fallback and several fixes for pydantic ValidationErrors
#115
This PR includes the following changes:

- Improved exception handling during the website-screenshot fallback (`sodix_spider`) under two circumstances:
  - no usable `thumbnail` URL was available, therefore the pipeline tried to fall back to a website screenshot
  - if the `location` (`cclom:location`) pointed to a binary file (e.g. `.mp3` or `.mp4`), the headless browser couldn't render the page or take a website screenshot either, causing an `AttributeError` when trying to access the `screenshot_bytes` object (expected type: `bytes`, but received `None`)
- `location`
- Fixes for `pydantic` `ValidationError`s during POST requests to edu-sharing:
  - `duration`: `int` values are wrapped in a string before submitting the item
  - `BaseItem.hash`: `int` values returned by `getHash()` in (old) crawlers are typecast to `str` before submitting the item
  - `aggregationLevel`: `int` values are wrapped in a string before submitting the item
- `license.internal` mapping for `"NONPUBLIC"` license values
- `pipelines.py`
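
For reviewers, the two kinds of guard listed above can be sketched roughly as follows. This is an illustration only, not the code changed in this PR: it assumes items are plain dicts, and `stringify_int_fields`, `encode_screenshot` and `STRING_FIELDS` are hypothetical names chosen for the example.

```python
import base64
from typing import Any, Dict, Optional

# Assumption: field names taken from the change list; the real item layout may differ.
STRING_FIELDS = ("duration", "hash", "aggregationLevel")


def stringify_int_fields(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `item` in which int values of STRING_FIELDS are cast to str."""
    cleaned = dict(item)
    for key in STRING_FIELDS:
        value = cleaned.get(key)
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, int) and not isinstance(value, bool):
            cleaned[key] = str(value)
    return cleaned


def encode_screenshot(screenshot_bytes: Optional[bytes]) -> Optional[str]:
    """Base64-encode a screenshot, or return None if no screenshot was taken."""
    if not isinstance(screenshot_bytes, bytes):
        # e.g. the URL pointed to an .mp3 / .mp4 file that a browser cannot render
        return None
    return base64.b64encode(screenshot_bytes).decode("ascii")
```

With these helpers, `encode_screenshot(None)` returns `None` instead of raising an `AttributeError`, and an item such as `{"duration": 90}` becomes `{"duration": "90"}` before it is submitted.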
PS: Thank you, Constantin, for the useful debug logs!