DM-40847: Enable ingest of technotes (technote.lsst.io-type documents) #150
Merged
jonathansick (Member) commented on Sep 22, 2023
- Adds a new classification path that identifies a technote based on the generator metadata tag.
- Adds a new ingest service that consumes the content and maps metadata from the technote's HTML document.
Refresh all dependencies and ensure that no pydantic 2 dependency is added. This requires pinning safir < 5, and also dataclasses-avroschema < 0.51.0, the release in which that project began migrating toward pydantic 2 support (a migration that appears to be broken with respect to the revised import for AvroBaseModel).
This new DocumentSourceType value denotes a Technote-based (i.e. technote.lsst.io) technical note.
This adds a new document classification method that first attempts to download the document and then inspects its metadata tags. In particular, technotes can be identified by their generator meta tag.
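A minimal sketch of classification by generator meta tag, using only the standard library's html.parser. It assumes the technote generator tag's content begins with "technote"; the exact string the technote package emits isn't stated here, and looks_like_technote is a hypothetical name, not the PR's API.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _GeneratorMetaParser(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def looks_like_technote(html: str) -> bool:
    """Return True if a generator meta tag names the technote package.

    Assumption: the tag's content starts with "technote" (case-insensitive).
    """
    parser = _GeneratorMetaParser()
    parser.feed(html)
    parser.close()
    return any(g.strip().lower().startswith("technote") for g in parser.generators)
```

In a real classifier this would run on the HTML fetched from the document's URL; the download step is omitted here.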
When creating a ProcessContext for a standalone factory used in service tests, I found that the HTTP client provided by http_client_dependency was already closed. Creating the HTTP client in situ within the ProcessContext solves the problem, although the root cause is still unclear and could be worth more investigation. This also fixes a bug where the Kafka producer, schema manager, and Algolia client were retrieved from their dependencies in order to close them, rather than closing the instances already held as ProcessContext attributes.
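The lifecycle fix can be sketched roughly as follows. ClosableResource is a hypothetical stand-in for the real HTTP client, Kafka producer, schema manager, and Algolia client, and this ProcessContext is illustrative rather than the project's class. The idea shown is that the context builds its own HTTP client and closes the objects it already holds.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


class ClosableResource:
    """Hypothetical stand-in for an HTTP client, Kafka producer, etc."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


@dataclass
class ProcessContext:
    """Illustrative container for the resources a processing run needs."""

    http_client: ClosableResource
    kafka_producer: ClosableResource
    schema_manager: ClosableResource
    algolia_client: ClosableResource

    @classmethod
    async def create(cls) -> ProcessContext:
        # Build the HTTP client here instead of borrowing one from a shared
        # dependency whose lifecycle may already have closed it.
        return cls(
            http_client=ClosableResource("http"),
            kafka_producer=ClosableResource("kafka"),
            schema_manager=ClosableResource("schema"),
            algolia_client=ClosableResource("algolia"),
        )

    async def aclose(self) -> None:
        # Close the instances this context already holds, rather than
        # fetching them again from the dependency layer.
        for resource in (
            self.http_client,
            self.kafka_producer,
            self.schema_manager,
            self.algolia_client,
        ):
            await resource.aclose()


async def run_once() -> ProcessContext:
    """Create a context, do (placeholder) work, then shut it down."""
    context = await ProcessContext.create()
    try:
        pass  # ingest work would go here
    finally:
        await context.aclose()
    return context


if __name__ == "__main__":
    ctx = asyncio.run(run_once())
    print(ctx.http_client.closed)  # True
```

Owning the client inside the context ties its lifetime to the context, so a test factory cannot receive a client that some other scope has already shut down.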
respx enables mocking httpx responses.
- Update iter_sphinx_sections to work with the HTML section tags used in the new technote format.
- Add an LtdTechnote domain model. Unlike the other domain models, it produces the Algolia records itself rather than leaving that to the service.
- Add a TechnoteIngestService that ingests the new technotes.
- Test using the SQR-075 document.
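A rough sketch of both ideas, assuming nested HTML section elements: walk them to collect each section's heading trail and paragraph text, and let a document model emit its own search records. Section, iter_html_sections, TechnoteModel, and every record field except Algolia's objectID are hypothetical names; this is not the PR's iter_sphinx_sections or LtdTechnote.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Section:
    """One <section>: its heading trail (outermost first) and its text."""

    headers: list[str]
    content: str


class _SectionParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.open: list[dict] = []  # stack of currently open <section>s
        self.done: list[tuple[int, Section]] = []  # (document order, section)
        self._count = 0
        self._in_heading = False
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.open.append({"order": self._count, "header": "", "paragraphs": []})
            self._count += 1
        elif self.open and tag in HEADING_TAGS:
            self._in_heading = True
        elif self.open and tag == "p":
            self.open[-1]["paragraphs"].append("")
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "section" and self.open:
            current = self.open.pop()
            # Heading trail = headings of all still-open ancestors + this one.
            headers = [s["header"].strip() for s in self.open]
            headers.append(current["header"].strip())
            text = "\n\n".join(p.strip() for p in current["paragraphs"])
            self.done.append((current["order"], Section(headers, text)))
        elif tag in HEADING_TAGS:
            self._in_heading = False
        elif tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if not self.open:
            return
        if self._in_heading:
            self.open[-1]["header"] += data
        elif self._in_paragraph and self.open[-1]["paragraphs"]:
            self.open[-1]["paragraphs"][-1] += data


def iter_html_sections(html: str) -> Iterator[Section]:
    """Yield sections in document order (parents before children)."""
    parser = _SectionParser()
    parser.feed(html)
    parser.close()
    for _, section in sorted(parser.done, key=lambda item: item[0]):
        yield section


@dataclass
class TechnoteModel:
    """Hypothetical document model that builds its own search records."""

    url: str
    title: str
    sections: list[Section] = field(default_factory=list)

    def make_algolia_records(self) -> list[dict[str, object]]:
        return [
            {
                "objectID": f"{self.url}#section-{i}",  # unique per record
                "url": self.url,
                "title": self.title,
                "headers": section.headers,
                "content": section.content,
            }
            for i, section in enumerate(self.sections)
        ]
```

For a page with one section nested inside another, list(iter_html_sections(html)) yields the parent's trail ["Title"] followed by the child's ["Title", "Sub"]. Keeping record construction on the model means the ingest service only has to parse and upload.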
jonathansick force-pushed the tickets/DM-40847 branch from dc0d05f to 5aab88c (September 25, 2023 21:13)
Since the DocumentSourceType enum changed, we need new schemas.
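Avro enum types list their symbols explicitly, which is why adding a member to an enum forces a schema update. A minimal illustration with placeholder member names (the actual DocumentSourceType members are not listed here), deriving an Avro enum schema from a Python Enum; avro_enum_schema is a hypothetical helper, not the dataclasses-avroschema API.

```python
from __future__ import annotations

from enum import Enum


class OldSourceType(str, Enum):
    """Hypothetical source-type enum before a technote member exists."""

    SPHINX_DOC = "sphinx_doc"  # placeholder member
    OTHER_DOC = "other_doc"  # placeholder member


class NewSourceType(str, Enum):
    """The same hypothetical enum after adding a technote member."""

    SPHINX_DOC = "sphinx_doc"
    OTHER_DOC = "other_doc"
    TECHNOTE = "technote"


def avro_enum_schema(name: str, enum_cls: type[Enum]) -> dict[str, object]:
    """Build an Avro enum schema whose symbols are the enum's values."""
    return {
        "type": "enum",
        "name": name,
        "symbols": [member.value for member in enum_cls],
    }


if __name__ == "__main__":
    old = avro_enum_schema("DocumentSourceType", OldSourceType)
    new = avro_enum_schema("DocumentSourceType", NewSourceType)
    # The symbol lists differ, so any registered schema must change too.
    print(old["symbols"], new["symbols"])
```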
jonathansick force-pushed the tickets/DM-40847 branch from 92ec8dc to 7a2334f (September 26, 2023 18:48)
Make the technote ingest's log message for a finished upload use the same format as the other document types.