DM-40847: Enable ingest technotes (technote.lsst.io - type documents) #150

Merged 11 commits on Sep 26, 2023


jonathansick
Member

  • Adds a new classification path that identifies a technote based on the generator metadata tag.
  • Adds a new ingest service that consumes the content and maps metadata from the technote's HTML document.

Refresh all dependencies and ensure that no pydantic 2 dependency is
added. This involves pinning safir < 5 and dataclasses-avroschema
< 0.51.0, the release where that project began migrating towards
pydantic 2 support (that migration appears to be broken with respect to
the revised import for AvroBaseModel?).
This type denotes a Technote (i.e. technote.lsst.io) based technical
note.
This adds a new method of document classification that first attempts to
download the document and inspect metadata tags in it. Specifically we
can identify technotes based on their generator meta tag.
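
A minimal sketch of the generator-tag check described above, using only the standard library. The names `GeneratorMetaParser` and `is_technote` are hypothetical, and the assumption that a technote's generator value starts with "technote" is illustrative; the actual classifier in this PR may match a different string and also handles the download step, which is left out here.

```python
from html.parser import HTMLParser
from typing import Optional


class GeneratorMetaParser(HTMLParser):
    """Record the content of the first <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generator: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first generator tag is kept; later ones are ignored.
        if tag != "meta" or self.generator is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generator = attr_map.get("content", "")


def is_technote(html: str) -> bool:
    """Return True if the page's generator meta tag looks like a technote."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    parser.close()
    # Assumption: the generator value begins with "technote".
    return (parser.generator or "").strip().lower().startswith("technote")
```

Pages with no generator tag, or with a different generator, return False, so a caller could fall back to the other classification paths.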
When creating a ProcessContext for a standalone factory for service
testing, I found that the http client created by the
http_client_dependency was already closed. Creating the http client in
situ in the ProcessContext solves this problem. I'm not sure why this
happens, and it could be worth more investigation.

Also fixed a bug where the kafka producer, schema manager, and algolia
client were being retrieved from dependencies to close them rather than
referencing the attributes already in ProcessContext.
respx enables mocking httpx responses.
- Update iter_sphinx_sections to work with html section tags that are
  used in the new technote format
- Add LtdTechnote domain model; it differs slightly from the others in
  that the model itself, rather than the service, now produces the
  Algolia records.
- Add a TechnoteIngestService that ingests the new technotes
- Test using the SQR-075 document.
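
As a rough illustration of walking html section tags, the sketch below yields a (heading, text) pair for each section element. It is not the project's iter_sphinx_sections: `SectionCollector` and `iter_html_sections` are hypothetical names, the code uses only the standard library, and it assumes each section's first heading is its title.

```python
from html.parser import HTMLParser
from typing import Dict, Iterator, List, Tuple

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionCollector(HTMLParser):
    """Collect heading and body text for each <section> element."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: List[Dict[str, List[str]]] = []  # document order
        self._stack: List[Dict[str, List[str]]] = []  # open sections
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            section: Dict[str, List[str]] = {"heading": [], "text": []}
            self.sections.append(section)
            self._stack.append(section)
        elif tag in HEADINGS and self._stack and not self._stack[-1]["heading"]:
            # Assumption: the first heading in a section is its title.
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag == "section" and self._stack:
            self._stack.pop()
        elif tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return  # text outside any <section> is ignored
        key = "heading" if self._in_heading else "text"
        # Text is credited to the innermost open section only.
        self._stack[-1][key].append(text)


def iter_html_sections(html: str) -> Iterator[Tuple[str, str]]:
    """Yield (heading, body text) for each <section>, in document order."""
    collector = SectionCollector()
    collector.feed(html)
    collector.close()
    for section in collector.sections:
        yield " ".join(section["heading"]), " ".join(section["text"])
```

Each yielded pair could then be mapped onto a search record; how the real LtdTechnote model builds its Algolia records is not shown in this PR text.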
Since the DocumentSourceType enum changed, we need new schemas.
Make the technote ingest log message emitted when the upload finishes
match the format used for the other document types.
@jonathansick jonathansick marked this pull request as ready for review September 26, 2023 20:53
@jonathansick jonathansick merged commit b6157e3 into main Sep 26, 2023
@jonathansick jonathansick deleted the tickets/DM-40847 branch September 26, 2023 21:01