-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restructure Documents to support bulk embedding #87
Conversation
341d1eb
to
8bf89c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great! Added a couple questions/nitpicks
) -> AsyncGenerator[models.Model, None]: | ||
"""A copy of `bulk_from_documents`, but async""" | ||
# Force evaluate generators to allow value to be reused | ||
documents = tuple(documents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there circumstances where evaluating this could cause a query, and thus fail with a SynchronousOnlyOperation
- eg if passed a queryset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right - I've changed these methods to accept Sequences to prevent the need to force evaluate. You could still conceivably pass a Sequence that can't be evaluated in an async context but I'm not sure if there's a way to guard against that - any ideas welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Honestly I can't think of a great way to impose that restriction either - ie get rid of the chance of .only
etc deferring fields we want - without forcing it to no longer be a sequence of documents, and instead plain dataclasses... which seems like overkill.
@@ -5,7 +5,7 @@ | |||
from django.db.models import Q | |||
|
|||
|
|||
class DocumentQuerySet(models.QuerySet): | |||
class DocumentManager(models.Manager): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the QuerySet derived manager be preferable here for allowing chaining custom operations + default queryset operations regardless of order? Not a big deal, but I'm not sure I get this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes definitely preferable but I ran in to typing issues using the QuerySet
derived manager.
I think this is resolved in django-stubs by typeddjango/django-stubs#738, but not in django-types which we're using in this project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this back to be a QuerySet derived manager and added a workaround for the typing issue.
Added a workaround for typing issues with django-types
…aluate generators
Forgive the large PR here - this started out as me implementing bulk embedding and then going down a spiral of refactoring things because the existing structure made bulk operations quite difficult.
This change refactors how Documents are generated and used in order to:
EmbeddableFieldsDocumentConverter
currently depends on acontent_type
field on base model #73)In more detail:
Document
dataclass that was converted back and forth toEmbedding
model instances, there is now only aDocument
Django model.Documents
represent a chunk of something to be stored/queried on by a storage provider.DocumentConverters
are responsible for converting something to Documents, and Documents back to their relevant something.Document
no longer has references to objects or content types - each document can have list ofobject_keys
which are an arbitrary representation of the thing it relates to. ADocumentConverter
knows how to both generate keys for the object it's working with, and retrieve a relevant object based on a key.ModelDocumentConverter
, this can createDocuments
based on aModelKey
, and (relatively) efficiently get all Django models that correspond to a list ofDocuments
.DocumentConverters
are now composed of separate operators that can be individually specified/overridden to make it easier to test and reason about each operation a converter is involved with.