Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace function-local dict media type -> model relationship configurations with static module dicts or an otherwise shared approach #4553

Open
sarayourfriend opened this issue Jun 26, 2024 · 2 comments
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API

Comments

@sarayourfriend
Copy link
Collaborator

Current Situation

See this thread for context: #4544 (comment)

This pattern exists widely in our code, and does not need to, and has the following current and potential problems:

  • The pattern encourages a local definition of static, global relationships between entities (in the case of the example, the media type and a decision model for that media type). These relationships are not local relationships, and should be defined once in a single place.
  • Creating a dict only to immediately retrieve a key from it, and then throw the dict away, is potentially costly in the context of a tight loop. That isn't a problem in this particular example, but the pattern is very easily accidentally used in this way, and should under no circumstances be used indiscriminately, especially in code paths that will inevitably inform other code.
  • The pattern is relatively obscure compared to an if/else or match/case or statically defined dict. While it is compact, compactness is not by itself a virtue in code, and certainly not in Python which actively resists compactness.

Suggested Improvement

Either:

  • Define the relationship between these objects in a single dict once in a shared location to be referenced by all places that care about this relationship; or
  • Ensure the relationships can be derived directly from the models themselves, either via class-property introspection or by renaming the fields (or aliasing them) so that they can easily be retrieved generically from any media object regardless if the media type of that object

To me the latter is preferred because it creates a much cleaner and more consistent code, without the need to manually maintain dicts of these relationships at all.

Benefit

Avoid the anti-pattern qualities of the function-local inline dict approach.

@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Jun 26, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Jun 26, 2024
@sarayourfriend sarayourfriend self-assigned this Jun 26, 2024
@openverse-bot openverse-bot moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Jun 26, 2024
@dhruvkb
Copy link
Member

dhruvkb commented Jun 26, 2024

Adding a note here that the approach described in suggested improvement point 1 would be similar to this test fixture.

MEDIA_TYPE_CONFIGS = {
"image": MediaTypeConfig(
media_type="image",
url_prefix="images",
origin_index="image",
filtered_index="image-filtered",
model_factory=model_factories.ImageFactory,
model_class=Image,
sensitive_factory=model_factories.SensitiveImageFactory,
search_request_serializer=ImageSearchRequestSerializer,
model_serializer=ImageSerializer,
report_serializer=ImageReportRequestSerializer,
report_factory=model_factories.ImageReportFactory,
sensitive_class=SensitiveImage,
deleted_class=DeletedImage,
providers=("flickr", "stocksnap"),
categories=("photograph",),
tags=("cat", "Cat"),
q="dog",
),
"audio": MediaTypeConfig(
media_type="audio",
url_prefix="audio",
origin_index="audio",
filtered_index="audio-filtered",
model_factory=model_factories.AudioFactory,
model_class=Audio,
sensitive_factory=model_factories.SensitiveAudioFactory,
search_request_serializer=AudioSearchRequestSerializer,
model_serializer=AudioSerializer,
report_serializer=AudioReportRequestSerializer,
report_factory=model_factories.AudioReportFactory,
sensitive_class=SensitiveAudio,
deleted_class=DeletedAudio,
providers=("freesound", "jamendo", "wikimedia_audio", "ccmixter"),
categories=("music", "pronunciation"),
tags=("cat",),
q="love",
),
}

I would also favour the second approach, but that's because I expect a number of circular dependency issues to come up using the first approach.

@openverse-bot openverse-bot moved this from 📅 To Do to 🏗 In Progress in Openverse Backlog Jul 4, 2024
@sarayourfriend sarayourfriend removed their assignment Aug 13, 2024
@sarayourfriend sarayourfriend moved this from 🏗 In Progress to 📋 Backlog in Openverse Backlog Aug 13, 2024
@sarayourfriend sarayourfriend moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Aug 13, 2024
@sarayourfriend
Copy link
Collaborator Author

I'm unassigning myself this for now and closing #4599.

I need to focus on some other work at the moment and do not have time to rebase and get that PR reviewable. There is some good feedback and discussion in the PR, and @krysal shared a preference for the cached property approach, though I personally don't see how it's that much less meta-programming than the metaclass approach, and it does not completely solve the issue of fiddly configuration required on all of these support models, namely the related fields, which are arguably one of the most important things to get right here, and the most annoying to deal with when they aren't. In fact, if just the related names were unified, at least you can get the model from most of these relations on the class.

I think the meta-class approach works well, and don't fully understand the trepidation towards hooking into Django in this way, but do understand that not everyone is comfortable with this kind of approach and would prefer verbose manual configuration of these generic relations rather than programatically handling it as I've done in the PR.

Regardless, I'm happy to review and give help if someone else takes this up and solves it. Hopefully it's clear from the PR how much of an improvement it is in all the code working on these relationships.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Status: 📅 To Do
Development

Successfully merging a pull request may close this issue.

2 participants