Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create sensitive and deleted media models for decisions #4544

Merged
merged 11 commits into from
Jun 26, 2024
12 changes: 10 additions & 2 deletions api/api/admin/media_report.py
Original file line number Diff line number Diff line change
Expand Up @@ -440,6 +440,10 @@ def decision_create_view(self, request, object_id):
if request.method == "POST":
media_obj = self.get_object(request, object_id)

through_model = {
"image": ImageDecisionThrough,
"audio": AudioDecisionThrough,
}[self.media_type]
Comment on lines +447 to +450
Copy link
Collaborator

@sarayourfriend sarayourfriend Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nit, but I've seen this pattern a few times in our code and I don't really understand it. Why not use match/case or if/else for this? The inline object approach is a little too clever (and I never understood defining a static object inline of a function body like this either).

Both match and if/else require explicitly raising if self.media_type doesn't match... but isn't that better? It's certainly easier to read and understand to me (and you know the old phrase about how often code is read compared to written, I'm sure).

Suggested change
through_model = {
"image": ImageDecisionThrough,
"audio": AudioDecisionThrough,
}[self.media_type]
match self.media_type:
case IMAGE:
through_model = ImageDecisionThrough
case AUDIO:
through_model = AudioDecisionThrough
case _:
raise ValueError(f"Unknown media type {self.media_type}")
Suggested change
through_model = {
"image": ImageDecisionThrough,
"audio": AudioDecisionThrough,
}[self.media_type]
if self.media_type == IMAGE:
through_model = ImageDecisionThrough
elif self.media_type == AUDIO:
through_model = AudioDecisionThrough
else:
raise ValueError(f"Unknown media type {self.media_type}")

Even better would be to configure it on the admin itself, along with the media type.

Alternatively, if you want to remove all explicit configuration:

Suggested change
through_model = {
"image": ImageDecisionThrough,
"audio": AudioDecisionThrough,
}[self.media_type]
through_model = getattr(media_obj, f"{self.media_type}decisionthrough_set").model

But it really would be better if it was just a @property of media_obj._meta or something...

Suggested change
through_model = {
"image": ImageDecisionThrough,
"audio": AudioDecisionThrough,
}[self.media_type]
through_model = media_obj._meta.decision_through_model

Anyway, any of those would be expected and easier to understand when reading, I think. (Except the getattr one, that's similarly too clever and it's basically not even worth including as is, but would be improved if it didn't need gettatr and could just be media_obj.decisionthrough_set.model if the media type was removed from the name, which would simplify other code too, not just here).

Copy link
Member Author

@dhruvkb dhruvkb Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I write use the = {}[] pattern is because it is the most compact among all the alternatives and also raises an exception when none of the keys match the input.

I've done this a few times in this file itself. Would you prefer I change this pattern across the entire file, or keep this as is is in the interest of consistency, or just change it here to one of the alternatives you've suggested?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the dict-key-access approach is preferred for any reason at all (it is nice it implicitly throws if the key doesn't exist), I'd at the very least think defining the dictionary outside the runtime scope of the function is a reasonable requirement, if only so that it's in a shared location. It's far-fetched to me to establish a pattern of defining otherwise static dictionaries, especially one encoding relationships between static objects, entirely dynamically in the runtime of a function. From a performance perspective it's fine here, but in a tight loop it's just silly, right? From a shared data/relationship encoding perspective (and discoverability, clarity, etc) it's definitely the worst option I can think of 😕. Just seems like an antipattern to me 🤷 I also don't think compactness is necessarily a virtue, and certainly not in Python, which actively resists compactness in my experience.

Copy link
Collaborator

@sarayourfriend sarayourfriend Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my request here, to clarify, is to move the dictionary to a static location, out of the function, or define the relationship in some other concrete manner that isn't specific to this function. That can be either: changing the field names on the models so that they can be referenced generically (without the media type prefix), or adding a class property to the base class that resolves these models for the media type based on the fields, or something else like that. Also, a follow-up issue to remove that pattern anywhere it's been added and replace it with the generic approach (whether that's statically defined dicts or the dynamic-but-shared approach of class properties).

In general: these static relationships between media type and the data models should not be defined within a local function context, even ignoring all issues with legibility, ergonomics, and performance of this local dict approach. At the very least, this static relationship should be defined statically, and in a shared location, so that new code automatically references it, and reducing the risk of someone just copy/pasting this function-local definition of the relationship.

The dict-accessor pattern is fine on its own, it's the inline dict definition I think is an anti-pattern (though I think match/case and an explicit raise of ValueError is clearer than KeyError, but that's an aesthetic judgement, I know, as at the end of the day it's more or less intelligible as the same underlying problem).

Edit: I realise I'm blocking this PR that fixes a bug in the admin on a code-style/quality issue. I do think this needs to change and believe it's an anti-pattern, but won't block the PR. I'll write an issue to address this more widely later today instead.

form = get_decision_form(self.media_type)(request.POST)
if form.is_valid():
decision = form.save(commit=False)
Expand All @@ -454,9 +458,13 @@ def decision_create_view(self, request, object_id):
moderator=request.user.get_username(),
)

decision.media_objs.add(media_obj)
through = through_model.objects.create(
decision=decision,
media_obj=media_obj,
)
logger.info(
"Media linked to decision",
"Through model created",
through=through.id,
decision=decision.id,
media_obj=media_obj.id,
)
Expand Down
6 changes: 5 additions & 1 deletion api/api/management/commands/backfillmoderationdecision.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,14 +92,18 @@ def handle(self, *args, **options):
for report, decision in zip(reports_chunk, decisions):
report.decision = decision
MediaReport.objects.bulk_update(reports_chunk, ["decision"])
MediaDecisionThrough.objects.bulk_create(
throughs = MediaDecisionThrough.objects.bulk_create(
[
MediaDecisionThrough(
media_obj_id=report.media_obj_id, decision=decision
)
for report, decision in zip(reports_chunk, decisions)
]
)
# Bulk create does not call ``save()`` so the mark-sensitive/deindex
# actions will not be undertaken.
for through in throughs:
through.perform_action()
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
t.update(1)

t.info(
Expand Down
4 changes: 4 additions & 0 deletions api/api/models/audio.py
Original file line number Diff line number Diff line change
Expand Up @@ -363,6 +363,10 @@ class AudioDecisionThrough(AbstractMediaDecisionThrough):
be referenced by `identifier` rather than an arbitrary `id`.
"""

media_class = Audio
sensitive_media_class = SensitiveAudio
deleted_media_class = DeletedAudio

media_obj = models.ForeignKey(
Audio,
to_field="identifier",
Expand Down
4 changes: 4 additions & 0 deletions api/api/models/image.py
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,10 @@ class ImageDecisionThrough(AbstractMediaDecisionThrough):
be referenced by `identifier` rather than an arbitrary `id`.
"""

media_class = Image
sensitive_media_class = SensitiveImage
deleted_media_class = DeletedImage

media_obj = models.ForeignKey(
Image,
to_field="identifier",
Expand Down
22 changes: 20 additions & 2 deletions api/api/models/media.py
Original file line number Diff line number Diff line change
Expand Up @@ -335,8 +335,6 @@ class AbstractMediaDecision(OpenLedgerModel):
class Meta:
abstract = True

# TODO: Implement ``clean`` and ``save``, if needed.


class AbstractMediaDecisionThrough(models.Model):
"""
Expand All @@ -348,6 +346,10 @@ class AbstractMediaDecisionThrough(models.Model):

media_class: type[models.Model] = None
"""the model class associated with this media type e.g. ``Image`` or ``Audio``"""
sensitive_media_class: type[models.Model] = None
"""the model class associated with this media type e.g. ``SensitiveImage`` or ``SensitiveAudio``"""
deleted_media_class: type[models.Model] = None
"""the model class associated with this media type e.g. ``DeletedImage`` or ``DeletedAudio``"""

media_obj = models.ForeignKey(
AbstractMedia,
Expand All @@ -361,6 +363,22 @@ class AbstractMediaDecisionThrough(models.Model):
class Meta:
abstract = True

def perform_action(self):
"""Perform the action specified in the decision."""

if self.decision.action in {
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
DecisionAction.DEINDEXED_SENSITIVE,
DecisionAction.DEINDEXED_COPYRIGHT,
}:
self.deleted_media_class.objects.create(media_obj=self.media_obj)

if self.decision.action == DecisionAction.MARKED_SENSITIVE:
self.sensitive_media_class.objects.create(media_obj=self.media_obj)
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved

def save(self, *args, **kwargs):
super().save(*args, **kwargs)
self.perform_action()


class PerformIndexUpdateMixin:
@property
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
from io import StringIO
from unittest.mock import patch

from django.core.management import call_command

Expand Down Expand Up @@ -44,6 +45,7 @@ def make_reports(media_type, reason: str, status: str, count: int = 1):
return ImageReportFactory.create_batch(count, status=status, reason=reason)


@patch("django.conf.settings.ES")
@pytest.mark.parametrize(
("reason", "status", "expected_action"),
(
Expand All @@ -61,7 +63,7 @@ def make_reports(media_type, reason: str, status: str, count: int = 1):
@pytest.mark.parametrize(("media_type"), ("image", "audio"))
@pytest.mark.django_db
def test_create_moderation_decision_for_reports(
media_type, reason, status, expected_action
mock_es, media_type, reason, status, expected_action
):
username = "opener"
UserFactory.create(username=username)
Expand All @@ -78,12 +80,13 @@ def test_create_moderation_decision_for_reports(
assert f"Created 1 {media_type} moderation decisions from existing reports." in out

decision = MediaDecision.objects.first()
assert decision.media_objs.count() == 1
assert decision.action == expected_action
assert decision.moderator.username == username

assert MediaDecisionThrough.objects.filter(decision=decision).count() == 1

decision_through = MediaDecisionThrough.objects.first()
assert decision_through.media_obj == report.media_obj
assert str(decision_through.media_obj_id) == report.media_obj_id
dhruvkb marked this conversation as resolved.
Show resolved Hide resolved
assert decision_through.decision == decision


Expand Down