MSC4273: Approve and Disapprove ratings for moderation policies #4273

Open
wants to merge 6 commits into
base: main
195 changes: 195 additions & 0 deletions proposals/4273-approval-disapproval.md
Member

Implementation requirements:

  • Client/bot sending approvals
  • Client/bot using approvals

Member

Overarching question: does this need to be a protocol feature? In the early design of policy lists there was a theory of "multiple confirmations" and manual approval, where the bot wouldn't honour the policy list recommendations until either multiple lists added the same rule, or the operator accepted it.

This is also why the spec doesn't say that recommendations must be followed: the lists are just sources of information. What the parser/receiver does with that information is entirely left to the implementation.

Contributor Author

@Gnuxie, Mar 12, 2025

This doesn't seem to be an overarching question, but a question you have about the proposal?

I am unaware of the early work around moderation policy lists. This proposal has been on the cards for a long time now and there is a lot of potential for this proposal to be used beyond moderation purposes as it is generic.

This does need to be documented somewhere for the purpose of interoperability between moderation tools and Matrix clients, and the MSC process is currently the best avenue for that. It is also my intent for this proposal to land within the specification.

Contributor Author

> This is also why the spec doesn't say that recommendations must be followed: the lists are just sources of information. What the parser/receiver does with that information is entirely left to the implementation.

Likewise, they are free to do what they want with approval ratings in this proposal.

@@ -0,0 +1,195 @@
# MSC4273: Approve and Disapprove ratings for moderation policies

Currently, when watching a moderation policy list, there is no way to express
approval or disapproval of certain policies.

Typically, Matrix moderation tooling requires that moderators accept all
policies within a moderation policy list in an all-or-nothing approach. We call
this profile of policy list subscription _direct propagation_: any policy that
appears on a watched list with _direct propagation_ semantics is directly
considered and actioned by tooling. Examples of this include Mjolnir & Draupnir.

For example: `charity`, the moderator of `cat-community.example.com`, cannot
subscribe to the list `#bat-coc-bl:bat-community.example.com` without accepting
all of the existing policies. This is problematic because `charity`'s friend
`bob` was banned on the list for arguing with a moderator. `charity` respects
the ban in the bat community, but still needs to be able to interact with their
friend.

Additionally, this all-or-nothing approach increases the amount of trust that
moderators place in the curators of moderation policy lists, as watching a
list gives the list curators the ability to manage members within the
subscribing community.

For example: `luna`, the moderator of `bat-community.example.com`, notices that
`cat-community.example.com` is watching the list, and uses their power to ban
more of `charity`'s friends.
Comment on lines +12 to +26
Contributor Author

@deepbluev7 says to just use "mine" and "third party" rather than all these names.


These problems lead to situations where communities avoid watching other
moderation policy lists entirely, either because of a few policies that they
take issue with, or because it is simply unsafe to do so given the level of
trust placed in the curators of a list when watching it.

## Proposal

We introduce a new moderation policy meta rating event type,
`m.policy.rule.approval`. This policy exists purely for moderators to express
their approval or disapproval of policies in other lists.

Content Schema:

```yaml
properties:
  rating:
    description: |-
      A binary rating of either string literals approve or disapprove.
Contributor

What does "binary" refer to here?

    type: string
    enum:
      - approve
      - disapprove
Comment on lines +44 to +49
Contributor

I think spaces (or tabs) got slightly out of order here.

Suggested change
description: |-
A binary rating of either string literals approve or disapprove.
type: string
enum:
- approve
- disapprove
    description: |-
      A binary rating of either string literals approve or disapprove.
    type: string
    enum:
      - approve
      - disapprove

  event_id:
    type: string
    description: |-
      The event ID of the moderation policy that this is a rating for.
    example: "$some_event_id"
Comment on lines +50 to +54
Contributor Author

CC @deepbluev7

We should instead use the room_id, the event type and the event state_key, as the state_key + type combination is used to identify a policy within moderation policy lists so that it can be updated or removed.

Contributor Author

On the other hand, without the event_id or entity, the policy can be modified after the fact, once agreement has been reached, to target a different entity that you disagree with.

Contributor Author

An approach to this in implementation could be simply to notify that a policy has changed, and ask again for a new rating.

Contributor Author

I'll update the MSC with that recommendation.

Comment on lines +50 to +54
Contributor Author

What if I want to mark other entities as approved or disapproved, such as a room or a user?

type: object
required:
  - rating
  - event_id
```

Example:

```json
{
  "sender": "@luna:bat-community.example.com",
  "type": "m.policy.rule.approval",
  "content": {
    "rating": "approve",
    "event_id": "$wioefjwoijefo:example.com"
  },
  "state_key": "lciRM4XTjq9bj3QMawAbXFH7pCRJEvft34ZmhGBNsxc="
}
```
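
As an illustration only, a consumer might check a rating event against the content schema before using it. The sketch below is not part of the proposal: the function name `parse_approval` and the choice to return `None` for malformed events are assumptions, and it accepts both the stable type and the unstable type named under "Unstable prefix".

```python
from typing import Any, Optional

# Stable and unstable event types named in this proposal.
APPROVAL_EVENT_TYPES = {"m.policy.rule.approval", "org.matrix.msc4273.approval"}
VALID_RATINGS = {"approve", "disapprove"}


def parse_approval(event: dict[str, Any]) -> Optional[tuple[str, str]]:
    """Return (event_id, rating) for a well-formed rating event, else None.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    event_type = event.get("type")
    if not isinstance(event_type, str) or event_type not in APPROVAL_EVENT_TYPES:
        return None
    content = event.get("content")
    if not isinstance(content, dict):
        return None
    rating = content.get("rating")
    event_id = content.get("event_id")
    # Both fields are required by the schema and must be strings.
    if not isinstance(rating, str) or rating not in VALID_RATINGS:
        return None
    if not isinstance(event_id, str):
        return None
    return event_id, rating


example = {
    "sender": "@luna:bat-community.example.com",
    "type": "m.policy.rule.approval",
    "content": {"rating": "approve", "event_id": "$wioefjwoijefo:example.com"},
    "state_key": "lciRM4XTjq9bj3QMawAbXFH7pCRJEvft34ZmhGBNsxc=",
}
print(parse_approval(example))
```

Rejecting malformed events outright, rather than guessing a rating, keeps a missing or misspelled `rating` from being read as approval.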

### Discussion

These policies allow for moderation tools to develop _approval
only_[^approval-only] policy list subscription modes, whereby policies are only
considered from a list when they have been approved by a moderator. Note that,
due to the _lazy_ nature of matching, the workload on moderators for
approving policies when watching a list will be minimal[^minimal-workload].

Additionally, subscriptions that use _direct propagation_ can remove policies
from consideration that have been _disapproved_ by a moderator.
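
The two subscription modes described above can be sketched as a single decision function. This is a minimal sketch under stated assumptions: the `Profile` enum and `is_policy_considered` are hypothetical names, and the proposal does not mandate how tools represent subscription profiles.

```python
from enum import Enum
from typing import Optional


class Profile(Enum):
    """Hypothetical names for the two subscription profiles discussed here."""
    DIRECT_PROPAGATION = "direct_propagation"
    APPROVAL_ONLY = "approval_only"


def is_policy_considered(profile: Profile, rating: Optional[str]) -> bool:
    """Decide whether a watched policy is considered by the tool.

    `rating` is "approve", "disapprove", or None when no rating exists.
    """
    if profile is Profile.APPROVAL_ONLY:
        # Only explicitly approved policies are considered.
        return rating == "approve"
    # Direct propagation: every policy is considered unless disapproved.
    return rating != "disapprove"
```

Under this reading, an unrated policy is ignored in _approval only_ mode and applied in _direct propagation_ mode, and a disapproved policy is ignored in both.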

[^approval-only]: https://github.com/the-draupnir-project/planning/issues/4

[^minimal-workload]:
    They will only have to approve policies that are matching encountered
    entities. For example, there is no need to approve a policy banning a user
    that has not been encountered yet as they have not attempted to join any
    known room.
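
One way to picture the lazy matching described in the footnote: a tool only asks for a rating when an unrated policy matches an entity it has actually seen. The sketch below is an assumption about how that could look, not a description of any existing tool. The helper names are hypothetical, and policies are simplified to dictionaries with `event_id` and `entity` keys, where `entity` may be a glob using `*` and `?`.

```python
import re


def glob_matches(pattern: str, value: str) -> bool:
    """Match a glob where * is any run of characters and ? is one character."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def policies_needing_rating(policies, rated_event_ids, encountered_entities):
    """Return event IDs of unrated policies that match an encountered entity."""
    pending = []
    for policy in policies:
        if policy["event_id"] in rated_event_ids:
            continue  # a moderator has already rated this policy
        if any(glob_matches(policy["entity"], e) for e in encountered_entities):
            pending.append(policy["event_id"])
    return pending


policies = [
    {"event_id": "$1", "entity": "@bob:example.com"},
    {"event_id": "$2", "entity": "@*:spam.example.org"},
    {"event_id": "$3", "entity": "@never-seen:example.com"},
]
encountered = {"@bob:example.com", "@x:spam.example.org"}
print(policies_needing_rating(policies, {"$1"}, encountered))
```

Here `$1` is skipped because it already has a rating, and `$3` is skipped because its target has never been encountered, so only `$2` would be put in front of a moderator.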

The examples given in this proposal have so far only considered the application
of these ratings at the room level. However, in the future, if policy lists are
considered for consumption within clients for the purpose of blocking on a user
level, these ratings provide a fundamental primitive for a subjective
distributed reputation system.

## Potential issues

### Direct propagation of approval ratings

If moderation tools implement the proposal naively, it's possible that approval
ratings themselves can be considered alongside normal moderation policies. For
example, `charity` is watching the list `#bat-coc-bl:bat-community.example.com`
curated by `luna` with a _direct propagation_ policy list subscription profile.
`charity` is also watching the list `#people-i-hate:bat-community.example.com`
with an _approval only_ subscription profile. `luna` notices this, and approves
policies from `#people-i-hate:bat-community.example.com` within the
`#bat-coc-bl:bat-community.example.com` list. If `charity`'s tooling naively
propagates the approval ratings, then their tool will enact policies from
`#people-i-hate:bat-community.example.com`.

#### Mitigation

To prevent this, implementations SHOULD use an allowlist of senders when
considering `m.policy.rule.approval` ratings by default, rather than the
specifics of a policy list subscription profile.
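
A minimal sketch of this mitigation, assuming rating events are plain dictionaries shaped like the example earlier in this proposal. The function name `trusted_ratings` is hypothetical, and letting a later event overwrite an earlier rating for the same policy is an assumption made for brevity.

```python
def trusted_ratings(rating_events, allowed_senders):
    """Map policy event_id -> rating, using only allowlisted senders."""
    ratings = {}
    for event in rating_events:
        if event.get("sender") not in allowed_senders:
            continue  # ignore ratings sent by anyone outside the allowlist
        content = event.get("content") or {}
        event_id, rating = content.get("event_id"), content.get("rating")
        if isinstance(event_id, str) and rating in ("approve", "disapprove"):
            ratings[event_id] = rating
    return ratings


# luna approves a policy from another list; charity only trusts herself.
events = [
    {
        "sender": "@luna:bat-community.example.com",
        "content": {"rating": "approve", "event_id": "$hated"},
    },
    {
        "sender": "@charity:cat-community.example.com",
        "content": {"rating": "disapprove", "event_id": "$bob_ban"},
    },
]
allowlist = {"@charity:cat-community.example.com"}
print(trusted_ratings(events, allowlist))
```

Because the check is on the sender rather than on the list the rating appears in, `luna`'s approval is dropped even though `charity` watches `luna`'s list with direct propagation.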

## Alternatives

### Local storage

Approval ratings could be kept on local storage rather than in policy rooms.
However, this would reduce the potential for interoperation of different tools
and clients.

### Anti-ban

Moderation bots can use anti-ban moderation policies that can make entities
immune to other policies with the `ban` recommendation.
[Meowlnir](https://github.com/maunium/meowlnir) currently implements a form of
anti-ban via "unban" policies, so that if an unban policy is found before a ban
policy, the unban takes precedence and the ban is ignored. This has the benefit
of not requiring an existing policy to disapprove of, and the effect remains
regardless of duplicate `ban` policies.

#### Assessment

The anti-ban approach is specific to the `ban` recommendation, whereas
approve/disapprove ratings are generic to any policy type. The
approve/disapprove ratings are also meant to be future proof and ready for
consumption by recommender systems due to their generic nature.

Anti-ban also requires careful consideration in policy list subscription
profiles, as naive implementation provides a vector for entities to become
immune to any future action unless the targeted list is unwatched or the
offending policy is revoked.

Anti-ban policies also cannot express disapproval of content within a policy,
such as a specific reason or context for the policy.

### Copy propagation for approval

For the positive approval use case, bots could watch a policy list without
applying any policy from it. Then, to approve policies, they can be copied over
to a list where policies are applied.
[Meowlnir](https://github.com/maunium/meowlnir) has been considering this
strategy.

#### Assessment

Copy propagation of policies breaks the audit chain and removes them from their
original context. For example, copying a ban would remove it from the original
list and the sender. The community that copies the ban likely will not have any
information surrounding the circumstances of the ban beyond the data contained
within the copy of the policy.

If the original sender rescinds or revokes the ban, then the audit chain is
broken, because without additional consideration, the copies will not be
removed.
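
The difference can be shown with a toy model. This is a simplified sketch rather than a description of any tool: policy lists are modelled as dictionaries keyed by event ID, revocation is modelled as deleting the entry, and both function names are hypothetical.

```python
def bans_via_copy(own_list):
    """Entities banned under copy propagation: whatever sits in our own list."""
    return {policy["entity"] for policy in own_list.values()}


def bans_via_approval(watched_list, approvals):
    """Entities banned under approval: approved policies still on the watched list."""
    return {
        policy["entity"]
        for event_id, policy in watched_list.items()
        if approvals.get(event_id) == "approve"
    }


# The watched list holds one ban; we both copy it and approve it by event ID.
watched_list = {"$ban1": {"entity": "@bob:example.com", "recommendation": "m.ban"}}
own_list = {"$copy1": dict(watched_list["$ban1"])}
approvals = {"$ban1": "approve"}

# The original sender revokes the ban on the watched list.
del watched_list["$ban1"]

print(bans_via_copy(own_list))  # the copy is still active
print(bans_via_approval(watched_list, approvals))  # nothing is left to apply
```

The approval only references the original policy, so once that policy is gone the approval has no effect, whereas the copy has to be found and removed by hand.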

This is extremely problematic, especially in the context of social networks such
as Matrix and the Fediverse. Bans are rescinded all of the time because of
mistakes, forgiveness, changing circumstances, new information, and age.
And yet on networks that follow copy propagation, bans are often permanent and
indefinite, even after explicit retraction. Moderators are incentivised to keep
copies active out of an abundance of caution and to create less work.

For this reason, we strongly recommend against using copy propagation not just as
a strategy for approval, but in all circumstances.

## Security considerations

See potential issues.
Contributor

Suggested change
See ptoential issues.
See potential issues.


## Unstable prefix

The event type `m.policy.rule.approval` will use the unstable type
`org.matrix.msc4273.approval`.

## Dependencies

None