This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

Inform syncapi about holes in DAGs #1006

Open
kegsay opened this issue May 5, 2020 · 2 comments
Labels
C-Roomserver, C-Sync-API, T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues.), X-Needs-Discussion (We aren't really sure about this yet so let's talk about it some more)

Comments

@kegsay
Member

kegsay commented May 5, 2020

Problem context

Servers send other servers events. These events have prev_events. It's possible for a receiving server to be missing those prev_events, creating a hole in the DAG (aka an outlier). In an attempt to fill this hole, there's the API /get_missing_events which takes the latest event IDs and the earliest event IDs and gives you events walking back from latest and ignoring anything in earliest (don't be conned into thinking it returns events "between" the two lists, it doesn't have to in the case of forks).
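To make the "walking back, not between" point concrete, here is a minimal sketch of the walk in Go. It is only an illustration: the `Event` type, `missingEvents` and `exampleDAG` are hypothetical names, the result order is simply breadth-first, and details of the real endpoint are ignored.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for a room event.
type Event struct {
	ID         string
	PrevEvents []string
}

// missingEvents walks breadth-first from latest towards older events,
// never visiting anything listed in earliest, and stops after limit events.
// The latest events themselves are not part of the result.
func missingEvents(dag map[string]Event, earliest, latest []string, limit int) []string {
	seen := make(map[string]bool)
	for _, id := range earliest {
		seen[id] = true
	}
	for _, id := range latest {
		seen[id] = true
	}
	queue := append([]string{}, latest...)
	var result []string
	for len(queue) > 0 && len(result) < limit {
		current := queue[0]
		queue = queue[1:]
		for _, prev := range dag[current].PrevEvents {
			if seen[prev] {
				continue // in earliest, in latest, or already returned
			}
			if _, known := dag[prev]; !known {
				continue // the responding server does not have it either
			}
			if len(result) >= limit {
				break
			}
			seen[prev] = true
			result = append(result, prev)
			queue = append(queue, prev)
		}
	}
	return result
}

// exampleDAG has a main branch A <- B <- C and a fork A <- X <- Y,
// merged again by D (prev_events: C and Y).
func exampleDAG() map[string]Event {
	return map[string]Event{
		"A": {ID: "A"},
		"B": {ID: "B", PrevEvents: []string{"A"}},
		"C": {ID: "C", PrevEvents: []string{"B"}},
		"X": {ID: "X", PrevEvents: []string{"A"}},
		"Y": {ID: "Y", PrevEvents: []string{"X"}},
		"D": {ID: "D", PrevEvents: []string{"C", "Y"}},
	}
}

func main() {
	// earliest = [B], latest = [D]
	fmt.Println(missingEvents(exampleDAG(), []string{"B"}, []string{"D"}, 10))
}
```

With earliest = [B] and latest = [D], the walk returns C, Y, X and A. A is older than B, but it is reachable through the fork, so it is returned even though it is not "between" the two lists.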

In the happy case:

  • We receive a transaction with events whose prev_events we do not recognise.
  • We request them via /get_missing_events and the returned events fill in the hole in the DAG.
  • We process those missing events and then the event from the transaction.
  • We return 200 OK to the transaction.

If we cannot obtain the prev_events, we can request the /state of the room at the event and continue on.
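The flow above (happy case, then the /state fallback) could be sketched roughly as follows. This is not the actual roomserver code: `Remote`, `Room`, `Outcome` and `fakeRemote` are hypothetical names, the hole check only looks at the direct prev_events of the incoming event, and auth checks are left out entirely.

```go
package main

import (
	"errors"
	"fmt"
)

type Event struct {
	ID         string
	PrevEvents []string
}

// Remote is a hypothetical abstraction over /get_missing_events and /state.
type Remote interface {
	GetMissingEvents(earliest, latest []string, limit int) ([]Event, error)
	StateAt(eventID string) error
}

type Outcome string

const (
	Processed     Outcome = "processed"      // no hole at all
	HoleFilled    Outcome = "hole filled"    // /get_missing_events was enough
	StateFallback Outcome = "state fallback" // had to ask for /state
	Rejected      Outcome = "rejected"       // neither worked
)

// Room tracks which event IDs we have and our current latest events.
type Room struct {
	known  map[string]bool
	latest []string
}

func NewRoom(known ...string) *Room {
	r := &Room{known: map[string]bool{}}
	for _, id := range known {
		r.known[id] = true
	}
	if len(known) > 0 {
		r.latest = []string{known[len(known)-1]}
	}
	return r
}

func (r *Room) store(ev Event) {
	r.known[ev.ID] = true
	r.latest = []string{ev.ID}
}

// fillsHole reports whether every prev_event of ev is either already known
// or present in fetched.
func (r *Room) fillsHole(ev Event, fetched []Event) bool {
	have := map[string]bool{}
	for _, f := range fetched {
		have[f.ID] = true
	}
	for _, prev := range ev.PrevEvents {
		if !r.known[prev] && !have[prev] {
			return false
		}
	}
	return true
}

// Receive handles one event from a transaction.
func (r *Room) Receive(ev Event, remote Remote) Outcome {
	if r.fillsHole(ev, nil) {
		r.store(ev)
		return Processed
	}
	fetched, err := remote.GetMissingEvents(r.latest, []string{ev.ID}, 10)
	if err == nil && r.fillsHole(ev, fetched) {
		for _, m := range fetched {
			r.store(m) // process the missing events first
		}
		r.store(ev) // then the event from the transaction
		return HoleFilled
	}
	if err := remote.StateAt(ev.ID); err != nil {
		return Rejected
	}
	r.store(ev)
	return StateFallback
}

// fakeRemote is a canned Remote for the demonstration below.
type fakeRemote struct {
	missing  []Event
	stateErr error
}

func (f fakeRemote) GetMissingEvents(_, _ []string, _ int) ([]Event, error) {
	return f.missing, nil
}

func (f fakeRemote) StateAt(string) error { return f.stateErr }

func main() {
	c := Event{ID: "C", PrevEvents: []string{"B"}}
	b := Event{ID: "B", PrevEvents: []string{"A"}}
	fmt.Println(NewRoom("A").Receive(c, fakeRemote{missing: []Event{b}}))
	fmt.Println(NewRoom("A").Receive(c, fakeRemote{}))
	fmt.Println(NewRoom("A").Receive(c, fakeRemote{stateErr: errors.New("no state")}))
}
```

The three calls in `main` exercise the hole-filled path, the /state fallback, and the case where neither call helps.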

There are many bad cases:

  • The remote server may itself be missing the prev_events, or the requesting server may not be allowed to see those events.
  • The remote server may lie and claim it does not know the prev_events, forcing the requesting server to hit /state, where the remote server can then lie about the entire room state.

We can try to guard against lies by forcing the server who sent us the event to cough up the prev_events or else their transaction will be rejected.

In addition, the client needs to be informed of a new hole in the DAG, or else it will never hit /messages (and hence backfill) to fill the hole, resulting in a gap in message history, e.g. due to lost connectivity on the server (this is exacerbated for p2p nodes). We need to send a limited sync to reset the client in this scenario.
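For reference, a "limited sync" here means a room timeline in the /sync response with `limited` set to true and a `prev_batch` token the client can pass to /messages. A rough sketch of building such a timeline in Go follows; the `Timeline` struct mirrors only those three fields, and `timelineAfterHole` and the token value are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Timeline mirrors the events, limited and prev_batch fields of a room
// timeline in a /sync response. Everything else is omitted.
type Timeline struct {
	Events    []json.RawMessage `json:"events"`
	Limited   bool              `json:"limited"`
	PrevBatch string            `json:"prev_batch,omitempty"`
}

// timelineAfterHole is a hypothetical helper: it marks the timeline as
// limited so the client knows there is a gap before newEvents, and hands
// it a token pointing at the edge of the hole to paginate back from.
func timelineAfterHole(newEvents []json.RawMessage, holeToken string) Timeline {
	return Timeline{Events: newEvents, Limited: true, PrevBatch: holeToken}
}

func main() {
	tl := timelineAfterHole(
		[]json.RawMessage{json.RawMessage(`{"event_id":"$after_hole"}`)},
		"t_hypothetical_token", // placeholder, not a real token format
	)
	out, err := json.Marshal(tl)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A client that sees `limited: true` would then be expected to call /messages with `from` set to the `prev_batch` value and `dir=b`, which is what triggers the backfill.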

@kegsay
Member Author

kegsay commented May 5, 2020

The quick fix (which doesn't really fix everything):

  • On receiving a txn with missing prev_events, call /get_missing_events with limit=10 (synapse parity)
  • If those events fill the hole then fab, prepend them to the transaction and process away.

The proper fix:

  • The ability to reset a room from the syncapi's perspective (and that translating to a limited sync)
  • Moving the backwards extremity logic from syncapi to the roomserver so when we receive a QueryBackfill we can service from the roomserver db initially, then backfill when it hits a hole.
  • Modify the BFS logic in QueryBackfill to return the list of event IDs which are the furthest back it has from that event.
  • Hit /get_missing_events with those event IDs and the latest events of the main DAG
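The BFS change in the third bullet could look something like the sketch below. It is not the existing QueryBackfill code: `furthestBack` and the in-memory `store` map are hypothetical, and "furthest back" is read here as "events we have that reference at least one prev_event we do not have".

```go
package main

import "fmt"

type Event struct {
	ID         string
	PrevEvents []string
}

// furthestBack walks back from start through the events in store and
// returns the IDs of stored events that sit at the edge of a hole, i.e.
// that have at least one prev_event missing from store.
func furthestBack(store map[string]Event, start string) []string {
	if _, ok := store[start]; !ok {
		return nil
	}
	seen := map[string]bool{start: true}
	queue := []string{start}
	var edge []string
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		atEdge := false
		for _, prev := range store[id].PrevEvents {
			if _, ok := store[prev]; !ok {
				atEdge = true // we cannot walk any further this way
				continue
			}
			if !seen[prev] {
				seen[prev] = true
				queue = append(queue, prev)
			}
		}
		if atEdge {
			edge = append(edge, id)
		}
	}
	return edge
}

// exampleStore holds E <- D <- {C, Y}; C's prev B and Y's prev X are missing.
func exampleStore() map[string]Event {
	return map[string]Event{
		"E": {ID: "E", PrevEvents: []string{"D"}},
		"D": {ID: "D", PrevEvents: []string{"C", "Y"}},
		"C": {ID: "C", PrevEvents: []string{"B"}},
		"Y": {ID: "Y", PrevEvents: []string{"X"}},
	}
}

func main() {
	fmt.Println(furthestBack(exampleStore(), "E"))
}
```

In this example the walk from E stops at C and Y, since B and X are not stored. Those are the event IDs that would then be handed to /get_missing_events together with the latest events of the main DAG.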

@kegsay
Member Author

kegsay commented May 13, 2020

This is mostly resolved now, but:

  • We don't handle rejected events very well.
  • We need to tell the syncapi about holes.
  • We need the syncapi to reset clients sensibly so they can /messages.

kegsay changed the title from "Handling missing events over federation" to "Inform syncapi about holes in DAGs" on May 13, 2020
kegsay added the X-Needs-Discussion and T-Defect labels on Dec 5, 2022