Switch from tombstoning deleted objects to actually deleting them #1703

Open
snarfed opened this issue Jan 19, 2025 · 7 comments

@snarfed
Owner

snarfed commented Jan 19, 2025

When we get a Delete, right now we set Object.deleted to True, and we have to check that flag everywhere. This originally seemed like a good idea, long ago when I was young and foolish, but I don't know that it's ever bought us anything, and it's a liability. We should switch to actually deleting.

@snarfed snarfed added the infra label Jan 19, 2025
@snarfed
Owner Author

snarfed commented Jan 21, 2025

One catch: we can't delete Objects in Protocol.receive because send tasks need their copies. So we'd need to delay that delete, maybe with Object._expire.
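
A minimal sketch of the delayed-delete idea, for illustration only. It is not how Object._expire actually works; the `PendingDeleteStore` class, its method names, and the one-hour `GRACE_PERIOD` are all hypothetical. The point is just that a received Delete schedules removal instead of removing right away, so anything already queued can still read its copy:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical grace period; the real retention window isn't stated in this thread.
GRACE_PERIOD = timedelta(hours=1)


class PendingDeleteStore:
    """In-memory stand-in for a datastore that expires entities later."""

    def __init__(self):
        self.objects = {}    # id -> stored payload
        self.expire_at = {}  # id -> datetime after which the object may go

    def put(self, obj_id, payload):
        self.objects[obj_id] = payload

    def get(self, obj_id):
        return self.objects.get(obj_id)

    def mark_deleted(self, obj_id, now):
        # Don't remove yet: queued send tasks may still need this copy.
        if obj_id in self.objects:
            self.expire_at[obj_id] = now + GRACE_PERIOD

    def sweep(self, now):
        """Remove everything whose expiration has passed; return the ids."""
        expired = sorted(i for i, t in self.expire_at.items() if t <= now)
        for obj_id in expired:
            del self.objects[obj_id]
            del self.expire_at[obj_id]
        return expired
```

With a datastore that supports TTL policies, the sweep step would presumably be handled by the datastore itself instead of by application code.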

@snarfed
Owner Author

snarfed commented Jan 21, 2025

^ Done, via Object._expire.

I'm also running a workflow to delete all existing deleted Objects right now. 18M! Using https://cloud.google.com/dataflow/docs/guides/templates/provided/firestore-bulk-delete#gcloud , based on https://github.com/snarfed/bridgy#delete-old-responses :

gcloud dataflow jobs run 'Delete Object.deleted datastore entities' \
  --gcs-location gs://dataflow-templates-us-central1/latest/Firestore_to_Firestore_Delete \
  --region us-central1 \
  --parameters firestoreReadGqlQuery="SELECT __key__ FROM Object WHERE deleted = TRUE",firestoreReadProjectId=bridgy-federated,firestoreDeleteProjectId=bridgy-federated,firestoreHintNumWorkers=100

@snarfed
Owner Author

snarfed commented Jan 22, 2025

Next step, delete all old Objects (eg >90d) that should now be expired due to our new logic.

@snarfed
Owner Author

snarfed commented Jan 22, 2025

Doing this with the same Dataflow template as above. Note the ^.^ prefix, which sets . as the parameter delimiter instead of , so that the GQL query can contain commas:

gcloud dataflow jobs run 'Delete activity Object datastore entities' \
  --gcs-location gs://dataflow-templates-us-central1/latest/Firestore_to_Firestore_Delete \
  --region us-central1 \
  --parameters ^.^firestoreReadGqlQuery="SELECT __key__ FROM Object WHERE type IN ARRAY ('post', 'update', 'delete', 'accept', 'block', 'flag', 'reject', 'stop-following', 'undo')".firestoreReadProjectId=bridgy-federated.firestoreDeleteProjectId=bridgy-federated.firestoreHintNumWorkers=100
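
For scripting this, here's a rough sketch of building the `--parameters` value with gcloud's `^DELIM^` alternate-delimiter syntax. `join_parameters` is a hypothetical helper, not part of gcloud or this repo, and shell quoting of the result is a separate concern that it doesn't handle:

```python
def join_parameters(params, delimiter=','):
    """Join key=value pairs for a gcloud dict flag like --parameters.

    If delimiter isn't the default comma, prefix the result with
    ^DELIM^ so gcloud splits on that delimiter instead.
    """
    for key, value in params.items():
        if delimiter in key or delimiter in value:
            raise ValueError(f'{key!r} or its value contains {delimiter!r}')

    joined = delimiter.join(f'{k}={v}' for k, v in params.items())
    return joined if delimiter == ',' else f'^{delimiter}^{joined}'


params = {
    'firestoreReadGqlQuery':
        "SELECT __key__ FROM Object WHERE type IN ARRAY ('post', 'update')",
    'firestoreReadProjectId': 'bridgy-federated',
}
# The query contains commas, so the default delimiter would split it;
# '.' works here because none of the keys or values contain a dot.
print(join_parameters(params, delimiter='.'))
```

The check up front is the useful part: it fails loudly if the chosen delimiter shows up in any key or value, which is the mistake the ^.^ prefix is there to avoid.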

@snarfed
Owner Author

snarfed commented Jan 22, 2025

Currently getting a breakdown of Objects by type with:

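# Walk the distinct values of Object.type in index order, counting each one.
obj = Object(type=None)  # start with entities that have no type
while obj:
  print(obj.type)
  count = Object.query(Object.type == obj.type).count()
  print(count)
  # jump ahead to an entity with the next type in index order, if any
  obj = Object.query(Object.type > obj.type).get()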
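
The same "count one value, then jump to the next greater one" idea, sketched on a plain sorted list so it can run without a datastore. `count_by_value` is a hypothetical name, and `bisect_right` stands in for the inequality query. Untyped (None) entries are left out of this sketch, since Python can't compare None with strings:

```python
from bisect import bisect_right


def count_by_value(sorted_values):
    """Count each distinct value in a sorted list by skipping ahead.

    Does one binary search per distinct value instead of visiting
    every element, much like one indexed query per type.
    """
    counts = {}
    i = 0
    while i < len(sorted_values):
        value = sorted_values[i]
        # index just past the last element equal to value
        j = bisect_right(sorted_values, value, lo=i)
        counts[value] = j - i
        i = j
    return counts


types = sorted(['note', 'like', 'note', 'add', 'note', 'like'])
print(count_by_value(types))
```

This only pays off when there are few distinct values relative to the number of rows, which is the case for Object.type here.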

@snarfed
Owner Author

snarfed commented Jan 22, 2025

Here's the breakdown, first the big ones:

type count
note 23754560
like 15772637
share 13328784
comment 10173665
follow 3357923
person 1994145
article 1666237

...and all, by type:

type count
None 742482
add 87316
article 1666237
block 45
collection 1118
comment 10173665
event 989
flag 1
follow 3357923
group 10352
image 3
invite 14
issue 6
like 15772637
note 23754560
organization 43
page 43003
person 1994145
post 3
question 29979
rsvp-Yes 1
rsvp-interested 7
rsvp-maybe 7
rsvp-no 5
rsvp-yes 182
service 10681
share 13328784
tag 2350
video 8450

@snarfed
Owner Author

snarfed commented Jan 23, 2025

Looked at how many of the big ones are over 90d old, specifically from before Oct 2024. The fractions range widely:

type percent (count)
note 35% (8349213)
like 68% (10701209)
share 39% (5161054)
comment 60% (6144165)
follow 92% (3077382)
person 74% (1482462)
article 9% (143225)
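
For reference, the percentages above are just the older-than-90d counts divided by the per-type totals from the earlier breakdown. A quick sketch of that arithmetic, where `percent_old` is a hypothetical helper name and the numbers are copied from the two tables:

```python
# Total Objects per type, from the earlier breakdown
totals = {
    'note': 23754560, 'like': 15772637, 'share': 13328784,
    'comment': 10173665, 'follow': 3357923, 'person': 1994145,
    'article': 1666237,
}
# Objects per type from before Oct 2024
older = {
    'note': 8349213, 'like': 10701209, 'share': 5161054,
    'comment': 6144165, 'follow': 3077382, 'person': 1482462,
    'article': 143225,
}


def percent_old(older, totals):
    """Return {type: whole-number percent of that type's Objects that are old}."""
    return {t: round(100 * older[t] / totals[t]) for t in totals}


for obj_type, pct in percent_old(older, totals).items():
    print(f'{obj_type} {pct}% ({older[obj_type]})')
```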
