WIP: 🐛 Clean removal of output files #6986

Draft
wants to merge 16 commits into
base: master

Conversation

giancarloromeo
Contributor

@giancarloromeo giancarloromeo commented Dec 20, 2024

What do these changes do?

TL;DR: Removing (output) files referenced by Project Nodes leaves the system in an inconsistent state.

Currently, the WebServer is the first responder: it forwards deletion requests to the Storage service, which physically deletes the file(s).
However, the Project Nodes are not updated accordingly, and Director v2 is not informed about the changes.


Proposal

To maintain decoupling while ensuring consistency between the storage and project services, I recommend using an Event-Driven Architecture:

  • Event Broker (e.g., RabbitMQ):
    • Introduce an event broker as a central communication channel.
    • When a file is deleted, the Storage service publishes a FileDeleted event with relevant metadata.
  • Storage Service:
    • Handles the physical deletion of files in S3.
    • Keeps a local soft reference count for each file, used for performance and for basic deletion decisions.
    • Publishes the FileDeleted event once the deletion is successful.
    • Subscribes to FileReferenced / FileDereferenced events from the event broker.
    • When a FileReferenced / FileDereferenced event is received, it updates the soft reference count associated with that file.
  • Project Service:
    • Publishes FileReferenced / FileDereferenced events once a file is referenced/dereferenced.
    • Subscribes to FileDeleted events from the event broker.
    • When a FileDeleted event is received, it checks whether the file is referenced in any project and updates its data accordingly (e.g., marks the file as deleted, removes the reference from Nodes).
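
To make the flow above concrete, here is a minimal in-memory sketch. It is not the implementation in this PR: `InMemoryBroker` is a synchronous stand-in for a real broker such as RabbitMQ, and all class names, method names, and payload fields (`file_id`, `project_id`, `node_id`) are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Hypothetical synchronous stand-in for a broker such as RabbitMQ."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in list(self._handlers[event_type]):
            handler(payload)


class StorageService:
    """Owns the files and a soft reference count per file (assumed design)."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self._broker = broker
        self.files: set[str] = set()  # stands in for objects in S3
        self.ref_counts: dict[str, int] = defaultdict(int)
        broker.subscribe("FileReferenced", self._on_referenced)
        broker.subscribe("FileDereferenced", self._on_dereferenced)

    def _on_referenced(self, payload: dict) -> None:
        self.ref_counts[payload["file_id"]] += 1

    def _on_dereferenced(self, payload: dict) -> None:
        file_id = payload["file_id"]
        self.ref_counts[file_id] = max(0, self.ref_counts[file_id] - 1)

    def delete_file(self, file_id: str) -> None:
        # Physical deletion first, then notify every subscriber.
        self.files.discard(file_id)
        self.ref_counts.pop(file_id, None)
        self._broker.publish("FileDeleted", {"file_id": file_id})


class ProjectsService:
    """Tracks which node of which project references which file."""

    def __init__(self, broker: InMemoryBroker) -> None:
        self._broker = broker
        # project_id -> node_id -> referenced file ids
        self.nodes: dict[str, dict[str, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )
        broker.subscribe("FileDeleted", self._on_file_deleted)

    def reference(self, project_id: str, node_id: str, file_id: str) -> None:
        refs = self.nodes[project_id][node_id]
        if file_id not in refs:  # publish only for a new reference
            refs.add(file_id)
            self._broker.publish("FileReferenced", {"file_id": file_id})

    def dereference(self, project_id: str, node_id: str, file_id: str) -> None:
        refs = self.nodes[project_id][node_id]
        if file_id in refs:
            refs.remove(file_id)
            self._broker.publish("FileDereferenced", {"file_id": file_id})

    def _on_file_deleted(self, payload: dict) -> None:
        # The file is gone: drop every dangling reference to it.
        for project in self.nodes.values():
            for refs in project.values():
                refs.discard(payload["file_id"])


broker = InMemoryBroker()
storage = StorageService(broker)
projects = ProjectsService(broker)

storage.files.add("outputs/result.bin")
projects.reference("project-1", "node-a", "outputs/result.bin")
storage.delete_file("outputs/result.bin")
```

After `delete_file` runs, the node in `project-1` no longer holds the reference, which is the consistency property the proposal is after. A real broker delivers asynchronously, so the actual services would additionally have to handle retries, ordering, and duplicate deliveries.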

Event Types Overview

| Event Type | Producer(s) | Consumer(s) |
|---|---|---|
| FileDeleted | Storage service | Projects service, ... |
| FileReferenced | Projects service, ... | Storage service |
| FileDereferenced | Projects service, ... | Storage service |
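
As a rough idea of what the event messages could look like on the wire, the three event types can be modelled as small dataclasses with a JSON round trip. This is only a sketch: the fields (`file_id`, `project_id`, `node_id`), the `type` discriminator, and the helper names are assumptions, since the "relevant metadata" is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from typing import ClassVar


@dataclass(frozen=True)
class FileEvent:
    """Base class; `file_id` is an assumed identifier for the stored file."""

    file_id: str
    event_type: ClassVar[str] = "FileEvent"

    def to_json(self) -> str:
        # ClassVar attributes are not dataclass fields, so add the type by hand.
        return json.dumps({"type": self.event_type, **asdict(self)})


@dataclass(frozen=True)
class FileDeleted(FileEvent):
    """Produced by the Storage service."""

    event_type: ClassVar[str] = "FileDeleted"


@dataclass(frozen=True)
class FileReferenced(FileEvent):
    """Produced by the Projects service (hypothetical extra fields)."""

    project_id: str
    node_id: str
    event_type: ClassVar[str] = "FileReferenced"


@dataclass(frozen=True)
class FileDereferenced(FileEvent):
    """Produced by the Projects service (hypothetical extra fields)."""

    project_id: str
    node_id: str
    event_type: ClassVar[str] = "FileDereferenced"


_EVENT_CLASSES = {
    cls.event_type: cls for cls in (FileDeleted, FileReferenced, FileDereferenced)
}


def event_from_json(raw: str) -> FileEvent:
    """Rebuild the typed event from its JSON form; unknown types raise KeyError."""
    data = json.loads(raw)
    cls = _EVENT_CLASSES[data.pop("type")]
    return cls(**data)


message = FileReferenced(
    file_id="outputs/result.bin", project_id="project-1", node_id="node-a"
).to_json()
event = event_from_json(message)
```

Keeping the discriminator in the payload lets a consumer subscribe to one queue and dispatch on `type`. Routing by a broker-level key would work just as well and is a design choice left open here.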

It would be nice to have PR #7010 merged as well, since it cleans up the persistence of Project Nodes in the DB.

Related issue/s

How to test

Dev-ops checklist

@giancarloromeo giancarloromeo self-assigned this Dec 20, 2024
@giancarloromeo giancarloromeo added the a:webserver (issue related to the webserver service) and a:storage (issue related to storage service) labels Dec 20, 2024
@giancarloromeo giancarloromeo added this to the Event Horizon milestone Dec 20, 2024
@giancarloromeo giancarloromeo changed the title 🐛 Clean output files removal WIP: 🐛 Clean output files removal Dec 20, 2024

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 60.97561% with 16 lines in your changes missing coverage. Please review.

Project coverage is 83.95%. Comparing base (fb86798) to head (a229d0a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6986      +/-   ##
==========================================
- Coverage   87.16%   83.95%   -3.21%     
==========================================
  Files        1634     1629       -5     
  Lines       64264    64149     -115     
  Branches     2051     2051              
==========================================
- Hits        56013    53858    -2155     
- Misses       7914     9968    +2054     
+ Partials      337      323      -14     
| Flag | Coverage Δ |
|---|---|
| integrationtests | 60.55% <40.00%> (-3.12%) ⬇️ |
| unittests | 83.21% <60.97%> (-2.36%) ⬇️ |

| Components | Coverage Δ |
|---|---|
| api | ∅ <ø> (∅) |
| pkg_aws_library | 93.49% <ø> (ø) |
| pkg_dask_task_models_library | 97.09% <ø> (ø) |
| pkg_models_library | 91.42% <80.00%> (-0.02%) ⬇️ |
| pkg_notifications_library | 84.57% <ø> (ø) |
| pkg_postgres_database | 88.41% <ø> (ø) |
| pkg_service_integration | 70.18% <ø> (ø) |
| pkg_service_library | 74.15% <ø> (+0.01%) ⬆️ |
| pkg_settings_library | 90.49% <ø> (ø) |
| pkg_simcore_sdk | 65.64% <ø> (-19.75%) ⬇️ |
| agent | 96.45% <ø> (ø) |
| api_server | 90.55% <ø> (ø) |
| autoscaling | 96.10% <ø> (ø) |
| catalog | 90.32% <ø> (ø) |
| clusters_keeper | 99.24% <ø> (ø) |
| dask_sidecar | 91.26% <ø> (ø) |
| datcore_adapter | 93.18% <ø> (ø) |
| director | 76.92% <ø> (ø) |
| director_v2 | 91.28% <ø> (-0.02%) ⬇️ |
| dynamic_scheduler | 97.21% <ø> (ø) |
| dynamic_sidecar | 88.81% <ø> (-0.95%) ⬇️ |
| efs_guardian | 90.39% <ø> (ø) |
| invitations | 93.42% <ø> (ø) |
| osparc_gateway_server | ∅ <ø> (∅) |
| payments | 92.66% <ø> (ø) |
| resource_usage_tracker | 89.28% <ø> (+0.11%) ⬆️ |
| storage | 53.70% <61.29%> (-35.87%) ⬇️ |
| webclient | ∅ <ø> (∅) |
| webserver | 79.18% <40.00%> (-5.39%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@giancarloromeo giancarloromeo changed the title WIP: 🐛 Clean output files removal WIP: 🐛 Clean removal of output files Dec 30, 2024
Labels
a:storage (issue related to storage service), a:webserver (issue related to the webserver service)
Development

Successfully merging this pull request may close these issues.

Removing a linked output file in a service, makes the next service unrunnable without clear reason
1 participant