-
Bonus: we should use that opportunity to first mirror the media bus from Azure to S3 and GCS and then migrate it entirely. There's a good chance that we can just
-
I'm for the Dual-Write approach, but with helix-admin executed twice, once on AWS and once on GCloud, i.e. we don't need to store the credentials for 'the other' storage as secrets. Otherwise, we don't have a properly separated stack. If one cloud is down during an outage, we run a sync job afterwards to catch up on what was missed. In order to execute the admin twice, the admin invokes itself on the other cloud with the same payload. FWIW, I think we will eventually change the architecture to be task-based, i.e. any invocation of the admin just creates tasks in some queues that get processed. for example a
With the dual-write system, the admin could also write the task to the queue in the other cloud.
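To make the self-invocation idea concrete, here is a minimal sketch. It assumes in-memory dicts stand in for S3 and GCS, and a direct method call stands in for the cross-cloud invocation. `Admin`, `CloudDown`, `forwarded`, and `sync_missed` are hypothetical names for illustration, not part of helix-admin.

```python
from dataclasses import dataclass, field
from typing import Optional


class CloudDown(Exception):
    """Raised by the stand-in when the target cloud is unreachable."""


@dataclass
class Admin:
    """Hypothetical stand-in for one helix-admin deployment."""
    cloud: str
    store: dict = field(default_factory=dict)   # stands in for S3 or GCS
    missed: list = field(default_factory=list)  # payloads the peer did not receive
    peer: Optional["Admin"] = field(default=None, repr=False)
    up: bool = True

    def handle(self, payload: dict, forwarded: bool = False) -> None:
        if not self.up:
            raise CloudDown(self.cloud)
        # Write to the local storage only; no credentials for the other cloud.
        self.store[payload["path"]] = payload["body"]
        if forwarded or self.peer is None:
            return  # a forwarded call must not bounce back to the sender
        try:
            # Self-invocation on the other cloud with the same payload.
            self.peer.handle(payload, forwarded=True)
        except CloudDown:
            self.missed.append(payload)  # remember it for the catch-up job

    def sync_missed(self) -> int:
        """Catch-up job: replay payloads the peer missed during an outage."""
        if self.peer is None:
            return 0
        pending, self.missed = self.missed, []
        replayed = 0
        for payload in pending:
            try:
                self.peer.handle(payload, forwarded=True)
                replayed += 1
            except CloudDown:
                self.missed.append(payload)
        return replayed
```

The `forwarded` flag is the part that matters: without it, the two deployments would invoke each other forever. The replay in `sync_missed` is deliberately naive and could overwrite a newer write made directly on the peer, so a real sync job would need some ordering rule.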
-
FWIW, I'd prefer Dual-Write or Delayed Replication over Primary/Replica. Have you already looked into the cost (COGS) of these options compared to the single-cloud storage architecture we have today?
-
I'm treating the overwhelming show of thumbs here, #207 (reply in thread), as agreement to proceed with the simple plan and complexify later.
-
I intuitively also gravitate towards the dual-write approach.
-
Having the automated DNS-based multi-CDN load-balancing/fallback somewhat nailed (see show & tell), I'd like to get some feedback on the storage architecture.
The Goal
Nice to have
Approaches
Primary/Replica Architecture
(see here for terminology)
S3 would be the primary storage, GCS the replica.
helix-admin updates S3, as before

Discussion
+ simple
- in case of an AWS outage, publishing is blocked

Dual-Write Architecture
helix-admin updates S3 and GCS

Discussion
+ high availability
- helix-admin needs to talk to two clouds
- increased complexity (I'd need @dominique-pfister's queueing expertise)

Delayed Replication
helix-admin updates whatever cloud it runs in (GCF->GCS, Lambda->S3)

Discussion
+ high availability
+ helix-admin can stay single-cloud
+ can evolve from each of the previous two approaches
- combines the complexity of the previous two approaches

Dual-Write with Self-Invocation
(suggested by @tripod here: #207 (comment))
helix-admin updates its own cloud storage
helix-admin then self-invokes on the other cloud

Discussion
+ simple, no queues
+ no cross-cloud contamination of secrets/access keys
+ high availability
- clouds will go out of sync over time: H3 Multi-Cloud Storage Architecture #207 (reply in thread)

Summary
I'd start with the simplest approach (Primary/Replica) and evolve it into the Delayed Replication when it becomes expedient, e.g. after a prolonged AWS downtime during which one of our customers still had a pressing need to publish.
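To illustrate why that evolution should be cheap, here is a rough sketch of a one-way replication step. It assumes last-writer-wins on a modification timestamp, which is my assumption and not something decided in this thread. `Obj` and `replicate` are hypothetical names, and plain dicts stand in for the S3 and GCS buckets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """Hypothetical stored object with the timestamp of its last write."""
    body: str
    modified: float  # seconds since epoch, as set by the writing admin


def replicate(source: dict, target: dict) -> int:
    """Copy every object that is missing or older in target; return the count."""
    copied = 0
    for key, obj in source.items():
        current = target.get(key)
        if current is None or current.modified < obj.modified:
            target[key] = obj
            copied += 1
    return copied
```

For Primary/Replica, only `replicate(s3, gcs)` would run. Delayed Replication would add the reverse call, `replicate(gcs, s3)`, so the same step serves both approaches. Deletions are not handled in this sketch and would need tombstones or a similar mechanism.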