Add support for active replication (for load-balancing) #12

Open
remie opened this issue Jul 24, 2011 · 8 comments

@remie
Collaborator

remie commented Jul 24, 2011

The idea behind this is that, when dealing with large volume websites you might want to be able to load-balance your CMS. To do so, you need active replication.

The CDI extension should be smart enough to do this. It should even allow you to have a Master CDI instance for your structural changes (development environment) and a Master replication instance in your cluster. The Master replication instance accepts structural changes from the Master CDI instance and propagates them to its connected slave instances.

Replication would simply mean that you will have the ability to list multiple MySQL databases to which every SQL statement is executed.
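
A minimal sketch of that idea, assuming in-memory SQLite databases as stand-ins for the MySQL servers. `ReplicatedConnection` is a hypothetical helper written for illustration; it is not part of CDI or Symphony.

```python
import sqlite3


class ReplicatedConnection:
    """Hypothetical helper: runs every statement against all listed databases."""

    def __init__(self, connections):
        self.connections = list(connections)

    def execute(self, sql, params=()):
        # Apply the same statement to each database in turn.
        for conn in self.connections:
            conn.execute(sql, params)
            conn.commit()


# Two in-memory SQLite databases stand in for separate MySQL servers.
databases = [sqlite3.connect(":memory:") for _ in range(2)]
replicated = ReplicatedConnection(databases)
replicated.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
replicated.execute("INSERT INTO entries (title) VALUES (?)", ("Hello",))

# Every database now holds the same row.
titles = [db.execute("SELECT title FROM entries").fetchall() for db in databases]
```

Note that if a statement fails on one database after succeeding on another, the copies diverge. The sketch makes no attempt to handle that case.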

As expected, there is a tricky part: you need to make sure that all DATA is stored in a single master instance, from where it is replicated to the slave instances. This is easy for CMS content that is created by editors: simply allow them to work only on the Master backend. However, if your site has user-generated content, you need to make sure that all data is posted to the master instance. This might be quite a challenge!

@nickdunn

This has been kicking around in my mind for some time, but never surfaced because the requirement has yet to arise on a project. I haven't considered fully how this might work, but I think it could be a case of swapping the database connection during the lifecycle of a page.

When operating in the Symphony backend you would want all connections to be made to the master database. When submitting content via an event on the frontend you also want direct interaction with the master. However, read operations (data sources) would read from a slave.
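
A rough sketch of that routing rule, assuming SQLite connections as stand-ins for the master and slave databases. `ConnectionRouter`, its `backend` flag, and the keyword list are hypothetical names for illustration, not Symphony APIs.

```python
import sqlite3

# Statements that change data or structure; anything else is treated as a read.
WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")


class ConnectionRouter:
    """Hypothetical router: picks the master or the slave for each statement."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def pick(self, sql, backend=False):
        is_write = sql.lstrip().upper().startswith(WRITE_KEYWORDS)
        # Backend requests and all writes go to the master; frontend reads go to the slave.
        if backend or is_write:
            return self.master
        return self.slave


master = sqlite3.connect(":memory:")
slave = sqlite3.connect(":memory:")
router = ConnectionRouter(master, slave)

frontend_read = router.pick("SELECT * FROM entries")
frontend_write = router.pick("INSERT INTO entries (title) VALUES ('x')")
backend_read = router.pick("SELECT * FROM entries", backend=True)
```

Classifying statements by their first keyword is a simplification. A real implementation would more likely switch connections based on the page lifecycle (backend, event, data source) than on the SQL text.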

Sounds scarily complex to me.

@remie
Collaborator Author

remie commented Jul 29, 2011

Replication is always scary :)
I've worked on projects where they used Hippo (Apache JackRabbit) or Umbraco, and it was always the case that there was a point where you had to wipe the entire slave instance and rebuild the datastore from revision 0 (or at least the last base revision).

So in order to implement replication, it is imperative that it is repeatable: you should be able to restore a database, start the replication from the slave instances and reprocess all steps.

The downside of PHP is that it is hard to use replication intervals. AFAIK there is no in-memory scheduler like Quartz which can either push (master) or pull (slave) changes. So replication has to happen at the moment the change is made, while the visitor is waiting for it, which makes it even more scary...

I guess this is what my implementation of replication would look like in Symphony:

Installation

  1. Add backup functionality on the Master instance
  2. Upon activation of Slave, add required restore of base revision (backup of database). Store this base revision to disk.

One-Way replication (Master with read-only Slave)

  1. Take out the Master instance from the load-balanced instances group and only expose this to authors / developers
  2. Prevent the slave instance from executing local queries
  3. Log each and every SQL query on the master to a database table with revision numbers (auto increment field)
  4. Actively push each query to a REST interface on all slave instances
  5. Log the execution of the query on the Slave instance (based on the revision number) to prevent duplicate execution

In case of an error on the Slave, automatically disable it (maintenance mode / remove from load-balancer) and notify the administrator. No sticky sessions need to be configured on the load-balancer.
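
The one-way steps above could be sketched roughly as follows. This assumes in-memory SQLite databases and a direct method call in place of the REST interface; `Master`, `Slave`, `query_log`, and `applied_revisions` are hypothetical names for illustration.

```python
import sqlite3


class Slave:
    """Hypothetical slave: applies each revision once, disables itself on error."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE applied_revisions (revision INTEGER PRIMARY KEY)")
        self.enabled = True

    def receive(self, revision, sql):
        seen = self.db.execute(
            "SELECT 1 FROM applied_revisions WHERE revision = ?", (revision,)
        ).fetchone()
        if seen:
            return False  # already executed, skip the duplicate
        try:
            self.db.execute(sql)
            self.db.execute(
                "INSERT INTO applied_revisions (revision) VALUES (?)", (revision,)
            )
            self.db.commit()
            return True
        except sqlite3.Error:
            self.db.rollback()
            self.enabled = False  # stand-in for maintenance mode plus admin notification
            return False


class Master:
    """Hypothetical master: logs every query with a revision and pushes it out."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE query_log ("
            "revision INTEGER PRIMARY KEY AUTOINCREMENT, sql TEXT NOT NULL)"
        )
        self.slaves = []

    def execute(self, sql):
        self.db.execute(sql)
        cursor = self.db.execute("INSERT INTO query_log (sql) VALUES (?)", (sql,))
        self.db.commit()
        revision = cursor.lastrowid
        for slave in self.slaves:
            if slave.enabled:
                slave.receive(revision, sql)  # a real setup would POST this to the slave
        return revision


master = Master()
slave = Slave()
master.slaves.append(slave)
master.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
rev = master.execute("INSERT INTO entries (title) VALUES ('Hello')")

# Pushing the same revision again is ignored by the slave.
duplicate_applied = slave.receive(rev, "INSERT INTO entries (title) VALUES ('Hello')")
count = slave.db.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

Because the slave records which revisions it has executed, a retried push is harmless, which is what makes the replication repeatable.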

Two-Way replication (Master to Slave & Slave to Master)

In case you wish to enable User contribution from the front-end (forum, comments) it becomes a bit more tricky:

The best approach would be to enhance your Master instance with a site-specific API. All front-end DATA events should then be posted to this API in order for the Master instance to process them. That way you still only have 1 database on which changes occur. The change will be propagated to the Slave instances using One-Way replication. In case of an error you can inform the visitor immediately.

If that is not an option, there is the conceptual implementation of Queued database synchronization:

  1. Take out the Master instance from the load-balanced instances group and only expose this to authors / developers
  2. On the Master instance, a branch copy of the database is created which is updated automatically (each query is executed twice)
  3. On your Slave instance, intercept (and group if possible) query execution and post it to a REST interface on the Master instance. The Slave instance should wait for a response. It is important that the change is not committed to the Slave database.
  4. On the Master instance, the query is placed in a queue waiting for execution.
  5. Queries are executed in order on the branch database. If OK, the query is executed on the Trunk, thus integrating it with normal production data. In case of error, the query is denied.
  6. The Master notifies the Slave whether the query has been executed successfully or not. The Slave instance can update the visitor.
  7. Successfully integrated queries are propagated to the Slave instances using One-Way replication
  8. SQL queries from the Master back-end are also added to the execution queue.
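
Steps 4 to 6 of the queued approach might look something like this. It is only a sketch under stated assumptions: SQLite stands in for MySQL, the REST round trip is left out, and `QueuedMaster` with its `submit` and `process` methods is hypothetical.

```python
import sqlite3
from collections import deque

SCHEMA = "CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"


class QueuedMaster:
    """Hypothetical master: tries each queued query on a branch copy first."""

    def __init__(self):
        self.trunk = sqlite3.connect(":memory:")   # production data
        self.branch = sqlite3.connect(":memory:")  # copy used for validation
        for db in (self.trunk, self.branch):
            db.execute(SCHEMA)
        self.queue = deque()

    def submit(self, sql, params=()):
        # Step 4: the query waits in a queue for execution.
        self.queue.append((sql, params))

    def process(self):
        results = []
        while self.queue:
            sql, params = self.queue.popleft()
            try:
                # Step 5: execute on the branch database first.
                self.branch.execute(sql, params)
                self.branch.commit()
            except sqlite3.Error:
                self.branch.rollback()
                results.append(False)  # denied, the trunk is never touched
                continue
            self.trunk.execute(sql, params)
            self.trunk.commit()
            results.append(True)  # step 6: this result goes back to the slave
        return results


master = QueuedMaster()
master.submit("INSERT INTO comments (body) VALUES (?)", ("Nice post",))
master.submit("INSERT INTO comments (body) VALUES (?)", (None,))  # violates NOT NULL
results = master.process()
trunk_rows = master.trunk.execute("SELECT body FROM comments").fetchall()
```

The sketch assumes a query that succeeds on the branch will also succeed on the trunk, which only holds while the two copies stay identical.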

The challenges with this approach:

  1. Is the Master instance fast enough to process all incoming queries (it should be, since it would also have to process this volume if it were the only instance)?
  2. Is it possible to group query execution and offer a complete changeset to the master?
  3. Is it possible to have either a wait implementation or asynchronous execution on the Slave that allows a proper feedback cycle and does not bring down the instance?
  4. Is it repeatable? Can you take out a Slave instance, wipe it and rebuild the datastore?

Passive replication

  1. Add an option on the slave to purge the database, restore the base revision, and pull changes from REST interface on master
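
A small sketch of that rebuild, assuming the base revision is available as an SQL dump and the master's query log as a list of `(revision, sql)` pairs. `rebuild_slave` and its parameters are hypothetical, and fetching the log over REST is left out.

```python
import sqlite3


def rebuild_slave(base_dump, base_revision, master_log):
    """Return a fresh slave database restored from base_dump and caught up with master_log."""
    slave = sqlite3.connect(":memory:")  # a new connection models the purged database
    slave.executescript(base_dump)       # restore the base revision
    for revision, sql in sorted(master_log):
        if revision > base_revision:     # replay only what the backup lacks
            slave.execute(sql)
    slave.commit()
    return slave


# The backup was taken at revision 2; revision 3 happened afterwards.
base_dump = """
CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT);
INSERT INTO entries (title) VALUES ('First');
"""
master_log = [
    (1, "CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)"),
    (2, "INSERT INTO entries (title) VALUES ('First')"),
    (3, "INSERT INTO entries (title) VALUES ('Second')"),
]

slave = rebuild_slave(base_dump, base_revision=2, master_log=master_log)
titles = [row[0] for row in slave.execute("SELECT title FROM entries ORDER BY id")]
```

Running the function twice with the same inputs yields the same database, which is the repeatability asked for earlier in the thread.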

Just some thoughts on a random friday :)

@remie remie closed this as completed Jul 29, 2011
@remie remie reopened this Jul 29, 2011
@remie
Collaborator Author

remie commented Jul 29, 2011

Oops... wrong button :)

@nickdunn

/me sobs.

@remie
Collaborator Author

remie commented Jul 29, 2011

Because you hoped that I had already solved it?

@nickdunn

Heh, not so much, more because I hadn't considered it in this much detail before. It does sound rather like too much effort for very little gain. What is the probability of you using Symphony in an environment like this? Fingers crossed we've never had the need for master/slave databases, even on some pretty hefty operations.

@remie
Collaborator Author

remie commented Jul 29, 2011

Let's hope it never becomes a requirement. That's why I've put it on the 2.0.0 release without a due date :)
Nevertheless, I do think it could be a selling point for enterprise-level customers. I've worked with clients that did not even want to consider a CMS without replication.

But the main point of working on this feature is because it is interesting to do :D

@nickdunn

> the main point of working on this feature is because it is interesting to do

Isn't that the reason to put any feature in? :-)

Thank you for your thoughts on this. I still haven't given CDI a good once-over, but it's creeping up my todo list. Great work so far.
