
Resyncing the Connector

Luke Lovett edited this page Jun 9, 2014 · 3 revisions

This page describes when and how to re-sync mongo-connector. The most common reason to re-sync is that mongo-connector could not replicate operations from the oplog fast enough. This can happen when MongoDB receives a lot of write activity, such as during a mongoimport. Because the oplog is a capped collection, the oldest records are overwritten once the collection is full, and any operations that mongo-connector had not yet read are lost.

Avoiding Oplog Rollover

You can make mongo-connector more tolerant of short bursts of high write activity by increasing the oplog size in MongoDB. A larger oplog covers a longer span of time, which allows mongo-connector to "catch up" when there is less write activity.
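To judge whether the oplog is large enough, it can help to look at how much time it currently covers. The following is a minimal sketch, assuming pymongo is installed and that the node is a replica set member whose oplog lives in local.oplog.rs. The function names and the default URI are hypothetical and are not part of mongo-connector.

```python
from datetime import timedelta


def oplog_window(first_ts_seconds, last_ts_seconds):
    """Return the time span covered by the oplog as a timedelta."""
    return timedelta(seconds=last_ts_seconds - first_ts_seconds)


def fetch_oplog_bounds(uri="mongodb://localhost:27017"):
    """Return (oldest, newest) oplog entry times in seconds since the epoch.

    Hypothetical helper: assumes a replica set member reachable at `uri`.
    """
    from pymongo import MongoClient  # imported lazily; requires pymongo

    oplog = MongoClient(uri)["local"]["oplog.rs"]
    # $natural order is insertion order, so these are the oldest and newest entries.
    first = oplog.find_one(sort=[("$natural", 1)])
    last = oplog.find_one(sort=[("$natural", -1)])
    return first["ts"].time, last["ts"].time


if __name__ == "__main__":
    oldest, newest = fetch_oplog_bounds()
    print("oplog currently covers", oplog_window(oldest, newest))
```

If the reported window is shorter than your longest expected write burst, mongo-connector is at risk of falling off the end of the oplog.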

How to Perform a Re-Sync

The only way to ensure that the data in your external system is consistent with what is in MongoDB is to delete all documents in the target and re-index them. When MongoDB is the target system, mongo-connector also maintains a separate database called __mongo_connector to store replication metadata. This database should be dropped as well. After all data is removed, delete the oplog progress file (usually called "config.txt") and restart mongo-connector. Mongo-connector will then perform a collection dump, re-indexing all your data.
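The cleanup steps above can be sketched in Python for the case where the target is MongoDB. This is only an illustration under stated assumptions: pymongo is installed, mongo-connector is stopped while it runs, and the function names, the target URI, and the list of replicated databases are hypothetical values you would supply yourself. Other target systems need their own way of deleting documents.

```python
import os


def delete_progress_file(path="config.txt"):
    """Delete the oplog progress file; return True if a file was removed."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False


def reset_mongodb_target(target_uri, replicated_dbs):
    """Drop replicated data and the connector metadata on a MongoDB target.

    Hypothetical helper: `replicated_dbs` is the list of database names
    that mongo-connector has been writing to on the target.
    """
    from pymongo import MongoClient  # imported lazily; requires pymongo

    client = MongoClient(target_uri)
    for name in replicated_dbs:
        client.drop_database(name)
    # Replication metadata kept by mongo-connector when MongoDB is the target.
    client.drop_database("__mongo_connector")
```

A call such as reset_mongodb_target("mongodb://localhost:27018", ["mydb"]) followed by delete_progress_file("config.txt") leaves the target empty, so the collection dump that runs on the next start rebuilds it from scratch.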

Alternatives

There are no other methods to restore a state consistent with the source MongoDB replica set or cluster. However, you can get mongo-connector running again simply by deleting the oplog progress file and restarting mongo-connector. This causes mongo-connector to perform a collection dump, re-saving the latest versions of all documents, and then start tailing the oplog. This does not bring your target to a consistent state, but it may be suitable for pure insert/update use cases. If any delete operations were lost to the oplog rollover, mongo-connector cannot catch them without a proper re-sync (described above).
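A toy model may make the limitation concrete. The sketch below is not mongo-connector's code. It uses plain dictionaries to stand in for the source and target, and a hypothetical collection_dump function that only re-saves documents, as the paragraph above describes.

```python
def collection_dump(source, target):
    """Re-save every source document into the target; never deletes anything."""
    for doc_id, doc in source.items():
        target[doc_id] = dict(doc)
    return target


# Source after a burst of writes: document 1 was updated, document 2 was inserted.
source = {1: {"name": "a-v2"}, 2: {"name": "b"}}
# Document 3 was deleted in the source, but that oplog entry rolled over
# before it was replicated, so the target still holds it.
target = {1: {"name": "a"}, 3: {"name": "c"}}

collection_dump(source, target)

# Documents present in the target but no longer in the source.
stale_ids = sorted(set(target) - set(source))
print(stale_ids)  # [3]
```

The update and the insert are repaired by the dump, but the deleted document stays in the target. Only emptying the target before the dump removes it.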
