
System Overview

Luke Lovett edited this page Apr 3, 2014 · 2 revisions

This page explains some of the mongo-connector internals from the perspective of running mongo-connector for the first time.

  1. The main Connector thread determines the type of MongoDB node at the address it was given by issuing an isdbgrid command to that node. The response reveals whether the node is a mongod or a mongos, and thus whether the user intends to replicate from a replica set or a sharded cluster, respectively. Based on this information, the Connector thread creates an OplogThread for the primary of the replica set, or one OplogThread for the primary of each shard in the sharded cluster administered by the mongos. It also initializes one or more DocManagers for each replication endpoint and provides these to the OplogThreads.

  2. An OplogThread creates a tailable cursor on the oplog.rs collection (in the local database) of the mongod. This collection is a running record of all write operations applied on that node.

  3. The OplogThread initiates a "collection dump," by which it upserts every document in the namespaces we're interested in through the specified DocManagers. These DocManagers pass on the documents to their respective target systems. The "collection dump" happens only the first time mongo-connector is started, and it does not happen again as long as mongo-connector can find the timestamp of the last oplog record it processed (more on this in the next step).

  4. The OplogThread goes into a loop, polling the oplog for new documents. Each document corresponds to one operation and records the time of the operation, its namespace, the kind of operation performed, and which document was affected. Based on the operation recorded in the oplog document, the OplogThread calls the appropriate method of each DocManager. If the operation is an insert or update (an 'upsert'), the OplogThread retrieves the inserted or updated document from MongoDB, annotates it with the timestamp and namespace from the corresponding oplog document (in the _ts and ns fields, respectively), and passes the document along to the DocManager. Lastly, the OplogThread notes the timestamp of the oplog document it read and saves it as its "checkpoint". The checkpoint acts like a bookmark: it is periodically written out to the oplog progress file ('config.txt' by default) and is used to fast-forward to the proper place in the oplog when mongo-connector is restarted after being shut down.
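As a rough illustration of step 4, the per-entry dispatch might look like the Python sketch below. It is not mongo-connector's actual code. `InMemoryDocManager`, `process_entry`, and `fetch_document` are hypothetical names introduced here, timestamps are plain integers rather than BSON timestamps, and the source collection is stood in for by a callable so the example runs without a MongoDB server. The oplog fields it reads (`ts`, `op`, `ns`, `o`, `o2`) are the standard ones.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryDocManager:
    """Hypothetical stand-in for a DocManager; keeps documents in a dict."""
    docs: dict = field(default_factory=dict)

    def upsert(self, doc):
        self.docs[doc["_id"]] = doc

    def remove(self, doc):
        self.docs.pop(doc["_id"], None)


def process_entry(entry, fetch_document, doc_managers):
    """Apply one oplog entry to every DocManager and return its timestamp.

    fetch_document(ns, doc_id) is an assumed helper that returns the current
    document from the source collection, or None if it no longer exists.
    """
    op, ns, ts = entry["op"], entry["ns"], entry["ts"]
    if op in ("i", "u"):
        # Inserts carry the _id in "o"; updates carry it in "o2".
        doc_id = entry["o2"]["_id"] if op == "u" else entry["o"]["_id"]
        doc = fetch_document(ns, doc_id)
        if doc is not None:
            # Annotate a copy with the oplog timestamp and namespace.
            annotated = dict(doc, _ts=ts, ns=ns)
            for dm in doc_managers:
                dm.upsert(annotated)
    elif op == "d":
        for dm in doc_managers:
            dm.remove({"_id": entry["o"]["_id"]})
    # Other operation types (for example "n" no-ops) are skipped here.
    return ts  # the caller can keep this as its checkpoint
```

A caller would loop over the entries yielded by the tailable cursor, pass each one to `process_entry`, and remember the returned timestamp as the latest checkpoint.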

The last step runs until mongo-connector is killed. Some events, such as temporarily losing the connection to MongoDB, a replica set rollback, or falling very far behind in the oplog, can cause the OplogThread to take other actions. These actions will be covered on another page.
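To make the checkpoint idea concrete, here is a minimal Python sketch of saving and restoring a timestamp, and of deciding at startup between a collection dump and resuming from the oplog. The function names and the JSON layout (`{"last_ts": ...}`) are assumptions made for illustration; they are not the actual format of mongo-connector's oplog progress file.

```python
import json
import os


def save_checkpoint(path, timestamp):
    """Write the last processed oplog timestamp, replacing the file atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"last_ts": timestamp}, f)
    os.replace(tmp_path, path)  # avoids leaving a half-written checkpoint


def load_checkpoint(path):
    """Return the saved timestamp, or None if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["last_ts"]
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return None


def startup_plan(path):
    """Choose between a full collection dump and resuming from the checkpoint."""
    ts = load_checkpoint(path)
    if ts is None:
        return ("collection_dump", None)
    return ("resume", ts)
```

Writing to a temporary file and renaming it is one way to keep the bookmark readable even if the process is killed mid-write, which matters because a missing or unreadable checkpoint leads to a full collection dump in this sketch.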
