
Channel Logs

Jens Alfke edited this page Jan 19, 2015 · 1 revision

THIS DOCUMENT IS OBSOLETE -- THIS ARCHITECTURE WAS REPLACED IN SPRING 2014 BY A NEWER ONE (IN-MEMORY CHANNEL CACHES.)

A channel-log is a (Couchbase) document that stores the recent history of a channel. It contains, conceptually, the recent _changes feed entries for that channel: a list of {docid, revid, sequence} tuples. Its timeline needs to extend at least back to the sequence "checkpoint" (the point before which all sequences have been persisted to disk and show up in views), and it will probably go back farther as an optimization.

A changes feed for a channel is generated primarily from the channel-log document. If older entries are needed -- especially in the case where a new client needs to start at sequence 1 -- the gateway runs a view query on the "changes" view as it does today, and merges its output with the channel-log (because the channel-log may contain revisions that haven't been persisted yet).

The channel-log document is updated by the same gateway process that added the associated revision, and it is authoritative for the recent past (changes more recent than the checkpoint sequence). However, the view query is authoritative for any changes older than the checkpoint sequence.

Structure Of A Log

The current definition of a channel-log in Go (from channels/change_log.go) is:

type LogEntry struct {
	Sequence uint64 `json:"seq"`           // Sequence number
	DocID    string `json:"doc,omitempty"` // Empty if this entry has been replaced
	RevID    string `json:"rev,omitempty"` // Empty if this entry has been replaced
	Deleted  bool   `json:"del,omitempty"` // True for a deletion tombstone revision
	Removed  bool   `json:"rmv,omitempty"` // True for a channel-removal tombstone revision
	Hidden   bool   `json:"hid,omitempty"` // True for a losing rev of a conflict
}

type ChangeLog struct {
	Since   uint64     // Sequence this is valid after
	Entries []LogEntry // Entries in order they were added (not sequence order!)
}
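As an illustration of the JSON tags above, the following minimal sketch encodes two entries with the standard encoding/json package. The encodeEntry helper is hypothetical and not part of the gateway; it only shows that fields marked omitempty disappear when empty, so an entry whose IDs have been blanked shrinks to just its sequence.

```go
package main

import "encoding/json"

// LogEntry mirrors the struct shown above, including its JSON tags.
type LogEntry struct {
	Sequence uint64 `json:"seq"`
	DocID    string `json:"doc,omitempty"`
	RevID    string `json:"rev,omitempty"`
	Deleted  bool   `json:"del,omitempty"`
	Removed  bool   `json:"rmv,omitempty"`
	Hidden   bool   `json:"hid,omitempty"`
}

// encodeEntry is a hypothetical helper that returns the JSON form of an entry.
func encodeEntry(e LogEntry) (string, error) {
	data, err := json.Marshal(e)
	return string(data), err
}
```

With this sketch, a deleted revision 2-b of doc1 at sequence 12 encodes as {"seq":12,"doc":"doc1","rev":"2-b","del":true}, while a replaced entry at sequence 10 encodes as {"seq":10}.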

Updating A Log

When a gateway node updates a document, it iterates over all the channels that document is in, and appends a new entry to each channel-log. (Actually it's not purely an append, since old entries may have to be expired from the beginning. Updates of course need to use CAS, so that concurrent writers don't overwrite each other's entries.)
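The update path can be sketched as a read-modify-write loop that retries on a CAS mismatch. Everything here is an assumption made for illustration: memStore is a hypothetical in-memory stand-in for the bucket, and its get/put methods are not the gateway's or Couchbase's actual API. Expiring old entries is left out.

```go
package main

import (
	"errors"
	"sync"
)

type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// errCASMismatch reports that the log changed between the read and the write.
var errCASMismatch = errors.New("cas mismatch")

// memStore is a hypothetical in-memory stand-in for a bucket that supports CAS.
type memStore struct {
	mu   sync.Mutex
	logs map[string]ChangeLog
	cas  map[string]uint64
}

func newMemStore() *memStore {
	return &memStore{logs: map[string]ChangeLog{}, cas: map[string]uint64{}}
}

// get returns a copy of the channel's log and its current CAS value.
func (s *memStore) get(channel string) (ChangeLog, uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	log := s.logs[channel]
	log.Entries = append([]LogEntry(nil), log.Entries...)
	return log, s.cas[channel]
}

// put stores the log only if the CAS value still matches the one that was read.
func (s *memStore) put(channel string, log ChangeLog, cas uint64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.cas[channel] != cas {
		return errCASMismatch
	}
	s.logs[channel] = log
	s.cas[channel] = cas + 1
	return nil
}

// appendToChannels adds entry to the log of every channel the document is in,
// re-reading and retrying each write until its CAS check succeeds.
func appendToChannels(s *memStore, channels []string, entry LogEntry) {
	for _, ch := range channels {
		for {
			log, cas := s.get(ch)
			log.Entries = append(log.Entries, entry)
			if err := s.put(ch, log, cas); err == nil {
				break
			}
		}
	}
}
```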

The rows in the channel-log are ordered by the order in which they were appended by sync gateway processes, not by sequence number. (The two orders may not coincide, and in the case of chunky sequences ("epochs") sequence numbers may not even be unique.) This is important because we might serve sequence 46 to a client, and then have sequence 45 appended to the channel-log. So we need to render the provisional sequences in append order, not sequence order.

In the case where a document has multiple changes in quick succession, we can't remove the parent revisions' log entries, because we may need to find their sequence numbers later. But we can remove their document and revision IDs, which take up most of the room.
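A minimal sketch of this space-saving step, assuming a hypothetical helper named addReplacing that is told the parent revision ID of the entry being added. The parent's entry keeps its sequence number but loses its document and revision IDs.

```go
package main

type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

// addReplacing appends newEntry. If the log already holds an entry for the
// parent revision of the same document, that entry's DocID and RevID are
// cleared while its Sequence is kept so it can still be looked up later.
func addReplacing(entries []LogEntry, newEntry LogEntry, parentRevID string) []LogEntry {
	if parentRevID != "" {
		for i := range entries {
			if entries[i].DocID == newEntry.DocID && entries[i].RevID == parentRevID {
				entries[i].DocID = ""
				entries[i].RevID = ""
				break
			}
		}
	}
	return append(entries, newEntry)
}
```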

As a log grows, the oldest (first) entries will be removed to limit its length. This causes its Since property to change: it's updated to the sequence number of the last entry removed. That way the changes feed can tell whether the log is authoritative for the desired sequence range, or whether it needs to backfill it by also querying the changes view.
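The truncation rule above could look roughly like this. The truncate method and its maxLen parameter are hypothetical names; the text does not say how the length limit is chosen.

```go
package main

type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// truncate drops the oldest entries so that at most maxLen remain, and sets
// Since to the sequence of the last entry removed. It returns how many
// entries were removed.
func (cl *ChangeLog) truncate(maxLen int) int {
	if maxLen < 0 {
		maxLen = 0
	}
	remove := len(cl.Entries) - maxLen
	if remove <= 0 {
		return 0
	}
	cl.Since = cl.Entries[remove-1].Sequence
	// Copy the survivors so the removed entries can be garbage collected.
	cl.Entries = append([]LogEntry(nil), cl.Entries[remove:]...)
	return remove
}
```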

Sequence IDs

The sequences appearing in the public API of the _changes feed are no longer the same as the internal sequence numbers. Instead, the sequence IDs are strings encoding a mapping from channel name to sequence number (the Go type is channels.TimedSet.) An example sequence ID string:

abc:16,cnn:8,pbs:20

This encodes that the latest change seen on the abc channel is sequence #16, the latest on the cnn channel is #8, and the latest on pbs is #20 (with no changes sent on any other channels.)
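To make the encoding concrete, here is a sketch of parsing and formatting such a string. The local timedSet type is a stand-in assumed for this example, not the real channels.TimedSet, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// timedSet maps a channel name to the last sequence seen on that channel.
type timedSet map[string]uint64

// parseSequenceID decodes a string such as "abc:16,cnn:8,pbs:20".
func parseSequenceID(s string) (timedSet, error) {
	set := timedSet{}
	if s == "" {
		return set, nil
	}
	for _, part := range strings.Split(s, ",") {
		i := strings.LastIndex(part, ":")
		if i <= 0 {
			return nil, fmt.Errorf("malformed component %q", part)
		}
		seq, err := strconv.ParseUint(part[i+1:], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad sequence in %q: %v", part, err)
		}
		set[part[:i]] = seq
	}
	return set, nil
}

// String encodes the set with channel names sorted, so output is stable.
func (set timedSet) String() string {
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, len(names))
	for i, name := range names {
		parts[i] = name + ":" + strconv.FormatUint(set[name], 10)
	}
	return strings.Join(parts, ",")
}
```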

These sequence IDs are sent by the client to the changes feed (in the ?since= parameter) and returned to the client in the last_seq JSON property. As always, they are opaque to the client.

Generating The Changes Feed

Dividing

The input since change-ID is parsed into a TimedSet of channel-to-sequence-number mappings. For each channel requested, its last-seen sequence number is looked up from this map, and the feed for that channel is generated (in parallel, using a goroutine.)
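The fan-out could be sketched as below. The feedFunc type stands in for whatever generates a single channel's feed, and generateFeeds is a hypothetical name; a channel that is absent from the since map gets 0, meaning "from the beginning".

```go
package main

import "sync"

type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

// feedFunc is a hypothetical per-channel feed generator.
type feedFunc func(channel string, since uint64) []LogEntry

// generateFeeds runs feed once per requested channel, each in its own
// goroutine, and collects the results by channel name.
func generateFeeds(channels []string, since map[string]uint64, feed feedFunc) map[string][]LogEntry {
	results := make(map[string][]LogEntry, len(channels))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, ch := range channels {
		wg.Add(1)
		go func(ch string) {
			defer wg.Done()
			entries := feed(ch, since[ch]) // a missing key yields 0
			mu.Lock()
			results[ch] = entries
			mu.Unlock()
		}(ch)
	}
	wg.Wait()
	return results
}
```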

Using A Channel Log

To generate a single channel's feed, we fetch the channel-log document and look up the since sequence in it. If found, all the following entries go into the feed.

Each channel log has its own Since property that specifies the sequence after which it's authoritative. As a special case, if the sequence we're looking for is not found in the log but it's greater than the log's Since property, we treat that sequence as being before the first entry in the log, i.e. we return the entire log. This prevents unnecessary view queries (q.v.) for sparse channels: if we know a channel has no revisions with sequences before 100, then we shouldn't query a view just because we got asked to start at sequence 1.
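A sketch of this lookup, with a hypothetical method name. It follows the strict "greater than" comparison as stated in the text, and reports through its second result whether the log alone was enough or a view query is needed.

```go
package main

type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// entriesSince returns the entries that follow since in append order.
// ok is false when the log is not authoritative for that range, in which
// case the caller has to backfill from the changes view.
func (cl *ChangeLog) entriesSince(since uint64) (entries []LogEntry, ok bool) {
	for i, e := range cl.Entries {
		if e.Sequence == since {
			return cl.Entries[i+1:], true
		}
	}
	if since > cl.Since {
		// Not found, but within the log's range: treat it as preceding
		// the first entry and return the whole log.
		return cl.Entries, true
	}
	return nil, false
}
```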

Using A View

Otherwise, if the since value is not greater than the log's Since property (or the log is missing), we need to query the changes view starting from that sequence. We write results from that view (in increasing sequence order) to the feed until they pass the Since value of the log; after that we switch over to the log, since it might have newer revisions that aren't in the view yet.
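The switch-over from view to log can be sketched as follows, for the case where the log exists. The backfill function is a hypothetical name, and viewRows is assumed to be the already-fetched output of the changes view, sorted by increasing sequence.

```go
package main

type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// backfill builds a channel feed from view rows followed by the log.
// View rows are used for sequences in (since, cl.Since]; once a row passes
// cl.Since the log takes over and is emitted in append order.
func backfill(viewRows []LogEntry, cl ChangeLog, since uint64) []LogEntry {
	var feed []LogEntry
	for _, row := range viewRows {
		if row.Sequence <= since {
			continue // the client has already seen this one
		}
		if row.Sequence > cl.Since {
			break // the log is authoritative from here on
		}
		feed = append(feed, row)
	}
	return append(feed, cl.Entries...)
}
```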

If the log was missing, we use the latest entries from the view to create a new one and save it.

Merging

The feeds from the individual channels are then merged together into one by repeatedly taking the lowest-sequence-numbered available revision (merging the channel info if the same revision appears on multiple channels) and writing it to the output.
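One way to sketch this merge is to keep a cursor per channel and repeatedly pick the head with the lowest sequence. The ChangeEntry type and mergeFeeds function are hypothetical names, and the feeds are assumed to be fully materialized slices rather than streams.

```go
package main

import "sort"

type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

// ChangeEntry is one row of the merged feed, with the channels it came from.
type ChangeEntry struct {
	LogEntry
	Channels []string
}

// mergeFeeds repeatedly takes the lowest-sequence head among the channel
// feeds. If the same revision is at the head of several feeds, it is emitted
// once with all of those channel names.
func mergeFeeds(feeds map[string][]LogEntry) []ChangeEntry {
	names := make([]string, 0, len(feeds))
	for name := range feeds {
		names = append(names, name)
	}
	sort.Strings(names) // fixed iteration order keeps the output deterministic
	pos := make([]int, len(names))
	var out []ChangeEntry
	for {
		best := -1
		for i, name := range names {
			if pos[i] >= len(feeds[name]) {
				continue
			}
			if best < 0 || feeds[name][pos[i]].Sequence < feeds[names[best]][pos[best]].Sequence {
				best = i
			}
		}
		if best < 0 {
			return out // every feed is exhausted
		}
		change := ChangeEntry{LogEntry: feeds[names[best]][pos[best]]}
		for i, name := range names {
			if pos[i] < len(feeds[name]) && feeds[name][pos[i]] == change.LogEntry {
				change.Channels = append(change.Channels, name)
				pos[i]++
			}
		}
		out = append(out, change)
	}
}
```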

Watching A Log

The longpoll and continuous modes of the _changes feed may require that the handler wait for changes to happen.

Gateway nodes use the TAP feed to observe changes in channel logs. When a handler is waiting for new changes and detects that the log document of one of the relevant channels has been updated, it generates the feed again and sends the new entries.
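The waiting side could be sketched with Go channels as below. The TAP feed itself is not modeled: logWatcher is a hypothetical type, and its notify method is assumed to be called by whatever code consumes the TAP feed when a channel-log document changes.

```go
package main

import "sync"

// logWatcher lets handlers wait until one of a set of channel logs changes.
type logWatcher struct {
	mu      sync.Mutex
	waiters map[string][]chan struct{}
}

func newLogWatcher() *logWatcher {
	return &logWatcher{waiters: map[string][]chan struct{}{}}
}

// wait registers interest in the given channels and returns a Go channel
// that receives a value when any of their logs is updated.
func (w *logWatcher) wait(channels []string) <-chan struct{} {
	c := make(chan struct{}, 1) // buffered so notify never blocks
	w.mu.Lock()
	defer w.mu.Unlock()
	for _, ch := range channels {
		w.waiters[ch] = append(w.waiters[ch], c)
	}
	return c
}

// notify wakes every handler waiting on the named channel. Registrations
// the same handlers hold on other channels stay until those are notified;
// the non-blocking send makes such stale wake-ups harmless.
func (w *logWatcher) notify(channel string) {
	w.mu.Lock()
	waiting := w.waiters[channel]
	delete(w.waiters, channel)
	w.mu.Unlock()
	for _, c := range waiting {
		select {
		case c <- struct{}{}:
		default:
		}
	}
}
```

A longpoll or continuous handler would block on the returned channel, then regenerate the feed and send whatever entries are new.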
