Couchit is a database iterator with tools to validate and manage documents in a CouchDB database.
Couchit runs a set of user-defined JavaScript functions against all documents in a CouchDB database or view, or against a subset selected by a start key, an end key, or both. Couchit comes with a built-in set of Utility Functions for helpful operations such as generating hashes, performing json-schema validation, and storing objects from the iteration for post-processing.
Couchit can run as a stand-alone application. Install it via npm:
npm install -g couchit
Then, to run:
couchit ./config.js
Couchit can also run as a module in another program. First, add it to your project's package.json:
npm install --save couchit
Then, include it in your program:
const Couchit = require('couchit');
const config = require('./config.js');
new Couchit().iterate(config, console.log);
Couchit can be controlled by environment variables or a config file.
The preferred way is through environment variables, which lets Couchit run without storing sensitive information in code. This is useful when setting Couchit up to run as an AWS Lambda function, for example.
The following table lists all configuration settings. If an environment variable is not set, its Default Value is used.
Setting | Description | Default Value |
---|---|---|
COUCHDB_ENDPOINT | Host and port of the CouchDB server (without http://) | 'localhost:5984' |
COUCHDB_DATABASE | CouchDB database | 'db' |
COUCHDB_USERNAME | CouchDB user | 'couchdb' |
COUCHDB_PASSWORD | CouchDB password | 'couchdb' |
OPTS_INTERVAL | Number of ms to wait between page requests | 100 |
OPTS_START_KEY | View start key | null |
OPTS_END_KEY | View end key | null |
OPTS_PAGE_SIZE | Number of documents to retrieve per batch | 1000 |
OPTS_NUM_PAGES | Number of pages to retrieve | undefined |
OPTS_BATCH_SIZE | Batch update size | 100 |
OPTS_QUIET | Suppress report at end of run | false |
OPTS_TASKS | JavaScript functions with tasks to run | { "count-docs": (util, doc) => { util.count('total-docs') } } |
You can use a combination of environment variables and config.js Default Values to run Couchit. Just remember that environment variables always override defaults. For example, to set the password in PowerShell and in a POSIX shell, respectively:
$env:COUCHDB_PASSWORD="couchdb"
export COUCHDB_PASSWORD="couchdb"
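The override rule can be pictured as a simple lookup. The sketch below is not Couchit's internal code: `resolveSettings` and the trimmed-down `defaults` object are hypothetical names used only to illustrate that a value found in the environment wins over the default.

```javascript
// Hypothetical illustration of "environment variables override defaults".
// This is not Couchit's implementation; only three settings are shown.
const defaults = {
  COUCHDB_ENDPOINT: 'localhost:5984',
  COUCHDB_DATABASE: 'db',
  OPTS_PAGE_SIZE: 1000,
};

// Return a copy of `defaults` where any key set in `env` takes precedence.
function resolveSettings(env, defaults) {
  const resolved = {};
  for (const key of Object.keys(defaults)) {
    resolved[key] = env[key] !== undefined ? env[key] : defaults[key];
  }
  return resolved;
}

// In a real program you would pass process.env instead of a literal object.
const settings = resolveSettings({ COUCHDB_DATABASE: 'orders' }, defaults);
console.log(settings);
```

Keep in mind that values read from `process.env` are always strings, so numeric settings taken from the environment need to be converted by whoever consumes them.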
Tasks are how you validate and manage documents as they are iterated over. There are a number of built-in Utility Functions that can be used by calling their util method, for example, to get a document hash: util.hash(doc). Additional document functionality is provided via util.nano, which exposes nano document functions. See the dependent-updates task below for an example of how nano document functions can be used.
On each document iteration, the following functions are available via the Util() object:
Function | Description |
---|---|
audit | Add an object to the audit array, which is returned in the callback |
count | Increment a counter associated with a particular key |
dereference | Reference external json-schema file definitions from other files |
log | Alias for console.log |
hash | Generate a SHA256 hash for a given document, object, or string |
nano | Exposes nano document functions |
remove | Delete the document from the database |
save | Save the document back to the database |
validate | Validate a document using json-schema |
incrementWaits | Set a wait. Used to ensure asynchronous processes can complete before the callback response |
decrementWaits | Remove a wait |
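When writing tasks, it can help to exercise them without a database. The sketch below builds a hypothetical stand-in for the util object (`makeStubUtil` is not part of Couchit) that mimics only `count`, `audit`, `hash`, and `log`. How Couchit serializes a document before hashing is not documented here, so the `JSON.stringify` step is an assumption and the hashes may not match Couchit's.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in for the util object, for trying out tasks offline.
// Only count, audit, hash and log are mimicked; the real object may differ.
function makeStubUtil() {
  const counters = {};
  const audits = [];
  return {
    counters,
    audits,
    count: (key) => { counters[key] = (counters[key] || 0) + 1; },
    audit: (obj) => { audits.push(obj); },
    // Assumption: non-string values are serialized with JSON.stringify.
    hash: (value) => {
      const text = typeof value === 'string' ? value : JSON.stringify(value);
      return crypto.createHash('sha256').update(text).digest('hex');
    },
    log: (...args) => console.log(...args),
  };
}

// A task map in the same shape as the OPTS_TASKS default.
const tasks = {
  'count-docs': (util, doc) => util.count('total-docs'),
};

// Run every task once per document, as an iteration would.
const stubUtil = makeStubUtil();
const docs = [{ _id: 'a' }, { _id: 'b' }, { _id: 'c' }];
docs.forEach((doc) => {
  Object.values(tasks).forEach((task) => task(stubUtil, doc));
});
console.log(stubUtil.counters);
```

A stub like this is only suitable for checking task logic; tasks that rely on `nano`, `save`, or `remove` still need a real database.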
You can define any number of named tasks, each of which is run once per document. The following are examples of common tasks that you can adapt to your needs.
This is a trivial example where a counter (total-docs) is incremented for each document retrieved.
{
  "count-docs": (util, doc) => util.count('total-docs'),
}
By using util.audit(), you can store any object for use after processing has completed. This is useful for initiating a post-processing step that is based on the output of a Couchit run.
{
  "audit-bad-docs": (util, doc) => {
    if (doc.status && doc.status === 'bad') {
      const object = { bad_doc_id: doc._id, status: doc.status };
      util.audit(object);
    }
  }
}
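A post-processing step might then work on the collected objects. The sketch below assumes the audit array holds entries in the shape produced by the audit-bad-docs task; `collectBadDocIds` and `sampleAudit` are hypothetical, and how the array reaches your callback is not shown.

```javascript
// Assumption: `audit` is the array of objects collected through util.audit(),
// each shaped like { bad_doc_id, status } as in the audit-bad-docs task.
function collectBadDocIds(audit) {
  return audit
    .filter((entry) => entry.status === 'bad')
    .map((entry) => entry.bad_doc_id);
}

// Hypothetical sample data standing in for the output of a run.
const sampleAudit = [
  { bad_doc_id: 'doc-1', status: 'bad' },
  { bad_doc_id: 'doc-7', status: 'bad' },
];
const badIds = collectBadDocIds(sampleAudit);
console.log(badIds);
```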
By using util.hash(), you can generate a SHA256 hash of a document, object, or string. This task logs the hash of each document.
{
  "hash-doc-contents": (util, doc) => {
    const hash = util.hash(doc);
    util.log('SHA256 hash of doc: ' + hash);
  }
}
Determine if a document is valid based on a json-schema specification. This task uses ajv for validation.
{
  "validate-schema": (util, doc) => {
    const schema = require('./test/schema.json');
    const data = require('./test/data.json');
    const valid = util.validate(schema, data);
    util.log(doc._id + ' is valid? ' + valid);
  }
}
Note that nano functions run asynchronously while documents are iterated over. To ensure complete stats and (optional) audit object tracking, call util.incrementWaits() to set a wait before calling the async function, and util.decrementWaits() to remove the wait upon completion.
{
  "dependent-updates": (util, doc) => {
    if (doc.childKeys) {
      const keys = doc.childKeys;
      // Set a wait so the run does not finish before the callbacks do
      util.incrementWaits();
      // Do a bulk get for all childKeys
      util.nano.fetch(keys, (err, result) => {
        if (err) {
          console.log(err);
          // Remove the wait on failure too
          util.decrementWaits();
          return;
        }
        const newChildKeys = [];
        let hasMissingChildKeys = false;
        // Only add docs to the newChildKeys that were found in the db
        result.rows.forEach(row => {
          if (row.doc) {
            newChildKeys.push(row.id);
          } else {
            hasMissingChildKeys = true;
            util.log('Missing child doc: ' + row.key);
          }
        });
        if (!hasMissingChildKeys) {
          // Nothing to update, so remove the wait
          util.decrementWaits();
          return;
        }
        doc.childKeys = newChildKeys;
        util.nano.insert(doc, (err, result) => {
          console.log(err || result);
          // Remove the wait
          util.decrementWaits();
        });
      });
    }
  }
}
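The wait mechanism can be thought of as a counter of outstanding async calls, with the run considered finished only once the counter is back at zero. The sketch below is a hypothetical model of that idea (`WaitCounter` and `onIdle` are illustrative names); Couchit's own bookkeeping may work differently.

```javascript
// Hypothetical wait counter; not Couchit's actual implementation.
class WaitCounter {
  constructor(onIdle) {
    this.pending = 0;
    this.onIdle = onIdle; // called each time the count returns to zero
  }

  incrementWaits() {
    this.pending += 1;
  }

  decrementWaits() {
    if (this.pending === 0) return; // ignore unmatched decrements
    this.pending -= 1;
    if (this.pending === 0) this.onIdle();
  }
}

let idleCalls = 0;
const waits = new WaitCounter(() => { idleCalls += 1; });

waits.incrementWaits(); // first async call starts
waits.incrementWaits(); // second async call starts
waits.decrementWaits(); // first finishes, one call still pending
const idleAfterFirst = idleCalls;
waits.decrementWaits(); // second finishes, counter is idle again
console.log(idleAfterFirst, idleCalls);
```

Under this model, every `incrementWaits()` needs a matching `decrementWaits()` on every code path, including error paths; otherwise the counter never returns to zero.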
- Couchit is based on Couchtato; thanks to Cliffano Subagio for his work.