A clojuresque key-value/document store protocol with core.async.

homebaseio/konserve

 
 

konserve

Simple durability, made flexible.

A simple document store protocol defined with synchronous and core.async semantics. It allows Clojuresque collection operations on associative key-value stores from both Clojure and ClojureScript, across different backends. Data is generally serialized with edn semantics or, if supported, as native binary blobs, and can be accessed similarly to the clojure.core functions get-in, assoc-in and update-in. update-in in particular runs a function atomically and returns the old and the new value. Each operation is run atomically and must be consistent (in fact ACID), but consistency across multiple keys is not supported (Riak, CouchDB and many scalable solutions don’t have transactions over keys for that reason). This is meant to be a building block for more sophisticated storage solutions (Datomic also builds on kv-stores). A simple append-log for fast write operations is also implemented.
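
The update-in contract (apply a function atomically, get back the old and the new value) can be modelled with a plain Clojure atom. This is only a sketch of the semantics under that assumption, not konserve's implementation; update-in-model and model-store are hypothetical names introduced for illustration.

```clojure
;; In-memory model of the update-in contract; NOT konserve code.
;; `update-in-model` and `model-store` are hypothetical names.
(defn update-in-model
  "Atomically applies f to the value at key-path and returns
  [old-value new-value] for that path."
  [store-atom key-path f]
  ;; swap-vals! (Clojure 1.9+) returns the state before and after the swap.
  (let [[before after] (swap-vals! store-atom update-in key-path f)]
    [(get-in before key-path) (get-in after key-path)]))

(def model-store (atom {:bar 42}))

(update-in-model model-store [:bar] inc)
;; => [42 43]
```

Returning both values lets a caller see exactly which state its function was applied to, without a second read that could race with another writer.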

Features

  • cross-platform between Clojure and ClojureScript
  • lowest-common denominator interface for an associative datastructure with edn semantics
  • thread-safety with atomicity over key operations
  • consistent error handling for core.async
  • fast serialization options (fressian, transit, …), independent of the underlying kv-store
  • very low overhead protocol, including direct binary access for high throughput
  • no additional dependencies and setup required for IndexedDB in the browser and the file backend on the JVM
  • avoids blocking IO; the filestore, for instance, will not block any thread on reading. Fully asynchronous support for writing and for other stores is in the pipeline.

Supported Backends

A file-system store in Clojure and IndexedDB for ClojureScript are provided as elementary reference implementations for the two most important platforms. No setup and no additional dependencies are needed.

fs-store

The file-system store currently uses fressian in Clojure and fress in ClojureScript and is quite efficient. Both implementations use the same on-disk format and can load the same store (but not concurrently). It also lets you access values as normal file-system files, e.g. to open them with a native database like HDF5 in Java. You can disable the fsync on every write with the configuration {:sync-blob? false} if a potential but unlikely data loss is not critical for you (e.g. for a session store). Note that the database will not be corrupted in this case; you can only lose some write operations made shortly before the crash.

IndexedDB

For IndexedDB there is no internal JSON representation of the underlying store like transit yet, hence it is still fairly slow for edn. A JSON store protocol is implemented for IndexedDB in case interoperability with a JavaScript application is wanted. Be careful not to confuse JSON values with edn values; they are stored in separate locations and cannot clash.

External Backends

These are partially outdated and not actively maintained by us, but should not be hard to pick up.

New storage backends, e.g. MongoDB, JDBC, WebSQL, Local-Storage are welcome.

There is also a system component for the internal backends.

Projects building on konserve

  • The protocol is used in production and originates as an elementary storage protocol for replikativ.
  • kampbell maps collections of entities to konserve and enforces specs.

Benchmarks

Due to its simplicity it is also fairly fast, as it directly serializes Clojure data, e.g. with fressian, to durable storage. The file-store is currently CPU bound. More detailed benchmarks are welcome :).

(let [numbers (doall (range 1024))]
  (time
   (doseq [i (range 1000)]
     (<!! (assoc-in store [i] numbers)))))
;; fs-store: ~7.2 secs on my old laptop
;; mem-store: ~0.186 secs

(let [numbers (doall (range (* 1024 1024)))]
  (time
   (doseq [i (range 10)]
     (<!! (assoc-in store [i] numbers)))))
;; fs-store: ~46 secs, large files: 1 million ints each
;; mem-store: ~0.003 secs

Depending on the usage pattern, it is not necessarily fast. The general idea is to write most values once (e.g. in the form of index fragments) and to update only one place once all data is written, similar to Clojure’s persistent data structures and balanced trees. To store values under non-conflicting keys, have a look at hasch.
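
The write-once pattern can be sketched on a plain atom: each fragment is written exactly once under a key derived from its content, and only a single root entry is updated at the end. This is a model under stated assumptions, not konserve or hasch code; content-key is a hypothetical stand-in for a real content hash, and write-fragments!, fragment-store and the :root key are names made up for this example.

```clojure
;; Model of the write-once pattern on a plain atom; NOT konserve code.
;; `content-key` is a hypothetical stand-in for a content hash like hasch's.
(import 'java.util.UUID)

(defn content-key
  "Derives a stable key from the printed form of value."
  [value]
  (UUID/nameUUIDFromBytes (.getBytes ^String (pr-str value) "UTF-8")))

(defn write-fragments!
  "Writes every fragment once under its content key, then publishes all
  keys with a single update of :root. Returns the vector of keys."
  [store-atom fragments]
  (let [ks (mapv content-key fragments)]
    ;; Fragment writes never conflict: equal content maps to the same key.
    (doseq [[k fragment] (map vector ks fragments)]
      (swap! store-atom assoc k fragment))
    ;; The only place that is ever overwritten.
    (swap! store-atom assoc :root ks)
    ks))

(def fragment-store (atom {}))
(write-fragments! fragment-store [[1 2 3] [4 5 6]])
```

Readers that start from :root only ever see fragments that were completely written, which is why a single atomic update per batch is enough in this model.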

Combined usage with other writers

konserve currently assumes that it accesses its keyspace in the store exclusively. It uses hasch to support arbitrary edn keys and hence does not normally clash with outside usage, even when the same keys are used. To support multiple konserve clients on the same store, the backend has to support locking and proper transactions on keys internally, which is the case for backends like CouchDB, Redis and Riak.

Serialization formats

Different formats for edn serialization like fressian, transit or a simple pr-str version are supported and can be combined with different stores. Stores have a reasonable default setting. You can also extend the serialization protocol to other formats if you need it. You can provide incognito support for records, if you need them.

Tagged Literals

You can read and write custom records according to incognito.
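
To illustrate what reading and writing a record as a tagged literal involves, here is a minimal round trip using only clojure.edn. It does not use incognito's API; the tag sketch/Point and the helpers write-point, read-value and read-unknown are hypothetical names chosen for this example.

```clojure
;; Tagged-literal round trip with clojure.edn only; NOT incognito's API.
(require '[clojure.edn :as edn])

(defrecord Point [x y])

(defn write-point
  "Serializes a Point as an edn tagged literal with a hypothetical tag."
  [p]
  (str "#sketch/Point " (pr-str (into {} p))))

;; Maps each known tag to a function that rebuilds the record.
(def read-handlers {'sketch/Point map->Point})

(defn read-value
  "Reads edn, rebuilding records for the tags in read-handlers."
  [s]
  (edn/read-string {:readers read-handlers} s))

(defn read-unknown
  "Reads edn without handlers; unknown tags become plain maps."
  [s]
  (edn/read-string {:default (fn [tag value] {:tag tag :value value})} s))

(read-value (write-point (->Point 1 2)))
```

A reader without the matching handler still gets the data back as a plain map through read-unknown, so a store can be read even where the record type is not defined.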

Usage

Add to your leiningen dependencies: http://clojars.org/io.replikativ/konserve

From a Clojure REPL run the following functions for the core.async variants of the code.

(ns test-db
  (:require [konserve.filestore :refer [new-fs-store]]
            [konserve.core :as k]
            [clojure.core.async :refer [<!!]]))

;; At the REPL, <!! blocks the calling thread until the channel delivers.
;; Inside a go block use <! instead.
(def store (<!! (new-fs-store "/tmp/store")))

(<!! (k/assoc-in store ["foo" :bar] {:foo "baz"}))
(<!! (k/get-in store ["foo"]))
(<!! (k/exists? store "foo"))

(<!! (k/assoc-in store [:bar] 42))
(<!! (k/update-in store [:bar] inc))
(<!! (k/get-in store [:bar]))
(<!! (k/dissoc store :bar))

;; append-only log
(<!! (k/append store :error-log {:type :horrible}))
(<!! (k/log store :error-log))

;; store a 10 MiB binary blob
(let [ba (byte-array (* 10 1024 1024) (byte 42))]
  (time (<!! (k/bassoc store "banana" ba))))

And the following synchronous code if you are not using core.async in your scope:

(ns test-db
  (:require [konserve.filestore :refer [new-fs-store]]
            [konserve.core :as k]))

(def store (new-fs-store "/tmp/store" :opts {:sync? true}))

(k/assoc-in store ["foo" :bar] {:foo "baz"} {:sync? true})
(k/get-in store ["foo"] nil {:sync? true})
(k/exists? store "foo" {:sync? true})

(k/assoc-in store [:bar] 42 {:sync? true})
(k/update-in store [:bar] inc {:sync? true})
(k/get-in store [:bar] nil {:sync? true})
(k/dissoc store :bar {:sync? true})

(k/append store :error-log {:type :horrible} {:sync? true})
(k/log store :error-log {:sync? true})

(let [ba (byte-array (* 10 1024 1024) (byte 42))]
  (time (k/bassoc store "banana" ba {:sync? true})))

(k/bget store "banana"
        (fn [{is :input-stream}]
          (your-read-does-all-work-here is))
        {:sync? true})
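
The name your-read-does-all-work-here is a placeholder for a function that consumes the input stream inside the callback. One possible body, assuming you simply want the whole blob in memory, is sketched here against a stand-in stream so that it runs without a store; read-all-bytes is a hypothetical helper, not part of konserve.

```clojure
;; Hypothetical helper for the bget callback; NOT part of konserve.
(require '[clojure.java.io :as io])
(import '[java.io ByteArrayInputStream ByteArrayOutputStream])

(defn read-all-bytes
  "Copies everything from input stream `is` into a byte array."
  [is]
  (let [out (ByteArrayOutputStream.)]
    (io/copy is out)
    (.toByteArray out)))

;; Stand-in stream so the sketch runs without a store.
(count (read-all-bytes (ByteArrayInputStream. (byte-array 1024 (byte 42)))))
;; => 1024
```

With a store at hand it would be passed as (fn [{is :input-stream}] (read-all-bytes is)) in the bget call above. For blobs that do not fit in memory, the callback would process the stream incrementally instead.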

In a ClojureScript REPL you can evaluate the core.async variants of these expressions, each wrapped in a go-block with <!.

For simple purposes a memory store wrapping an Atom is implemented as well:

(ns test-db
  (:require [konserve.memory :refer [new-mem-store]]
            [konserve.core :as k]
            [clojure.core.async :refer [go <!]]))

(go (def my-db (<! (new-mem-store))))
;; or, starting from an existing atom:
;; (go (def my-db (<! (new-mem-store (atom {:foo 42})))))

In ClojureScript from a browser (you need IndexedDB available in your js env):

(ns test-db
  (:require [konserve.indexeddb :refer [new-indexeddb-store]]
            [konserve.core :as k]
            [cljs.core.async :refer [<!]])
  (:require-macros [cljs.core.async.macros :refer [go go-loop]]))

(go (def my-db (<! (new-indexeddb-store "konserve"))))

(go (println "get:" (<! (k/get-in my-db ["test" :a]))))

(go (doseq [i (range 10)] (<! (k/assoc-in my-db [i] i))))

;; prints 0 to 9 each on a line
(go (doseq [i (range 10)] (println (<! (k/get-in my-db [i])))))

(go (println (<! (k/assoc-in my-db ["test"] {:a 1 :b 4.2}))))

(go (println (<! (k/update-in my-db ["test" :a] inc))))
;; => "test" contains {:a 2 :b 4.2}

For non-REPL code execution you have to put all channel operations in one top-level go-block for them to be synchronized:

(ns test-db
  (:require [konserve.indexeddb :refer [new-indexeddb-store]]
            [konserve.core :as k]
            [cljs.core.async :refer [<!]])
  (:require-macros [cljs.core.async.macros :refer [go go-loop]]))

(go (def my-db (<! (new-indexeddb-store "konserve")))

    (println "get:" (<! (k/get-in my-db ["test" :a])))

    (doseq [i (range 10)]
       (<! (k/assoc-in my-db [i] i))))

For more examples have a look at the comment blocks at the end of the respective namespaces.

Backend implementation guide

We provide a backend implementation guide.

JavaScript bindings

There are experimental JavaScript bindings in the konserve.js namespace:

goog.require("konserve.js");

konserve.js.new_mem_store(function(s) { store = s; });
// or
konserve.js.new_indexeddb_store("test_store", function(s) { store = s; });

konserve.js.exists(store, ["foo"], function(v) { console.log(v); });
konserve.js.assoc_in(store, ["foo"], 42, function(v) {});
konserve.js.get_in(store,
                   ["foo"],
                   function(v) { console.log(v); });
konserve.js.update_in(store,
                      ["foo"],
                      function(v) { return v+1; },
                      function(res) { console.log("Result:", res); });

Changelog

1.0.0-alpha1

  • implement dual async+sync code expansion
  • generalize filestore logic to ease backend development

0.6.0-alpha1

  • introduce common storage layouts and store serialization context with each key value pair, this will facilitate migration code in the future
  • implementation for the filestore (thanks to @FerdiKuehne)
  • introduce metadata to track edit timestamps
  • add garbage collector
  • introduce superv.async error handling
  • extend API to be more like Clojure’s (thanks to @MrEbbinghaus)
  • add logging
  • update on ClojureScript support still pending

0.5.1

  • fix nested value extraction in filestore, thanks to @csm

0.5

  • cljs fressian support
  • filestore for node.js

0.5-beta3

  • experimental caching support

0.5-beta1

  • improved filestore with separate metadata storage
  • experimental clojure.core.cache support

0.4.12

  • fix exists for binary

0.4.11

  • friendly printing of stores on JVM

0.4.9

  • fix a race condition in the lock creation
  • do not drain the threadpool for the filestore

0.4.7

  • support distinct dissoc (not implicit key-removal on assoc-in store key nil)

0.4.5

  • bump deps

0.4.4

  • make fsync configurable

0.4.3

  • remove full.async until binding issues are resolved

0.4.2

  • simplify and fix indexeddb
  • do clean locking with syntactic macro sugar

0.4.1

  • fix cljs support

0.4.0

  • store the key in the filestore and allow to iterate stored keys (not binary atm.)
  • implement append functions to have high throughput append-only logs
  • use core.async based locking on top-level API for all stores
  • allow to delete a file-store

0.3.6

  • experimental JavaScript bindings

0.3.4

  • use fixed incognito version

0.3.0 - 0.3.2

  • fix return value of assoc-in

0.3.0-beta3

  • Wrap protocols in proper Clojure functions in the core namespace.
  • Implement assoc-in in terms of update-in
  • Introduce serialization protocol with the help of incognito and decouple stores

0.3.0-beta1

  • filestore: disable cache
  • factor out all tagged literal functions to incognito
  • use reader conditionals
  • bump deps

0.2.3

  • filestore: flush output streams, fsync on fs operations
  • filestore can be considered beta quality
  • couchdb: add -exists?
  • couchdb: move to new project
  • remove logging and return ex-info exceptions in go channel

0.2.2

  • filestore: locking around java strings is a bad idea, use proper lock objects
  • filestore: do io inside async/thread (like async’s pipeline) to not block the async threadpool
  • filestore: implement a naive cache (flushes once > 1000 values)
  • filestore, indexeddb: allow to safely custom deserialize file-inputstream in transaction/lock
  • filestore, indexeddb, memstore: implement -exists?

0.2.1

  • filestore: fix fressian collection types for clojure, expose read-handlers/write-handlers
  • filestore: fix -update-in behaviour for nested values
  • filestore: fix rollback renaming order

0.2.0

  • experimental native ACID file-store for Clojure
  • native binary blob support for file-store, IndexedDB and mem-store

Contributors

  • Björn Ebbinghaus
  • Daniel Szmulewicz
  • Konrad Kühne
  • Christian Weilbach

License

Copyright © 2014-2019 Christian Weilbach and contributors

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
