
feat: improve writer latency #618

Merged (70 commits) on Dec 4, 2023
Conversation

whilo (Member) commented Mar 20, 2023

Fixes #617. This pull request changes the write process to flush and sync the dirty indices in a two-stage process instead of waiting on every operation during execution:

  1. Flush all index trees and collect the asynchronous operations on dirty nodes without waiting, then wait on the collected operations until all index data is written. (This ensures that no pointer used by the DB record in step 2 is dangling.)
  2. Write the DB record in parallel into the commit log and into the branch value.

This reduces transaction latency in the best case to two round trips to the underlying store, which is optimal when distributed snapshot consistency must be preserved; otherwise other processes could read DB records that point to tree fragments that are not yet written.
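The two stages above can be sketched roughly as follows. This is an illustrative sketch only, not the actual Datahike code; the helper names (flush-indices!, write-commit-log!, write-branch!) are hypothetical, and <?- stands for the channel take used elsewhere in this PR.

```clojure
;; Hypothetical sketch of the two-stage commit; helper names are illustrative.
(defn commit-two-stage! [store db]
  ;; Stage 1: flush all index trees, collecting the async write channels
  ;; of the dirty nodes without waiting on each one individually ...
  (let [pending (flush-indices! store db)]      ; returns a coll of channels
    ;; ... then wait until every dirty node is durably written, so no
    ;; pointer reachable from the DB record can dangle.
    (doseq [op pending] (<?- op))
    ;; Stage 2: write the DB record into the commit log and the branch
    ;; value in parallel, then wait for both.
    (let [commit-log-op (write-commit-log! store db)
          branch-op     (write-branch! store db)]
      (<?- commit-log-op)
      (<?- branch-op))))
```

In the best case stage 1 costs one round trip (dominated by the slowest node write) and stage 2 a second one, which is the two-round-trip latency described above.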

whilo force-pushed the 617-improve-writer-latency branch from 0fc6e49 to 33143bd on March 25, 2023 11:05
whilo force-pushed the 617-improve-writer-latency branch from 0294ff3 to 49caee4 on March 28, 2023 08:37
whilo force-pushed the 617-improve-writer-latency branch from 4e895c6 to 80d574b on August 8, 2023 20:36
whilo marked this pull request as ready for review on November 25, 2023 19:05
(put! callback res)))
(recur))
(log/debug "Writer thread gracefully closed")))))
[connection write-fn-map transaction-queue-size commit-queue-size commit-wait-time]
whilo (author):
This is the main added logic. The writer first applies transactions in one loop and then commits them in a second loop. Transactions are grouped into commits, and each one sees the same db-after, which contains its transaction with its individually added and retracted datoms.
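The two-loop shape can be pictured like this. A rough illustrative sketch only, assuming core.async queues; the real writer is parameterized by the transaction-queue-size, commit-queue-size, and commit-wait-time arguments shown above, and transact-in-memory is a hypothetical placeholder.

```clojure
;; Illustrative shape of the writer thread, not the actual implementation.
(go-loop []
  ;; Loop 1: drain all currently queued transactions and apply them
  ;; in memory, grouping them into one commit batch.
  (let [txs (atom [])]
    (loop [tx (<! transaction-queue)]
      (when tx
        (swap! txs conj (transact-in-memory tx))   ; hypothetical helper
        (recur (poll! transaction-queue))))
    ;; Loop 2 / commit step: all batched transactions share the same
    ;; db-after, which is flushed durably once per batch.
    (commit! (:db-after (peek @txs)))
    ;; Deliver results to the individual callers.
    (doseq [{:keys [callback res]} @txs]
      (put! callback res)))
  (recur))
```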

IStorage
(store [_ node]
(swap! stats update :writes inc)
(let [address (gen-address node (:crypto-hash? config))
_ (trace "writing storage: " address " crypto: " (:crypto-hash? config))]
-    (k/assoc store address node {:sync? true})
+    (swap! pending-writes conj (k/assoc store address node {:sync? false}))
whilo (author):
Write operations are tracked here and collected by commit! later.

 (psset/set-branching-factor! BRANCHING_FACTOR)
-(let [^PersistentSortedSet pset (psset/sorted-set-by (index-type->cmp-quick index-type false))]
-  (set! (.-_storage pset) (:storage store))
+(let [^PersistentSortedSet pset (psset/sorted-set* {:cmp (index-type->cmp-quick index-type false)
whilo (author):
This is for the new persistent sorted set version.

(let [{:keys [hash max-tx max-eid meta]} db]
(uuid [hash max-tx max-eid meta])))

(defn commit!
whilo (author):
All durable changes happen here.

whilo (author):
This takes the time of two round trips to the store: one for all writes in flush-pending-writes (where the slowest write operation dominates) and one for both commit-log-op and branch-op, which update the root of the indices.
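Assuming pending-writes is the atom of write-operation channels collected by store above, the flush step might look roughly like this (hypothetical sketch; flush-pending-writes! is an illustrative name):

```clojure
;; Hypothetical sketch: drain and await all collected write operations.
(defn flush-pending-writes! [pending-writes]
  (let [[ops _] (reset-vals! pending-writes [])]  ; take and clear atomically
    (doseq [op ops] (<?- op))))                   ; slowest write dominates
```

After this returns, the commit-log and branch writes for the new root can be issued in parallel, since every node they point to is already durable.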

(assoc-in [:meta :datahike/parents] parents)
(assoc-in [:meta :datahike/commit-id] commit-id)))))

(defn update-connection! [connection tx-data tx-meta update-fn]
whilo (author):
Transactions (transact and load-entities) happen here.

whilo commented Dec 1, 2023

@TimoKramer This PR is finally ready now. The necessary flush statement for the last commit was missing, which caused errors on machines with slow filesystems and then made the async IO hang. All tests now pass consistently on all machines I have access to. The problem was hard to see because assertion errors are not propagated and kaocha swallows all log output by default. The following konserve PR makes such read errors visible and no longer uses assertions: replikativ/konserve#115.

TimoKramer (reviewer) left a comment:
So far looks good to me. One thing I do not understand is why you're using two loops in the writer...

@@ -166,6 +166,7 @@
(if-let [cached (wrapped/lookup cache address)]
cached
(let [node (k/get store address nil {:sync? true})]
(assert (not (nil? node)) "Node not found in storage.")
TimoKramer (reviewer):
You really want to throw an Error here?

whilo (author):
This should never happen. All the nodes we refer to in our addresses/pointers must always exist; otherwise the store got corrupted or a write operation was inconsistent (e.g. the underlying store did not ensure durability or mixed up the causal order of events; I think Redis does, but it does not immediately store durably, so something like this could go wrong in a faulty store backend). If your comment is about the assertion: yes, I will change that to an exception now.

But we could also allow it and retry. In that case you would be able to sync the store from another system out of order (e.g. through datsync, rsync, or file copying), and your operations would just stall instead of breaking. But I think it is better to copy everything except the roots first and overwrite the roots last in an atomic operation (in the same order commit writes them); that way you can sync without interrupting operations at all.

src/datahike/writer.cljc (resolved comment thread)
(recur (poll! commit-queue))))
(log/trace "Batched transaction count: " (count @txs))
;; commit latest tx to disk
(let [db (:db-after (first (last @txs)))]
TimoKramer (reviewer):
peek?

whilo (author):
Fixed.

TimoKramer previously approved these changes Dec 4, 2023

TimoKramer (reviewer) left a comment:
I think it's good.

:parent p}))
commit-id))))))
(do
(assert (not (nil? p)) "Parent cannot be nil.")
TimoKramer (reviewer):
one more assert here. FYI

whilo (author):
thanks, fixed.

Comment on lines +144 to +145
(<?- commit-log-op)
(<?- branch-op)
TimoKramer (reviewer):
is this for catching exceptions?

whilo (author):
This waits for the write operations in the asynchronous case and is a no-op in the synchronous case (because the values have already been written then). It also re-throws any exception that ends up in either of these channels.
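The rethrow behaviour can be pictured with a simplified sketch. This is not the real <?- (which comes from the async utilities Datahike builds on), just an illustration of the take-or-rethrow pattern:

```clojure
;; Simplified sketch of a channel take that rethrows exceptions.
(defmacro <?-sketch [ch]
  `(let [v# (clojure.core.async/<! ~ch)]
     (if (instance? Throwable v#)
       (throw v#)   ; propagate a failed write to the caller
       v#)))
```

So waiting on commit-log-op and branch-op both synchronizes on the root writes and surfaces any write error that was parked in the channels.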

whilo merged commit 4b3a87d into main on Dec 4, 2023
10 checks passed
whilo deleted the 617-improve-writer-latency branch on February 7, 2024