SSB over Git aims at simplifying the exploration of potential applications that may benefit from a replication strategy based on directed-acyclic graphs in order to be resilient to forked logs in the exceptional cases, while keeping as many of the benefits of append-only logs in the correct and common case. Hopefully, that will help inform future developments of SSB.
This project is built by analogy: we map the core concepts of SSB and application-building conventions onto Git. Since Git is more general, this is done by using the core plumbing primitives of Git and constraining what operations are possible to obtain similar properties as those obtained with append-only logs in SSB.
Tested with:
If you succeed in replicating experiments with older versions, make a Pull-Request with the version you tested.
We use the terminology introduced here to map the different SSB concepts to Git primitives. Details on the SSB protocol are summarized in this guide. The git basic concepts are covered in the so-called "plumbing" documentation.
We present our approach in 3 parts:
- Vanilla SSB: Replication of SSB's behaviour in Git
- Forking Problem: We show how forks break eventual consistency
- Tolerating Forks: We show how to replicate then detect forks
We first replicate the basic functionalities of SSB:
Create an ed25519 identity with GPG:
$ gpg --full-generate-key
Select ECC (sign and encrypt)
, then Curve 25519
. Other options depend on your personal circumstances.
Note the public key in the output. Example:
pub ed25519 2020-07-18 [SC] [expires: 2021-07-18]
CE44B3CFC4C68B868A7AE868D87953FAD4BB4EC4
The public key in the example is CE44..4EC4
.
To make the following steps easier to follow, let's create 3 identities for fictive participants: Bob, Alice and Eve. For each, redo the steps above and assign their public key to a shell variable. For example, in the case of Bob using the previous public key:
$ export BOB=CE44B3CFC4C68B868A7AE868D87953FAD4BB4EC4;
In your case, replace the public key in with those generated locally by gpg.
A store is the local database in which we will store the logs. Using Git this is a regular repository. Create one for Bob:
$ mkdir bob-store
$ cd bob-store
$ git init .
A SSB message is similar to a signed Git commit in that it has the following properties:
previous
: reference to a previous message (respectively commit)author
: public key of the author (respectively author name, email, public key)sequence
: sequence number of the message (respectively no sequence number because Git supports more than logs)timestamp
: date-time at which the author claims to have created the message (idem)content
: string with maximum size of 8,192 Bytes (respectively commit message with upper bound implementation-specific)signature
: ed25519 signature of 1-5 (respectively 1-2 + 4-5)
Similar to a Git commit, a SSB message is identitied by the SHA hash of its content, but uses SHA-256 instead of SHA-1. Git is transitioning to use SHA-256 as well. For presentation simplicity, in this document we simply just use the default SHA-1 of Git.
Git commits require a valid tree reference so we use an empty tree for all SSB messages. To obtain the reference to the empty tree, we simply write the tree corresponding to the staging index (which should currently be empty):
$ git write-tree
4b825dc642cb6eb9a060e54bf8d69288fbee4904
We can later refer to empty tree by the first hex characters 4b825dc
.
We create a signed message as a (detached) commit:
$ FIRSTCOMMIT=$(echo 'First message' | GIT_COMMITTER_NAME=$BOB GIT_AUTHOR_NAME=$BOB git commit-tree 4b825dc --gpg-sign=$BOB)
The message content could also have been a valid JSON object.
We now assign our first commit as the initial message of a log, using a Git branch reference. We make that reference self-certifying by using the public key of the author as value:
$ git update-ref refs/heads/$BOB $FIRSTCOMMIT
It is now listed as a branch:
$ git branch
Git should display Bob's public key ($BOB
).
To make things more convenient we also create a symbolic link to be able to use a shorter and more memorable name:
$ git symbolic-ref refs/heads/bob-frontier refs/heads/$BOB
We can also verify that Bob's log is consistent with their public key:
$ git log --show-signature bob-frontier
We can now append a second message to Bob's log:
$ SECONDCOMMIT=$(echo 'Second message' | GIT_COMMITTER_NAME=$BOB GIT_AUTHOR_NAME=$BOB git commit-tree 4b825dc --gpg-sign=$BOB -p bob-frontier)
$ git update-ref refs/heads/$BOB $SECONDCOMMIT
Because bob-frontier
is a symbolic reference, it now transitively points to $SECONDCOMMIT
.
We can list the Bob's messages from newest to oldest:
$ git log --topo-order bob-frontier
We can list the same messages from oldest to newest (as is more common in SSB):
$ git log --topo-order --reverse bob-frontier
We can also extract only the log's content:
$ git log --topo-order --reverse --format=%s bob-frontier
First, create a replica for Alice:
$ cd ../
$ mkdir alice-store
$ cd alice-store
$ git init .
$ git write-tree
Pull the latest updates of Bob from Bob's store:
$ git pull ../bob-store refs/heads/$BOB":"refs/heads/$BOB
Bob can also push his updates to Alice's store:
$ cd ../bob-store
$ git push ../alice-store bob-frontier":"refs/heads/$BOB
If Bob cannot directly access Alice's store, he can instead package the updates into a bundle and send them through other means (sneakernet, email, broadcast, etc.):
$ git bundle create bob.updates refs/heads/$BOB # From bob-store
$ cp bob.updates ../alice-store
Alice can then fetch the updates from the bundle:
$ cd ../alice-store
$ git fetch bob.updates refs/heads/$BOB":"refs/heads/$BOB
If Bob knows that Alice's store already possess updates up to
BOB_FRONTIER_AT_ALICE
, Bob can create a smaller bundle with only the newer
updates:
$ cd ../bob-store
$ git bundle create bob.updates refs/heads/$BOB_FRONTIER_AT_ALICE".."refs/heads/$BOB
The messages of a log can be easily deleted from a store simply by removing all references to them then garbage collecting the store:
$ cd ../alice-store
$ LAST=$(git show refs/heads/$BOB --format=%H) # store last commit hash
$ git show $LAST
$ git checkout --orphan temp # move to a temporary branch with no commits
$ git branch -D $BOB # delete ref refs/heads/$BOB and logs/refs/heads/$BOB
$ git gc --prune=now # Remove now unreachable objects (garbage collect)
$ git show $LAST # Should return: "fatal: bad object ..."
Depending on the prior history of operations on the Git repository, there might still be references in git internals
(ex: .git/logs/HEAD
, reflog
, .git/FETCH_HEAD
, .git/ORIG_HEAD
, etc) which prevent some deletion. These
require a more careful cleanup. Once all references are removed, git gc
will then remove the objects from the object
database (.git/objects
).
Replication of updates, by push or pull, can happen between any pair of replicas as long as the following invariants are maintained:
- Single-writer: All commits reachable from the frontier of an author (
refs/heads/KEY
) shall be from the same author. This restriction is necessary because Git does not prevent another author from appending to a log and having a reference that points to that new commit. - No fork: No two commits reachable from any frontier, shall have the same parent commit. In other words, the commit history should form a strict linked-list.
This will guarantee eventual consistency between all replicas. In effect, the absence of forks enables the fast-forwarding replication mode of Git for all updates.
However, in the presence of incorrect or malicious participants, we cannot assume that they will maintain those properties as they might deviate arbitrarily from the expected protocol. For example, Eve might maintain one different log for each participant she is interacting with so that each of them thinks they are interacting with a valid log.
Let's see how this may play out with Git. For convenience, rather than retyping the full commands shown previously, we will use the following equivalent helper scripts:
bin/vanilla/create-store.sh PATH
: create a store atPATH
bin/vanilla/append.sh STORE KEY CONTENT
: appendCONTENT
inKEY
log withinSTORE
First, create a store for Eve:
$ bin/vanilla/create-store.sh eve-store
Then create a first message for Eve:
$ bin/vanilla/append.sh eve-store $EVE 'First Message'
Now create one message intended for Bob:
$ bin/vanilla/append.sh eve-store $EVE 'Message for Bob'
Track the log specifically for Bob:
$ cd eve-store
$ git update-ref refs/heads/Eve-for-Bob refs/heads/$EVE
Rewind Eve's log to the parent commit:
$ git update-ref refs/heads/$EVE $(git rev-parse refs/heads/$EVE~1)
Append a new message intended for Alice:
$ cd ..
$ bin/vanilla/append.sh eve-store $EVE 'Message for Alice'
Track the additional log specifically for Alice:
$ cd eve-store
$ git update-ref refs/heads/Eve-for-Alice refs/heads/$EVE
Now Eve can push different logs to Bob and Alice:
$ git push ../bob-store refs/heads/Eve-for-Bob":"refs/heads/$EVE
$ git push ../alice-store refs/heads/Eve-for-Alice":"refs/heads/$EVE
From within Bob's store or Alice's store, the log satisfies the single-write and no-fork properties:
$ cd ../bob-store
$ git log refs/heads/$EVE --topo-order --show-signature
$ cd ../alice-store
$ git log refs/heads/$EVE --topo-order --show-signature
If Eve had written that she was giving the same tokens to both Bob and Alice, both would now think they are the legitimate owner and Eve would have successfully double-spent the tokens. Moreover, the situation breaks eventual consistency because it is not possible for both Bob and Alice to obtain a correct log that is consistent between both of their stores. This also breaks the possibility to propagate Eve's updates directly between Bob and Alice, which effectively partitions the rest of the community according to which branch of the forked log they have first updated from:
$ git pull ../bob-store refs/heads/$EVE":"refs/heads/$EVE # from alice-store
From ../bob
! [rejected] 07318BA3E8FC2BBD468BC405A32F064B8DE7C8FF -> 07318BA3E8FC2BBD468BC405A32F064B8DE7C8FF (non-fast-forward)
At the very minimum we would like to be able to at least replicate all branches of forks. These can be stored, for example, under where we got them from:
$ # From alice-store:
$ git update-ref refs/remotes/eve-store/$EVE refs/heads/$EVE # Remember we got these updates from Eve's store
$ git fetch ../bob-store refs/heads/$EVE":"refs/remotes/bob-store/$EVE
We can test whether multiple log replicas are actually branches by counting how many of the branches' tips cannot be reached from another. If the number is greater than 1 then there are as many branches:
$ git merge-base --independent $(git for-each-ref "refs/remotes/*/$EVE" --format="%(refname)") | wc -l
There is also a nice visualization option for branches:
$ git log refs/remotes/eve-store/$EVE refs/remotes/bob-store/$EVE --graph --format=oneline
- Signing Git Commits with a SSH Key: https://calebhearth.com/sign-git-with-ssh, https://git-scm.com/docs/git-config#Documentation/git-config.txt-gpgprogram
- Signing Git Commits with GPG: https://mdleom.com/blog/2020/07/18/git-sign-commit-ed25519/
- Git transition plan to SHA256: https://git-scm.com/docs/hash-function-transition/
- Show disk usage for branches (Git >=2.31): https://github.com/git/git/commit/16950f8384afa5106b1ce57da07a964c2aaef3f7
- Latency numbers every programmer should know: https://gist.github.com/jboner/2841832
- Determining Updates to Transmit through a Connected Channel:
- Packfile Negotiation: https://git-scm.com/docs/pack-protocol/2.2.3#_packfile_negotiation
- Git Protocol v2: https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html
- Background Maintenance: https://git-scm.com/docs/git-maintenance
- Git Internals: Packed Object Store, Commit History Queries, File History Queries, Distributed Synchronization, Scalability
- Internal use of Vector Clocks (Corrected Commit Date): https://github.blog/2022-08-30-gits-database-internals-ii-commit-history-queries/#generation-number-v2-corrected-commit-dates
- Bloom Filters for File Path Changes: https://github.blog/2022-08-31-gits-database-internals-iii-file-history-queries/#changed-path-bloom-filters
- Christian Bundy, SSB over Git, %utuVSHMoayceJVEYM+xc7Rfys6fttvJP+XNd17V5+L8=.sha256