By Tom Marble
This document introduces the Firefly Trust Sync (Firefly) architecture, a decentralized web-of-trust alternative that addresses the shortcomings of the Certificate Authority (CA) based Public Key Infrastructure (CA-based PKI) and the Pretty Good Privacy (PGP) web-of-trust. Self-sovereign identity is a cornerstone of this architecture, yet it does not rely on distributed ledger technology at all. Essential design elements are presented, along with initial thoughts on the advantages and disadvantages of this approach and some next steps.
The success of the Internet has been based on the independence and asynchrony of participants growing organically. The Domain Name System (DNS1) is one of the essential Internet services, functioning as a federation of decentralized servers that has tracked the growth of the Internet itself. In the first wave of e-commerce a solution was needed for consumers to safely transmit their credit card information to merchants, and that drove the development of Secure Sockets Layer (SSL) – later Transport Layer Security (TLS) – and the CA-based PKI. The PGP Web of Trust is an effort to leverage public key cryptography for a slightly more generalized model of trust. And yet all of these technologies are succumbing to growing pains or outright attacks.
The name Firefly evokes one of many biomimetic lessons we may discover in nature2.
The Firefly architecture proposes to address these problems:
The primary problem with the CA-based PKI is the prevalence of Man in the Middle attacks (MiTM3). There has been a series of band-aid measures, such as Extended Validation certificates (now seen as a failure4) and Certificate Transparency5. Yet Certificate Transparency audits can't scale (directly), leave time windows of vulnerability, may leak browsing information, and lack a standard user experience (UX) for handling an auditing failure6.
The network of automatically synchronized public key servers7 has worked remarkably well despite two unfortunate design choices: the identities (UIDs, which are often e-mail addresses) are completely exposed, and anyone can attach certifications to a key. Largely due to UX failures, the convenience of using short IDs led to the Evil-32 attack8. And to make matters worse we are now seeing certificate flooding9.
Despite the availability of core technologies to send PGP encrypted e-mail, it simply has failed due to incredible complexity and an absolute lack of UX design. In its place are a myriad of ad-hoc solutions – often involving some poor messaging implementation on a website which is "secured" by TLS10.
Firefly architecture aspires to the following goals:
Validating TLS certificates in Firefly could provide additional assurance to Certificate Transparency Auditing and/or offer an alternative verification path for certificates that do not participate in the CA-based PKI.
Perhaps the most common use of the PGP Web of Trust is to verify the public key of an e-mail correspondent with whom we would like to share secure messages. Firefly could replicate this functionality without some of the design shortcomings of the PGP network while providing countermeasures for known attacks.
While it is fashionable for self-sovereign digital identity solutions to rely on distributed ledger technology11 Firefly offers an alternative which will not exacerbate climate change12.
The core principle of Firefly is that each identity owner shall have exclusive control of their private key material. The impact of this design choice is that if the proposed pre-planned key backup and recovery procedures are not followed there is literally no one to call – the identity will be lost. And yet this is the ultimate way to ensure against abuse by third parties – including services offering convenience via key material management or escrow. This creates an additional incentive to take proper care of key material.
Software implementations of Firefly will be built "on the shoulders of giants" as is most software. Core key management will be handled with Steve Gibson's public domain Secure Quick Reliable Login (SQRL13) technology. The essential concepts leveraged from SQRL include:
SQRL is designed around a Password Based Key Derivation Function (PBKDF), driven by a master password, which generates an Ed25519 elliptic curve key pair14 for every website (identity client) requiring authentication.
This leads to one (of many) SQRL advantages in that disparate websites cannot collude to merge identities as the public key for each site is unique. And it obviates using a password manager as the master password is all that is needed to access the keypair for a given client.
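The per-site derivation described above can be sketched with Python's standard library. This is a simplified illustration under stated assumptions, not SQRL's actual algorithm: PBKDF2-HMAC-SHA256 stands in for SQRL's own password-based function, the salt and iteration count are arbitrary, and the function names are hypothetical. The 32-byte output is the kind of seed from which a suitable library could generate an Ed25519 key pair.

```python
import hashlib
import hmac

def derive_master_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Stretch the master password into a 32-byte master key (stand-in KDF)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def derive_site_seed(master_key: bytes, domain: str) -> bytes:
    """Derive a 32-byte, domain-specific seed by keying HMAC-SHA256 with the master key."""
    return hmac.new(master_key, domain.encode("utf-8"), hashlib.sha256).digest()

# Hypothetical password and salt, used only for illustration.
master = derive_master_key("correct horse battery staple", salt=b"example-salt")
seed_alice = derive_site_seed(master, "alice.rocks")
seed_bob = derive_site_seed(master, "bob.rocks")
```

Because each seed depends on the domain, two sites see unrelated keys and cannot link them, while the same master password always reproduces the same seed for a given site.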
The master SQRL identity information can be represented in a few hundred bytes and is designed to be printed on paper as a QR code for ease of machine scanning (which is useful when sharing an identity between multiple devices). The offline backup is encrypted using a 24-digit decimal Rescue Code which, obviously, should be stored separately.
Firefly extensions involve using SQRL deliberately for machine-to-machine communication (in addition to human user to machine communication) as well as supporting a superset of the SQRL API for web-of-trust operations.
Any alternate web-of-trust solution could not be "self hosted" initially and thus Firefly will need to begin (at least) by relying on existing DNS and CA-based PKI solutions. Even if self hosting becomes feasible it makes much more sense to use the existing DNS and CA-based PKI as additional inputs to the trust heuristic (i.e. the answers may provide fascinating telemetry that might suggest that a MiTM is being attempted, for example).
To the degree there is concern about DNS accuracy and/or privacy an interim solution may involve using DNS over HTTPS (DOH15).
Fortunately it is quite easy now to automate creation of Let's Encrypt16 TLS certificates.
Each Firefly identity node maintains a set of trust assertions that may be queried. Unlike the Secure Keyserver network there is no attempt to constantly replicate the entirety of the trust database everywhere. Depending on the type of query, a node may delegate it to peers and coalesce the responses for the caller. In this way the query function will move to where the data is located (not vice versa). As the web-of-trust is inherently a social graph it follows that the data should be organized as a graph database17. Moreover, recursive queries can have a significant performance advantage in a graph database vs. a relational database18.
As each identity requires a server the Firefly architecture begins at the maximum possible level of distribution. Just as a well known public key has multiple signatures, a well known identity in the Firefly network will have trust assertions on multiple Firefly servers. In this way the network is resilient against node failures.
In creating a query framework intended to gather multiple data points as input to trust heuristics, Firefly can weigh the importance of multiple factors including: social proximity, number of positive assertions, number of revocations, age of assertions, input from classic DNS and/or CA-based PKI or the PGP Web of Trust. Each heuristic may require differing dimensions of consensus depending on the sensitivity of the decision. In the web-of-trust we don't expect the simplicity of "one" answer and yet the signal of consensus will outweigh the noise when an assertion is trustworthy.
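One way to picture such a heuristic is a score that combines the factors listed above and is compared against a threshold chosen by the sensitivity of the decision. The sketch below is purely illustrative: the `Evidence` fields, the formula, the weights and the thresholds are all assumptions made for this example and are not part of any Firefly specification.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    hops: int               # social proximity: graph distance to the nearest asserter
    positive: int           # number of positive trust assertions found
    revocations: int        # number of revocations found
    newest_age_days: float  # age of the most recent positive assertion
    ca_agrees: bool         # whether the CA-based PKI gives the same answer

def trust_score(e: Evidence) -> float:
    """Combine the factors into a score between 0.0 and 1.0 (hypothetical formula)."""
    if e.positive == 0:
        return 0.0
    consensus = (e.positive - e.revocations) / (e.positive + e.revocations)
    proximity = 1.0 / max(e.hops, 1)
    freshness = 1.0 / (1.0 + e.newest_age_days / 365.0)
    external = 1.0 if e.ca_agrees else 0.5
    return max(consensus, 0.0) * proximity * freshness * external

# Hypothetical thresholds: a more sensitive decision demands a higher score.
THRESHOLDS = {"low": 0.2, "high": 0.6}

def accept(e: Evidence, sensitivity: str) -> bool:
    return trust_score(e) >= THRESHOLDS[sensitivity]

medium = Evidence(hops=2, positive=4, revocations=0, newest_age_days=0, ca_agrees=True)
```

With these made-up numbers, the `medium` evidence would be good enough for a low-sensitivity decision but not for a high-sensitivity one, which is the behavior the paragraph describes.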
Fortunately there is a wealth of resources in supporting graph database design. By adding time to the traditional entity–attribute–value model19 it is possible to enable queries over time (and presents an opportunity to take advantage of immutability20). The EAVT schema may be more like the Resource Description Framework (RDF21) Schema22 or more like other popular EAVT schemas23.
The implementation of the query language ("Firefly Script") takes inspiration from Datalog24 and SPARQL25. Especially apropos for running queries across the Firefly network is the SPARQL Federated Query26 standard and implementations27.
As each Firefly public key is scoped to the context of a given identity server (as inherited from SQRL) each assertion is, by default, anonymous. Each Firefly user can decide the degree to which their public key on any node may be associated with other identities and/or metadata. In this way a public key may be associated with a traditional UID (e.g. e-mail address, service URL) or remain opaque.
As noted in the paper on Pet Names28 "Zooko's Triangle29 tells us that names can have two out of three properties: decentralized, globally unique, human meaningful."
The domain name of a Firefly server (the scope) and a user public key within that server acts as a globally unique identifier. By design it is decentralized, but alas not human meaningful.
Thus one type of trust assertion in Firefly is the association of a Pet Name to Firefly identity giving us hope for human readability and verification.
In some cases it may be desirable for Firefly queries to identify their provenance (the requesting server), but this is not strictly required. By leveraging work on federated queries and existing Internet Protocol algorithms (e.g. Time-To-Live) one can imagine constructing a layered approach to querying. At each hop in the query a Firefly server is only aware of the identity of the client and does not need to reveal the peers to which it will forward the query. Given a minimum number of hops this may provide TOR30 like query anonymity.
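A minimal in-memory sketch of this layered, hop-limited querying follows. It assumes a hypothetical `Node` class, a caller-chosen query id for duplicate suppression, and a third example domain (`carol.rocks`) with placeholder keys `Ck` and `Dk`; real nodes would talk over a network API rather than direct method calls.

```python
from typing import List, Set, Tuple

Assertion = Tuple[str, str]  # (truster public key, subject public key)

class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["Node"] = []            # known only to this node
        self.assertions: Set[Assertion] = set()  # locally stored trust assertions
        self._seen: Set[str] = set()             # query ids already handled here

    def query(self, query_id: str, subject: str, ttl: int) -> Set[Assertion]:
        """Answer locally, then forward to peers while the hop budget lasts."""
        if query_id in self._seen:               # avoid loops without sharing the path
            return set()
        self._seen.add(query_id)
        found = {a for a in self.assertions if a[1] == subject}
        if ttl > 0:
            for peer in self.peers:
                found |= peer.query(query_id, subject, ttl - 1)
        return found                             # coalesced result for the caller

alice, bob, carol = Node("alice.rocks"), Node("bob.rocks"), Node("carol.rocks")
alice.peers.append(bob)
bob.peers.append(carol)
carol.assertions.add(("Ck", "Dk"))

near = alice.query("q1", "Dk", ttl=1)  # reaches bob only
far = alice.query("q2", "Dk", ttl=2)   # reaches carol through bob
```

Each node sees only its immediate caller and never learns the full path; the caller in turn receives a merged answer without learning which peers contributed to it. A production design would need more careful duplicate handling than this simple per-node set.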
The following use cases are examples of how the Firefly Trust Sync architecture will operate:
As with many distributed networks, participation starts with an invitation. Firefly's invitation process is inspired by, but hopefully simpler than, that of Secure Scuttlebutt31.
To explain how Alice will invite Bob to her network these mnemonics will be used:
- AD : Alice Domain = https://alice.rocks
- AK : Alice private key
- Ak : Alice public key
- BD : Bob Domain = https://bob.rocks
- BK : Bob private key
- Bk : Bob public key
The invitation steps are:
- Alice sends Bob an invite code (in the EAVT schema): AD Ak nonce-signed-by-AK
- Bob verifies that nonce-signed-by-AK matches Ak
- Bob replies by authenticating to AD thus creating BK/Bk
- Bob sends a message to Alice: BD Bk nonce-signed-by-BK
- Alice verifies that nonce-signed-by-BK matches Bk
- Alice (AD : Ak) is now connected to Bob (BD : Bk)
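The invitation steps above can be walked through in code. Python's standard library has no Ed25519 implementation, so this sketch substitutes a Lamport one-time signature built from SHA-256; it is a stand-in chosen only to keep the example self-contained. The helper names are hypothetical, the key pairs are generated directly rather than derived through SQRL, and Bob signs a fresh nonce, which is an assumption since the steps do not say which nonce he signs.

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def keygen():
    """Lamport key pair: 256 pairs of secrets and the hashes of those secrets."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public

def sign(private, message: bytes):
    """Reveal one secret per message bit (each key must sign only once)."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]

def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))

def make_invite(domain: str, private, public) -> dict:
    """Build the (domain, public key, signed nonce) message."""
    nonce = secrets.token_bytes(16)
    return {"domain": domain, "key": public, "nonce": nonce, "sig": sign(private, nonce)}

def check_invite(invite: dict) -> bool:
    return verify(invite["key"], invite["nonce"], invite["sig"])

AK, Ak = keygen()                                    # Alice's private/public key
BK, Bk = keygen()                                    # Bob's private/public key
invite = make_invite("https://alice.rocks", AK, Ak)  # Alice sends AD, Ak, signed nonce
bob_accepts = check_invite(invite)                   # Bob verifies against Ak
reply = make_invite("https://bob.rocks", BK, Bk)     # Bob sends BD, Bk, signed nonce
alice_accepts = check_invite(reply)                  # Alice verifies against Bk
connected = bob_accepts and alice_accepts            # both sides now hold (domain, key)
```

A tampered nonce or a signature made with a different private key fails `check_invite`, so neither side records the other's domain and key unless the signature matches the public key that was sent.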
On Alice's Firefly identity server she wants to assert that she trusts Bob's identity by entering a fact in the graph database:
- Alice inserts an assertion (this example is inspired by RDF, where aid is a unique assertion id):
- aid, wot:truster, Ak
- aid, wot:id, Bk
- aid, wot:sig, AK-signs-Bk
We would like to use Firefly as verification for web site certificates.
Now adding the following mnemonics:
- W : Website URL
- Wd : The TLS certificate digest for W
Bob wants to publish an assertion that he trusts a given website's certificate:
- Bob inserts an assertion (on BD)
- aid, wot:truster, Bk
- aid, wot:web, W
- aid, wot:webcert, Wd
- aid, wot:sig, BK-signs-W-Wd
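As a rough sketch of how Bob's assertion could be built and later checked by a client, the code below computes Wd as a SHA-256 digest of the certificate bytes and looks for trusters who vouch for the certificate a client actually observed. The choice of SHA-256, the function names and the example URL are assumptions, and the signature is left as a placeholder string rather than being verified.

```python
import hashlib

def cert_digest(der_bytes: bytes) -> str:
    """Wd: hex SHA-256 digest of the certificate bytes (assumed digest algorithm)."""
    return hashlib.sha256(der_bytes).hexdigest()

def web_assertion(aid, truster, url, der_bytes, signature):
    """The four facts Bob inserts, as (assertion id, attribute, value) triples."""
    return [
        (aid, "wot:truster", truster),
        (aid, "wot:web", url),
        (aid, "wot:webcert", cert_digest(der_bytes)),
        (aid, "wot:sig", signature),  # placeholder; not verified in this sketch
    ]

def vouchers(facts, url, observed_der):
    """Trusters whose assertion for url matches the certificate that was observed."""
    wanted = cert_digest(observed_der)
    by_aid = {}
    for aid, attribute, value in facts:
        by_aid.setdefault(aid, {})[attribute] = value
    return {a["wot:truster"] for a in by_aid.values()
            if "wot:truster" in a
            and a.get("wot:web") == url
            and a.get("wot:webcert") == wanted}

# Stand-in bytes; a real client would use the server's DER-encoded certificate.
facts = web_assertion("a1", "Bk", "https://example.org", b"certificate-bytes", "BK-signs-W-Wd")
matching = vouchers(facts, "https://example.org", b"certificate-bytes")
mismatch = vouchers(facts, "https://example.org", b"some-other-certificate")
```

In Python the DER bytes of a live certificate could be obtained with `ssl.get_server_certificate` followed by `ssl.PEM_cert_to_DER_cert`. An empty result for a site that peers have vouched for is the kind of signal that might suggest a MiTM attempt.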
Given this information in the Firefly database we can propose an extension to Monkeysphere32 to query Firefly in addition to (or as a replacement to) the PGP Web of trust.
In the same way that Monkeysphere can improve the security and user experience for both ssh server administrators and users33 we could propose extensions to Monkeysphere for server identities.
Here the assertion entered into the Firefly database is analogous to that for a web server above (and either may allow specifying a non-standard port).
By selecting an e-mail address as a Pet Name and making trust assertions about it, two (or more) correspondents may trust they have the proper public key with which to encrypt secure e-mail for each other.
If Alice wants to associate the e-mail [email protected] with BD : Bk she can insert this fact in the database:
- Alice trusts Bob's e-mail
- aid, wot:truster, Ak
- aid, wot:firefly, BD
- aid, wot:id, Bk
- aid, wot:email, [email protected]
- aid, wot:sig, AK-signs-BD-Bk-email
Alice then knows she can encrypt e-mail for Bob and, through analogous steps that associate Alice's e-mail [email protected] with AD : Ak, Bob knows he can encrypt e-mail for Alice.
At this point the secure mail exchange approach of Autocrypt34 could be used for Alice and Bob to communicate securely.
As the database records a timestamp with every entry (the T in EAVT) revoking a previous trust assertion is as easy as inserting a new fact with the attribute wot:revoke in assertions where wot:sig had been used.
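To illustrate how the timestamp makes revocation a matter of appending facts, here is a small in-memory, append-only EAVT log. It is a sketch under assumptions: the class and method names are hypothetical, timestamps are plain integers, signatures are placeholder strings that are not verified, and the most recent assertion about a given truster and identity is taken to decide the outcome.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Datom:
    entity: str     # E: the assertion id (aid)
    attribute: str  # A: e.g. wot:truster, wot:id, wot:sig, wot:revoke
    value: str      # V
    time: int       # T: when the fact was recorded

class AssertionLog:
    """Append-only fact store; nothing is ever updated or deleted."""

    def __init__(self) -> None:
        self._datoms: List[Datom] = []

    def add(self, entity: str, attribute: str, value: str, time: int) -> None:
        self._datoms.append(Datom(entity, attribute, value, time))

    def is_trusted(self, truster: str, subject: str, as_of: int) -> bool:
        """Was subject trusted by truster, given only facts recorded up to as_of?"""
        by_entity: Dict[str, Dict[str, Datom]] = {}
        for d in self._datoms:
            if d.time <= as_of:
                by_entity.setdefault(d.entity, {})[d.attribute] = d
        latest = None  # (timestamp, trusted) of the newest matching assertion
        for attrs in by_entity.values():
            t, i = attrs.get("wot:truster"), attrs.get("wot:id")
            if t and i and t.value == truster and i.value == subject:
                stamp = max(d.time for d in attrs.values())
                trusted = "wot:sig" in attrs and "wot:revoke" not in attrs
                if latest is None or stamp > latest[0]:
                    latest = (stamp, trusted)
        return latest is not None and latest[1]

log = AssertionLog()
for attribute, value in [("wot:truster", "Ak"), ("wot:id", "Bk"), ("wot:sig", "AK-signs-Bk")]:
    log.add("a1", attribute, value, time=100)      # Alice trusts Bob at T=100
for attribute, value in [("wot:truster", "Ak"), ("wot:id", "Bk"), ("wot:revoke", "AK-signs-Bk")]:
    log.add("a2", attribute, value, time=200)      # Alice revokes at T=200
```

Querying with `as_of=150` still sees the original trust, while `as_of=250` sees the revocation, so the history remains auditable even though the current answer has changed.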
Here are some advantages of the Firefly Trust Sync architecture:
Considering Firefly as a distributed, secure database, one can see the immense benefits over using the blockchain as a database35.
Would you use a database with these features?
- Uses approximately the same amount of electricity as could power an average American household for a day per transaction
- Supports 3 transactions / second across a global network with millions of CPUs/purpose-built ASICs
- Takes over 10 minutes to "commit" a transaction
- Doesn’t acknowledge accepted writes: requires you read your writes, but at any given time you may be on a blockchain fork, meaning your write might not actually make it into the “winning” fork of the blockchain (and no, just making it into the mempool doesn’t count). In other words: “blockchain technology” cannot by definition tell you if a given write is ever accepted/committed except by reading it out of the blockchain itself (and even then)
- Can only be used as a transaction ledger denominated in a single currency, or to store/timestamp a maximum of 80 bytes per transaction
But it’s decentralized!
In making trust assertions by writing to a distributed graph database, Firefly is much more energy efficient than having multiple computers sweat to brute force a one way function36:
By late next year, bitcoin could be consuming more electricity than all the world’s solar panels currently produce – about 1.8 percent of global electricity. That would effectively erase decades of progress on renewable energy.
Given the urgency of global climate change we cannot, in good conscience, advocate an alternative web-of-trust solution which is on track to consume 32 Terawatthours (TWh) of energy37.
Some might advocate that "proof of stake"38 blockchain technology won't suffer from some of the problems of "proof of work" technology39.
And yet, in practice, it would seem that there is still a cabal of minters that have asymmetric community power to create Initial Coin Offerings (ICO's) and reap the benefits of the droit du Seigniorage40.
In Firefly there is no dependence on any one specific, occult blockchain network.
Firefly users are in complete control of their private key material and do not need to trust any third party "escrow" service.
As each identity is represented by a Firefly node (maximal distribution) and widely held trust assertions are dispersed across the network it is tolerant of temporary identity server failure.
As only trusted identities can make assertions the dispersion of naively "astroturfing" facts is limited.
Firefly makes Pet Name style associations on a voluntary basis by starting at a default of anonymity.
Here are some disadvantages of the Firefly Trust Sync architecture that need further consideration:
While each Firefly identity does not require a physical "server" per se, it does require its own domain name, TLS certificate, and web service API implementation.
The cost of asserting identities in Firefly is much higher than in the current PGP Web of trust. While this may have a benefit of discouraging the propagation of "noise", it is yet a barrier to adoption.
One of the failures of the PGP Web of Trust was the utter lack of user experience design. While the building blocks of SQRL, Monkeysphere and Autocrypt have made great progress in addressing parts of the UX challenge, Firefly will need comprehensive UX design to be successful.
Further discussion and analysis of the Firefly Trust Sync architecture is welcome. Here are a few important next steps to consider:
There is a certain engineering elegance to an immutable database, and "append only" logs have significant benefits from a security auditing point of view. But what do we do when the "nuclear launch codes" get accidentally added to the database? Or what about very offensive speech?
An early question to resolve is whether there is any way to get the security (and other benefits) of an immutable database41 and yet be in compliance with Europe's General Data Protection Regulation (GDPR42).
There's nothing like working code to demonstrate that an architecture design is plausible. An essential next step will be to build open source prototypes of Firefly, including necessary extensions to SQRL, Monkeysphere and Autocrypt.
Given a prototype Firefly implementation it will be possible to test common operations under synthetic load to determine likely performance characteristics.
Throughout the discussion of Firefly the focus has been about trusting the data in the graph database. Clearly Firefly needs to enable users to trust the software that implements the system as well.
There may be an opportunity to make trust assertions about the reproducibility43 of Firefly software such that users can trust the "entire chain of custody" of a trust assertion (including the software).
A basic threat assessment must be done to elaborate expected attacks, such as denial of service.
For each of these threats a mitigation or countermeasure should be proposed.
Given a basic implementation of Firefly there are some interesting future possibilities to consider:
- Could Firefly help improve the trust of Internet of Things (IoT) devices?
- Could small IoT devices be represented by a Firefly proxy?
- Would it be possible to support TOR-like anonymous queries?
- Could Firefly become the basis for a generalized trust API44?
1 Domain Name System
https://en.wikipedia.org/wiki/Domain_Name_System
2 SYNC: The Emerging Science of Spontaneous Order, by Steven H. Strogatz
http://www.stevenstrogatz.com/books/sync-the-emerging-science-of-spontaneous-order
3 New Research Suggests That Governments May Fake SSL Certificates
https://www.eff.org/deeplinks/2010/03/researchers-reveal-likelihood-governments-fake-ssl
4 Extended Validation Certificates are Dead
https://www.troyhunt.com/extended-validation-certificates-are-dead/
5 Certificate Transparency Version 2.0
https://tools.ietf.org/id/draft-ietf-trans-rfc6962-bis-30.html
6 How will Certificate Transparency Logs be Audited in Practice?
https://www.agwa.name/blog/post/how_will_certificate_transparency_logs_be_audited_in_practice
7 Secure Keyserver Network
https://bitbucket.org/skskeyserver/sks-keyserver/wiki/Home
8 Evil-32
https://evil32.com/
9 Community Impact of OpenPGP Certificate Flooding
https://dkg.fifthhorseman.net/blog/community-impact-openpgp-cert-flooding.html
10 Modern Alternatives to PGP
https://blog.gtank.cc/modern-alternatives-to-pgp/
11 Decentralized Identifiers (DIDs)
https://w3c-ccg.github.io/did-spec/#introduction
12 Bitcoin’s energy usage is huge – we can't afford to ignore it
https://www.theguardian.com/technology/2018/jan/17/bitcoin-electricity-usage-huge-climate-cryptocurrency
13 Secure Quick Reliable Login
https://www.grc.com/sqrl/sqrl.htm
14 SQRL Detailed Cryptographic Design
https://www.grc.com/sqrl/crypto.htm
15 DOH: DNS over HTTPS has mitigation
https://en.wikipedia.org/wiki/DNS_over_HTTPS
16 Let's Encrypt
https://letsencrypt.org/
17 Graph Database
https://en.wikipedia.org/wiki/Graph_database
18 Neo4j is faster than MySQL in performing recursive query
https://maxdemarzi.com/2017/02/06/neo4j-is-faster-than-mysql-in-performing-recursive-query/
19 Entity–attribute–value model
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
20 The rise of immutable data stores
https://usblogs.pwc.com/emerging-technology/the-rise-of-immutable-data-stores/
21 Resource Description Framework
https://en.wikipedia.org/wiki/Resource_Description_Framework
22 RDF Schema
https://en.wikipedia.org/wiki/RDF_Schema
23 Unofficial guide to Datomic internals
https://tonsky.me/blog/unofficial-guide-to-datomic-internals/
24 Datalog
https://en.wikipedia.org/wiki/Datalog
25 SPARQL
https://en.wikipedia.org/wiki/SPARQL
26 SPARQL 1.1 Federated Query
https://www.w3.org/TR/sparql11-federated-query/
27 Blazegraph Federated Query
https://wiki.blazegraph.com/wiki/index.php/FederatedQuery
28 Pet Names
https://github.com/cwebber/rebooting-the-web-of-trust-spring2018/blob/petnames/draft-documents/making-dids-invisible-with-petnames.md
29 Zooko's Triangle
https://en.wikipedia.org/wiki/Zooko%27s_triangle
30 TOR
https://www.torproject.org/
31 Secure Scuttlebutt
https://ssbc.github.io/scuttlebutt-protocol-guide/
32 Monkeysphere
http://monkeysphere.info/
33 Monkeysphere for SSH
http://monkeysphere.info/getting-started-ssh/
34 Autocrypt
https://autocrypt.org/
35 On the dangers of a blockchain monoculture
http://tonyarcieri.com/on-the-dangers-of-a-blockchain-monoculture
36 New Study of Bitcoin’s Energy Use Makes You Libertarian Nerds Look Even Worse Than Usual
https://www.motherjones.com/environment/2018/05/new-study-of-bitcoins-energy-use-makes-you-libertarian-nerds-look-even-worse-than-usual/
37 The Environmental Case Against Bitcoin
https://newrepublic.com/article/146099/environmental-case-bitcoin
38 Avalanche (AVA) — Blockchain 3.0: A Novel Metastable Consensus Protocol
https://hackernoon.com/avalanche-ava-blockchain-3-0-a-novel-metastable-consensus-protocol-28cdc4ee8984
39 Scalable and Probabilistic Leaderless BFT Consensus through Metastability
https://arxiv.org/abs/1906.08936
40 Seigniorage
https://en.wikipedia.org/wiki/Seigniorage
41 Append-only databases and the GDPR conundrum
https://www.bloorresearch.com/2018/02/append-databases-gdpr-conundrum/?cn-reloaded=1
42 GDPR: What your company should know and do, starting now
https://medium.com/wattx-stories/gdpr-what-your-company-should-know-and-do-starting-now-f62d70f72d7e
43 Reproducible Builds
https://reproducible-builds.org/
44 Fixing Trust on the Internet
https://libreplanet.org/2017/program/#day-2-timeslot-14-session-2-collapse