Home
The framework is undergoing the world's slowest redesign. I'll try to keep this wiki updated with my progress.
- Add support for more than one type of CRDT in the graph (e.g. text documents).
- Require databases to define a schema (see the sketch after this list).
- Replace read & write APIs with bulk-only implementations.
- Add support for ordered lists of unbounded size (CouchDB-style views).
- Distinguish reads from subscriptions so memory doesn't grow unbounded.
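Since none of the redesigned APIs are published yet, here is a purely hypothetical sketch of what a schema definition plus a bulk-only read/write surface could look like. Every name below is invented for illustration and is not the framework's actual API.

```typescript
// Hypothetical sketch only: these names are invented for illustration and
// are not the framework's real API.

// A schema describes every node type up front so the database can validate
// writes and plan storage.
interface Schema {
  [nodeType: string]: {
    [field: string]: 'string' | 'number' | 'boolean' | 'reference';
  };
}

const schema: Schema = {
  player: { name: 'string', score: 'number', team: 'reference' },
  team: { title: 'string' },
};

// Bulk-only surface: reads and writes always operate on batches,
// never on a single key at a time.
interface Database {
  read(ids: string[]): Promise<Map<string, Record<string, unknown>>>;
  write(nodes: Map<string, Record<string, unknown>>): Promise<void>;
}
```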
This should probably be a blog instead.
I'm still fascinated by this space, but my interests have grown to encompass more than a human can reasonably manage. This has become another side project in the "to be completed" pile. I'm sure I'll stay involved and curious, but I'm afraid I may never devote the same amount of time again. Anyone reading this probably knows the feeling.
I'm not sure why I keep writing these updates. Nobody reads them except me. Maybe it's nostalgia.
Apparently these moments of inspiration only come annually. Shortly after my last update I attempted the simpler version of Timber without checkpointing or compaction, what I call Woodland, but it quickly got complicated. It was the type of complexity that grows from a poor foundation. I was still missing some underlying principle of simplicity.
Recently I watched a programming talk by Pat Helland titled "Immutability Changes Everything". It was an excellent presentation and got me thinking about how Woodland could be restructured as a series of immutable data structures. After a few days of massaging the idea, everything fell into place. I'd finally discovered the underlying principle that was holding me back.
I hope to have this published as a small library in the next two months.
I've made the algorithm public: https://github.com/PsychoLlama/timber-protocol
It's complex enough that I'm not sure I can actually implement it on my own. I've failed a few times trying. It could be made a lot simpler without checkpointing or compaction, so I might try my hand there instead, just as a proof of what it could do. Who knows. It takes so long for these projects to reach completion; maybe it's enough that the algorithm is out there.
Whew, it's been nearly a year. Since the last update, life got busy again. I ended up quitting my job and starting a sabbatical. My attention has been focused on infrastructure, computer hardware, and diving deeper into Linux. So not much software.
Yesterday I had a breakthrough moment. A nagging thought about mixing Paxos with vector clocks... I've got an algorithm written down. There's a chance this might really work. I'm talking about a transaction log with compaction and checkpointing. It could scale to thousands of peers; they could go offline, continue editing, and still resynchronize without blocking compaction or losing data.
Or maybe I've missed something obvious and it could never work in practice. I'll do some more tests and post the algorithm when I'm more certain.
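The algorithm itself isn't written up here, so the following is only a sketch of the vector-clock primitive the idea leans on, not the compaction or checkpointing scheme.

```typescript
// Minimal vector clock: one counter per peer. This is the well-known
// primitive, not the compaction algorithm described above.
type VectorClock = Map<string, number>;

// Advance this peer's counter before emitting a new operation.
function tick(clock: VectorClock, peer: string): VectorClock {
  const next = new Map(clock);
  next.set(peer, (clock.get(peer) ?? 0) + 1);
  return next;
}

// Merge on receive: take the element-wise maximum of both clocks.
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const result = new Map(a);
  for (const [peer, count] of b) {
    result.set(peer, Math.max(result.get(peer) ?? 0, count));
  }
  return result;
}

// `a` happened before `b` if every counter in `a` is <= the matching counter
// in `b` and at least one is strictly smaller. Otherwise they're concurrent.
function happenedBefore(a: VectorClock, b: VectorClock): boolean {
  let strictlySmaller = false;
  for (const [peer, count] of a) {
    const other = b.get(peer) ?? 0;
    if (count > other) return false;
    if (count < other) strictlySmaller = true;
  }
  for (const [peer, count] of b) {
    if (!a.has(peer) && count > 0) strictlySmaller = true;
  }
  return strictlySmaller;
}
```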
Things have started to calm down again. I'm returning my attention to learning formal proof systems and solving convergent op-log compaction. I'm planning a few side projects with OCaml since ML underpins many proof assistants (Isabelle with Standard ML, Coq with OCaml). Developing an intuition seems immediately useful. And fun 🙂
No commitments, but I've also considered implementing a proof-of-concept ORDT to make the problem of op-log compaction more concrete.
Life got complicated. COVID-19 inflated the sense of urgency at my day job. I've been working on video conferencing applications for the last 2 years. You can imagine how that's going.
I don't have time for anything else. I hope life calms down soon. It probably won't.
I keep thinking about operation logs. Since globally consistent compaction is a no-go, the only thing left is locally consistent compaction. The biggest question is how to handle compaction conflicts between peers. I depend on metadata in order to merge replicas, but the whole point of compaction is deleting that metadata. It's like merging two git histories without a shared branch.
Some ideas have been percolating. I've written about them in the Operation Log Compaction article. Nothing solid, just ideas.
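To make the conflict concrete, here's a toy example of my own (not one of the ideas from the article): once a replica folds its log into a snapshot, the per-operation metadata needed to order a concurrent edit is gone.

```typescript
// Toy illustration of the problem, not a proposed solution. Each operation
// carries the metadata (a logical timestamp and author) that makes
// deterministic merging possible.
interface Op {
  author: string;
  time: number; // logical timestamp
  field: string;
  value: string;
}

// A compacted replica keeps only the folded state and forgets the ops.
interface Snapshot {
  state: Record<string, string>;
  compactedThrough: number; // highest timestamp folded into the state
}

function compact(log: Op[]): Snapshot {
  const state: Record<string, string> = {};
  let compactedThrough = 0;
  for (const op of [...log].sort((x, y) => x.time - y.time)) {
    state[op.field] = op.value;
    compactedThrough = Math.max(compactedThrough, op.time);
  }
  return { state, compactedThrough };
}

const snapshot = compact([
  { author: 'alice', time: 1, field: 'title', value: 'draft' },
  { author: 'alice', time: 3, field: 'title', value: 'final' },
]);

// A concurrent edit arrives after compaction. Was 'final' written before or
// after Bob's edit? The snapshot only knows "title: final, compactedThrough: 3",
// so the per-op metadata needed to arbitrate the merge is already gone.
const lateOp: Op = { author: 'bob', time: 2, field: 'title', value: 'bob-edit' };
```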
I just came out of a deep rabbit hole researching operation logs as a potential replacement for the LWW-E-Set, aka the Register. Theoretically it would allow foreign key constraints and true node deletion. The rabbit hole ended when I realized operation log compaction implied classical global consistency. Still, it was useful knowledge.
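For context, the register being replaced behaves like a textbook last-writer-wins register: it merges in a few lines, but it can only "delete" by writing a tombstone, which is why true node deletion needs something more like an op log. A minimal version of the textbook CRDT, not the framework's exact implementation:

```typescript
// Textbook last-writer-wins register. Not the framework's exact Register,
// just the well-known CRDT it resembles.
interface LWWRegister<T> {
  value: T;
  timestamp: number; // e.g. a hybrid logical clock
  author: string;    // tie-breaker when timestamps collide
}

function mergeRegisters<T>(a: LWWRegister<T>, b: LWWRegister<T>): LWWRegister<T> {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  // Deterministic tie-break keeps every replica convergent.
  return a.author > b.author ? a : b;
}
```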
I'm currently experimenting with Isabelle, a machine-checked proof assistant. I hope to formally prove that types and migrations won't break the Strong Eventual Consistency guarantees of CRDTs. Otherwise the whole premise of my redesign is broken 🙈
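For reference, the convergence half of Strong Eventual Consistency (as defined in Shapiro et al.'s CRDT work) is the property such a proof has to preserve across a schema migration: any two correct replicas that have delivered the same set of updates must be in equivalent states.

$$\forall\, r_i, r_j:\quad \mathrm{delivered}(r_i) = \mathrm{delivered}(r_j) \;\implies\; \mathrm{state}(r_i) \equiv \mathrm{state}(r_j)$$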