Support for skipping subtree revisions to increase read performance and reduce disk usage #3201
Conversation
MariaDB also documents the …
I think there's a way to get the benefits of this branch without the disruption of a table migration. Rather than remove the column, we could pick some very large revision number (larger than any currently existing one), and start writing / replacing all subtrees at that revision number. We could then modify the read queries accordingly.

Doing this on an existing table would not reduce the cardinality of the existing index, but it would dramatically slow its growth, which could have a performance benefit, depending on whether the issue is the absolute size or the growth rate.
I have the same intuition and imagined a similar solution. Such an elegant hack wouldn't be without risks, and would require a careful rollout. In particular a staged rollout where some old code and new code were live at the same time could really break things.
If you would be happy to work through these risks then I would be happy to think about this approach some more.
(force-pushed from f5b22ff to d4caed0)
As a data point, I've run the integration tests in CT-go against this version of Trillian and confirmed that it works.
(force-pushed from 30db159 to da7cf49)
For anyone following along at home, the commit history got weird here. Somehow (maybe during a rebase?) I ended up setting the branch to an intermediate version of the work that was incomplete. Luckily I've abandoned the …
This is a minor cleanup, but it lets google#3201 avoid reading this, and pulling this change to the existing logic out into its own PR simplifies review.
There are a number of approaches that could be used to transition from the current world, where subtrees are revisioned, to one where they are unrevisioned. I'll list them here:

**The breaking change**

This simply changes the schema to drop the revisions column and then updates the read/write queries. This is what the first commit to this PR does. It is the simplest thing to do, but it has no backwards compatibility for logs running on the older schema. This would need to be a Trillian 2.0.

**Different storage layer**

Copying the existing mysql storage implementation to a new driver (effectively a mysql2). This involves a lot of duplicate code for an indeterminate amount of time. It also means that new flags need to be introduced, such as …

**Different code paths depending on tree features**

Trillian stores tree metadata that is read at the start of each tree operation. In this approach we store some marker in the tree metadata indicating that this is a revisionless tree; the absence of the marker means the tree is treated as a historic revisioned subtree log.

This approach allows for backwards compatibility with existing logs, and a single instance of Trillian can support both revisioned and unrevisioned subtrees. Further, having this data in the database makes it more difficult for a log operator to point the wrong configuration of Trillian at a database and make incompatible writes by mixing revisioned / unrevisioned approaches on the same log.

This approach has a little more complexity in terms of code, but is simpler in terms of deployment. For this reason, it is the primary exploration path at this point. The current commit chain demonstrates one way of doing this, where the marker is the presence of the magic bytes '🙃norevs🙃' as the prefix of the tree description field. Other similar approaches involve hijacking the private/public key fields with magic bytes, as these are no longer used. A more elegant approach would be to add a new column to the Trees table.
(force-pushed from 09bc384 to 7b71e91)
Just passing by here with a thought dump :)

Would a blatant … To make this column creation seamless, the …

In general, some seamless/automatic/stateful migration mechanism (built into the concrete Trillian storage implementations) would remove a lot of friction here, and would allow the Trillian schema/code to iterate. For example: have some migrations versioning (like a simple counter), a table in the schema containing the "min supported version" which is bumped after each migration (with all nodes guaranteed to know this version), and some form of election and progress tracking to facilitate these migrations. All done from within Trillian, without reliance on operators, manual steps, or external tooling.

As an example, CockroachDB's seamless migrations mechanism could be of interest, but it may be a bit complex.
I created three trees: …

I ran a small benchmark (…).

Getting rid of subtree revisions results in ~4.5x fewer rows in the Subtree table. The benchmark results show that the old version is slower than the new one, while …
@pavelkalinnikov Hello, and welcome back :-) We were at a conference last week; apologies for the slow reply. Replies are below, and I'll put them inline:
I considered this but ruled it out because I think we would still need manual steps for users that have deployed with a locked-down configuration, i.e. where the DB user that Trillian connects as has data read/write permissions, but does not have permission to modify the schema (i.e. no …).
+1 for a metadata table. I wish we'd had one when I started this PR :-) Something to add to the roadmap, along with automatic upgrades and documentation on the permissions that the Trillian user needs. That said, I'm still very nervous about even proposing this feature, because it adds complexity that is somewhat hard to test, and if it goes wrong, logs don't support recovery from a backup.
(force-pushed from 0fdc0ff to 5696d6d)
@n-canter suggested using INSERT ... ON DUPLICATE KEY instead of REPLACE, which I have included. Thanks for the suggestion! My reply to this suggestion risks getting lost now that the comment is resolved, and I think it could be useful to future maintainers, so I'll include it here: I have a minor concern that the VALUES() syntax is disfavoured in the newest versions of both MySQL [1] and, to a lesser degree, MariaDB [2], but for the time being they both support it, so let's roll with it. Tested this with the following docker images: …
[1] https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html - "The use of VALUES() to refer to the new row and columns is deprecated beginning with MySQL 8.0.20, and is subject to removal in a future version of MySQL"
I've convinced myself that the current approach is too leaky an abstraction, and I'm going to put more work into this. Specifically, changing the description in …

I have some ideas about doing this in a cleaner way, and I'll take another run at this tomorrow.
This removes an obstacle to executing google#3201 cleanly. Despite the duplication, it's also logically cleaner. The different backend implementations should be able to evolve independently. Tying them together with common code for reading rows forces the same schema layout across all implementations.
(force-pushed from 02e48ff to c5bff06)
The latest commits land some big changes that rework how the new settings bit is stored. The main deltas: …
It's unfortunate how messy this is, but I'm pointing the finger at protos, especially the Any type. There is also extra complexity in that we can't write protos directly to the DB, because it's too dangerous: an encoded proto with all default values is an empty message, which appears exactly the same as no message when we read it back, and in this case we really care about that distinction. To work around this, a new struct has been introduced, and it is gob encoded/decoded into the PublicKey column. @AlCutter if you can review the …
For anyone else reviewing this, I've found it useful to break this review into two halves, since some changes were done and undone across commits: the first commit, and then all the other ones together.
Thanks for the comments. Addressed, and still eagerly courting more eyes and opinions.
(force-pushed from 260ccff to 6b26d4e)
(force-pushed from 6b26d4e to a3f3528)
This was previously creating and initializing a tree, and then testing what happened if you created a transaction on a tree ID that definitely didn't exist. What it was trying to test was something different, which is the case where a tree had been created/defined, but was not initialized with an empty log root yet. The test now reflects that. This allows google#3201 to avoid a nil check for something that otherwise will be guaranteed to exist.
(force-pushed from a3f3528 to 0e02c59)
This confirms that removing this will actually work in practice. This commit is a breaking change and can't be merged as-is. The lesson is that this is the minimum change required to make this work if there were no existing clients using Trillian.
The same schema is used for both revisioned and unrevisioned subtrees. The difference is that we always write a revision of 0 in the unrevisioned case, which means there will only ever be a single entry per subtree.

The old PublicKey column has been repurposed to store a settings object for newly created trees. If a settings object is found in this column, it is parsed and checked for a property indicating that the tree is revisionless. If the tree is confirmed to be revisionless, then all writes to the subtree table use a revision of 0, and all reads skip the nested inner query that was causing slow queries. If the property cannot be confirmed (no settings persisted, or settings persisted but explicitly saying to use revisions), then the functionality continues in the old way. This preserves backwards compatibility, while new trees gain the new behaviour.

For users with legacy trees that wish to take advantage of the smaller storage costs and faster queries of the new revisionless storage, the proposed migration mechanism is to use migrillian to clone the old tree to a new tree. If anyone is interested in doing this, I recommend speaking to us on Slack (https://join.slack.com/t/transparency-dev/shared_invite/zt-27pkqo21d-okUFhur7YZ0rFoJVIOPznQ).
(force-pushed from 0e02c59 to 13d5d31)
I've squashed the history for this PR into 2 commits. The first of these shows what the real change is to support the new revisionless approach, but does so in a breaking way. The second makes it so that all new trees get the new revisionless behaviour, while old trees continue to work exactly as they used to.
It's a big one

Just to confirm that removing this will actually work in practice.
The tests pass, so now we need to design an upgrade mechanism.
Probably forking this mysql driver / schema to be a mysql2 which is
effectively a different driver.

We'll also need to make some changes to this to support MariaDB.
The 'REPLACE INTO' syntax is, I'm pretty sure, a MySQL-specific
extension. There is a more standard format for doing this, but it
involves passing in more '?' parameters, which was already getting
too wild for me with the <placeholder> expansion. Probably a solvable
problem, but one for when I'm a bit less ill :-)