In the drive for TestNet, certain sacrifices are made as we work. This file tracks ideas for future work, clean up, performance improvements, etc., that can be considered once a TestNet-ready Chainweb has been achieved.
Consider caching the lookups made over shared calls to seekHighest
.
seekHighest
is still very fast (O(logn)
), but for very long chains this
could show a performance improvement for entries
calls.
Currently, tag filenames take the following shape:
0000000000000020.Hc-ixI8lSPr39uzlsO_qbFv52veiJhlL_ozLY1F_2hY
0000000000000040.8lqxArVy0fw_mv5LPv94Qrek0DT8zL5miH_LF9mCvRM
0000000000000080.H_hTtC-oj8KpiY5weioJj6Tkn4pI-6EnIvwelfrQWO8
0000000000000100.PwntUkxmI2pTGlI0N9Gdcwlhb94WSdO9HY_CQXtyaoU
...
where the value before the .
is a fixed-width Hex encoding of BlockHeight
.
The value after the .
is a base64 URL-safe encoding of BlockHashBytes
. This
format ensures that:
- git will store these filenames in the correct order (lowest-to-highest) within its "tree entries"
BlockHashBytes
can encoded in a way that git considers "legal" for filenames
Unfortunately, the leading zeros consume bytes on disk that may not be necessary. Some encoding other than Hex may be better for this.
Futhermore, during tag lookups (which we do often), matched tags must have the
BlockHashBytes
portion of the filename decoded in order to become useful
again. The target hash must also be reencoded once to even ensure a match in the
first place. Suffice to say, a lot of redundant encoding/decoding occurs. Both
lookups and traversal would receive a substantial performance improvement (50%?)
if BlockHashBytes
were pre-encoded in their URL-safe forms, as lookups here
could then occur without any extra encoding/decoding.
hlibgit2
binds to version
0.18.0
of libgit2
, which was released in mid-2013. Since then, the following
improvements (which are relevant to our use-case) have been made:
Version | Date | Improvements |
---|---|---|
0.19.0 | Jul 2013 | Bug fixes, perf improvements |
0.20.0 | Nov 2013 | Perf improvements |
0.21.2 | Oct 2014 | Fixed mem leak in git_tag_delete() |
0.22.0 | Jan 2015 | Changes to names of functions we use |
0.24.0 | Mar 2016 | Tree objects assumed sorted (perf ++) |
0.24.1 | Apr 2016 | Change to structure of git_tree type |
0.27.3 | Jul 2018 | Fix out-of-bounds reads from a packfile |
0.24.0
in particular may give us a nice improvement "for free".
The complexity of leaves
(and therefore leafEntries
) is "O(n)
-ish", in
that, due to a detail of the libgit2
library, the number of tags in bh/
affects the speed at which we can read the tags in leaf/
. Ideally, fetching
leaves would be O(1)
. To restore this, a Map
of the current leaves could be
kept in memory (say within GitStoreData
), where it would be updated as leaves
were added to or deleted from leaf/
. leaves
then becomes a toList
of the
Map
, which would speed up a few operations elsewhere that depend on it.
Using walk
and seekHighest
, a faster implementation of branchEntries
could
probably be written. Currently we rely on the default provided by TreeDb
,
which experimentally has been shown to be "fast enough".
Currently prune
will perform a git repack -A
and git gc
, but does not
manually clean out old tags in bh/
or leaf/
.
See this discussion for more information.
This paragraph only concerns the proper execution schedule of the on-disk check-pointer. Figure out how to catch exceptions properly in monad transformer stacks that have IO at the bottom. Upon receipt (or catch) of an exception during the execution of pact code, make sure to close any back-end database, currently in use, properly.