First draft of static probability tables format #8
base: master
The header:
- specifies the kind of file;
- references the grammar version;
- optionally, references a SPT file it amends.
At this early stage, "delta" SPTs do not seem worth the effort of speccing. They are off the fast path anyway, and I can see them adding a lot of complexity to the spec. Let's leave the deltas until we actually feel we need them.
It's not the highest priority, but let's keep an eye on the road :)

## Tables of Strings

These tables add new strings that may be referenced both in the tables of probabilities
So far, the probability tables for strings just predict indexes into a move-to-front cache. We do not actually need to assign general probabilities to the string table itself: a string is encoded using some varuint encoding when it is first referenced, then added to the MoveToFront string cache, and from that point on it is predicted well.
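To make the move-to-front scheme in the comment above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class name `MoveToFrontCache`, the `MISS` sentinel value, and the eviction policy (drop the least recently used entry when the cache is full) are hypothetical and not taken from the spec draft.

```python
from typing import List, Union

# Hypothetical sentinel for "string not found in the cache".
MISS = "MISS"


class MoveToFrontCache:
    """Illustrative move-to-front cache of at most `capacity` strings."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[str] = []  # index 0 is the most recently used

    def reference(self, s: str) -> Union[int, str]:
        """Return the symbol to predict for `s` (an index or MISS),
        then move `s` to the front of the cache."""
        if s in self.entries:
            index = self.entries.index(s)
            # Hit: move the entry to the front.
            self.entries.insert(0, self.entries.pop(index))
            return index
        # Miss: the string itself would be encoded out of band
        # (e.g. with a varuint-based encoding), then cached.
        self.entries.insert(0, s)
        # Assumed eviction policy: drop entries beyond the capacity.
        del self.entries[self.capacity:]
        return MISS
```

With a cache of size N, the only symbols a probability table has to cover are the indexes 0 to N-1 plus `MISS`, which is why no probabilities are needed for the string table itself in this scheme.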
That's true for string literals, identifier names and property keys.
On the other hand, it's not true for interface names and string enums.
I'll amend the text to clarify.
TBD

## Tables of Probabilities
We can simplify the specification of probability tables by specifying their encoding independently. We know that each probability table will specify the probabilities for a finite and relatively "small" set of symbols.
For context-prediction of tree types, it's the set of schema-bounded types at that location. For string predictions, it's the set {0, 1, .., N-1, MISS}, where N is the size of the MoveToFront string cache, etc.
Each table can be encoded simply as a series of 32-bit integers, where the sum of all entries is guaranteed to be less than UINT32_MAX.
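As a rough illustration of the encoding proposed above, the following Python sketch serializes one table as a flat series of 32-bit unsigned integers and checks the sum constraint. Several points are assumptions rather than anything the draft states: little-endian byte order, reading the entries as unnormalized weights (so that a probability is `entry / sum`), the decoder knowing the number of symbols from context, and the helper names `encode_table`, `decode_table` and `probability`.

```python
import struct
from typing import List

UINT32_MAX = 0xFFFFFFFF


def check_weights(weights: List[int]) -> None:
    """Enforce the constraint from the comment: sum of entries < UINT32_MAX."""
    if any(w < 0 for w in weights):
        raise ValueError("entries must be non-negative")
    if sum(weights) >= UINT32_MAX:
        raise ValueError("sum of entries must be less than UINT32_MAX")


def encode_table(weights: List[int]) -> bytes:
    """Encode a table as consecutive 32-bit integers (assumed little-endian)."""
    check_weights(weights)
    return struct.pack(f"<{len(weights)}I", *weights)


def decode_table(data: bytes, symbol_count: int) -> List[int]:
    """Decode `symbol_count` entries; the count is assumed known from context."""
    weights = list(struct.unpack_from(f"<{symbol_count}I", data))
    check_weights(weights)
    return weights


def probability(weights: List[int], symbol: int) -> float:
    """Probability of `symbol`, assuming entries are unnormalized weights."""
    return weights[symbol] / sum(weights)
```

For a string table with N = 2, the three entries would stand for the symbols 0, 1 and MISS, in that (assumed) order.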
I don't get where anything is simplified.