
First draft of static probability tables format #8

Open: wants to merge 1 commit into master

Yoric (Contributor) commented Oct 10, 2018:

Let's start the conversation.

The header:
- specifies the kind of file;
- references the grammar version;
- optionally, references an SPT file that it amends.
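The three header fields above could be serialized along the following lines. This is a minimal sketch under stated assumptions: the magic bytes `BINSPT01`, the length-prefixed UTF-8 strings, the one-byte flag for the optional amended-file reference, and the `SptHeader` name are all hypothetical and not part of the draft.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical magic bytes identifying the kind of file (not defined by the draft).
MAGIC = b"BINSPT01"


@dataclass
class SptHeader:
    grammar_version: str          # the grammar version the tables were built for
    amends: Optional[str] = None  # optional reference to the SPT file being amended

    def to_bytes(self) -> bytes:
        out = bytearray(MAGIC)
        version = self.grammar_version.encode("utf-8")
        out += struct.pack("<I", len(version)) + version  # length-prefixed string
        if self.amends is None:
            out += b"\x00"  # flag: this file amends nothing
        else:
            ref = self.amends.encode("utf-8")
            out += b"\x01" + struct.pack("<I", len(ref)) + ref
        return bytes(out)

    @classmethod
    def from_bytes(cls, data: bytes) -> "SptHeader":
        if not data.startswith(MAGIC):
            raise ValueError("not an SPT file")
        pos = len(MAGIC)
        (n,) = struct.unpack_from("<I", data, pos)
        pos += 4
        version = data[pos:pos + n].decode("utf-8")
        pos += n
        flag = data[pos]
        pos += 1
        amends = None
        if flag == 1:
            (m,) = struct.unpack_from("<I", data, pos)
            pos += 4
            amends = data[pos:pos + m].decode("utf-8")
        elif flag != 0:
            raise ValueError("invalid amends flag")
        return cls(version, amends)
```

A header written with `to_bytes` round-trips through `from_bytes`; the one-byte flag keeps the optional reference cheap when it is absent.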

Review comment:

At this early stage, "delta" SPTs do not seem worth the effort of speccing. They are off the fast path anyway, and I can see them adding a lot of complexity to the spec. Let's leave deltas out until we actually need them.

Reply from the PR author:

It's not the highest priority, but let's keep an eye on the road :)


## Tables of Strings

These tables add new strings that may be referenced both in the tables of probabilities

Review comment:

So far, the probability tables for strings only predict indices into a move-to-front cache. We do not actually need to assign general probabilities to the string table itself: a string is encoded directly (using some varuint encoding) the first time it is referenced, then added to the MoveToFront string cache, and from then on it is predicted well through its cache index.
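A minimal sketch of the scheme described in this comment, assuming unsigned LEB128 as the "varuint" encoding and a string-table index as the payload on a miss (both are assumptions; `MoveToFrontCache`, `encode_varuint` and `encode_references` are hypothetical names). The probability table only ever has to cover the symbols {0, ..., N-1, MISS}:

```python
from typing import List, Optional, Tuple, Union

MISS = "MISS"  # extra symbol: the string is not currently in the cache


def encode_varuint(n: int) -> bytes:
    """Unsigned LEB128, used here as one possible varuint encoding."""
    if n < 0:
        raise ValueError("varuint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


class MoveToFrontCache:
    def __init__(self, size: int):
        self.size = size
        self.entries: List[str] = []  # most recently used first

    def lookup(self, s: str) -> Union[int, str]:
        """Return the symbol to predict (an index in [0, size) or MISS), then move s to the front."""
        if s in self.entries:
            i = self.entries.index(s)
            self.entries.insert(0, self.entries.pop(i))
            return i
        self.entries.insert(0, s)
        del self.entries[self.size:]  # evict the least recently used entry
        return MISS


def encode_references(refs: List[str], string_table: List[str], cache_size: int
                      ) -> List[Tuple[Union[int, str], Optional[bytes]]]:
    cache = MoveToFrontCache(cache_size)
    out = []
    for s in refs:
        symbol = cache.lookup(s)
        # Only misses carry a payload: the string's index in the string table.
        payload = encode_varuint(string_table.index(s)) if symbol == MISS else None
        out.append((symbol, payload))
    return out
```

With a cache of size 2 and the string table `["a", "b", "c"]`, the references `a, b, a, a, c` produce the symbols `MISS, MISS, 1, 0, MISS`; only the three misses carry a varuint payload.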

Reply from the PR author:

That's true for string literals, identifier names and property keys.

On the other hand, it's not true for interface names and string enums.

I'll amend the text to clarify.

TBD


## Tables of Probabilities

Review comment:

We can simplify the specification of probability tables by specifying them independently. We know that each probability table specifies the probabilities for a finite and relatively "small" set of symbols.

For context-prediction of tree types, it's the set of schema-bounded types at that location. For string predictions, it's the set {0, 1, ..., N-1, MISS}, where N is the size of the MoveToFront string cache, etc.

Each table can be encoded simply as a series of 32-bit integers, where the sum of all entries is guaranteed to be less than UINT32_MAX.
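One way to read "a series of 32-bit integers": one unsigned frequency per symbol, in an order the decoder already knows from context. The little-endian layout and the names `encode_table`, `decode_table` and `cumulative` are assumptions for illustration. Keeping the sum below UINT32_MAX presumably lets a range or arithmetic coder hold cumulative frequencies in 32-bit arithmetic.

```python
import struct
from typing import Dict, Hashable, List, Sequence

UINT32_MAX = 0xFFFFFFFF


def encode_table(freqs: Sequence[int]) -> bytes:
    """Serialize one uint32 frequency per symbol (little-endian, by assumption)."""
    if any(f < 0 for f in freqs):
        raise ValueError("frequencies must be non-negative")
    if sum(freqs) >= UINT32_MAX:
        raise ValueError("sum of entries must stay below UINT32_MAX")
    return struct.pack(f"<{len(freqs)}I", *freqs)


def decode_table(data: bytes, symbols: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Read one uint32 per symbol; the symbol set itself is known from context."""
    n = len(symbols)
    counts = struct.unpack(f"<{n}I", data[:4 * n])
    total = sum(counts)
    if total == 0:
        raise ValueError("empty table")
    return {sym: c / total for sym, c in zip(symbols, counts)}


def cumulative(freqs: Sequence[int]) -> List[int]:
    """Cumulative starts as a range coder would use; the last entry is the total."""
    out, acc = [], 0
    for f in freqs:
        out.append(acc)
        acc += f
    out.append(acc)  # stays below UINT32_MAX thanks to the check in encode_table
    return out
```

For a string-prediction table with N = 3, the symbols would be `[0, 1, 2, "MISS"]`, and frequencies `[5, 3, 1, 1]` serialize to 16 bytes.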

Reply from the PR author:

I don't get where anything is simplified.

2 participants