Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: grammar helper #85

Closed
mindplay-dk opened this issue Aug 12, 2023 · 11 comments · Fixed by #93
Closed

Feature: grammar helper #85

mindplay-dk opened this issue Aug 12, 2023 · 11 comments · Fixed by #93
Assignees
Labels
A-misc Area: Issues that don't fint into specific category C-feature-accepted Category: A feature request that has been accepted and awaiting implementation

Comments

@mindplay-dk
Copy link
Contributor

What problem does this feature solve?

This is mainly for convenience - but it does solve the problem with the existing defer function, which relies initialization separate from creation. My proposed grammar helper would be statically type-checked, and wouldn't need an error-handler.

Describe the solution

If we look at the example for defer:

interface NumberNode {
type: 'number'
span: Span
value: number
}
interface ListNode {
type: 'list'
span: Span
value: Array<NumberNode | ListNode>
}
const TupleList = defer<ListNode>()
const TupleNumber = defer<NumberNode>()
TupleNumber.with(
map(
integer(),
(value, span) => ({ type: 'number', span, value })
)
)
TupleList.with(
map(
takeMid(
string('('),
sepBy(choice(TupleList, TupleNumber), string(',')),
string(')')
),
(value, span) => ({ type: 'list', span, value })
)
)

Here is that example implemented with the grammar helper:

  const tupleGrammar = grammar({
    tupleNumber(): Parser<NumberNode> {
      return map(
        integer(),
        (value, span) => ({ type: 'number', span, value })
      )
    },
    tupleList(): Parser<ListNode> {
      return map(
        takeMid(
          string('('),
          sepBy(choice(this.tupleList, this.tupleNumber), string(',')),
          string(')')
        ),
        (value, span) => ({ type: 'list', span, value })
      )
    },
  });

Here is a test demonstrating how to use the resulting grammar:

  is.equal(
    run(tupleGrammar.tupleList).with('(1,2,(3,(4,5)))'),
    {
      isOk: true,
      span: [ 0, 15 ],
      pos: 15,
      value: {
        type: 'list',
        span: [ 0, 15 ],
        value: [
          { type: 'number', span: [ 1, 2 ], value: 1 },
          { type: 'number', span: [ 3, 4 ], value: 2 },
          {
            type: 'list',
            span: [ 5, 14 ],
            value: [
              { type: 'number', span: [ 6, 7 ], value: 3 },
              {
                type: 'list',
                span: [ 8, 13 ],
                value: [
                  { type: 'number', span: [ 9, 10 ], value: 4 },
                  { type: 'number', span: [ 11, 12 ], value: 5 }
                ]
              }
            ]
          }
        ]
      }
    }
  );

And here is my preliminary implementation:

import { Parser } from "@nrsk/sigma";

type Grammar<T> = {
  [P in keyof T]: T[P] extends () => any ? ReturnType<T[P]> : never;
};

type GrammarInit<T> = T & ThisType<Grammar<T>>;

type GrammarType = {
  [name: string]: () => Parser<any>;
};

export function grammar<T extends GrammarType>(init: GrammarInit<T>): Grammar<T> {
  const grammar = {} as { [key: string]: Parser<any> };

  const initialized = {} as { [key: string]: true };
  
  for (const key in init) {
    grammar[key] = {
      parse(input, pos) {
        if (! initialized[key]) {
          initialized[key] = true;

          grammar[key] = (init[key] as any).apply(grammar);
        }
        
        return grammar[key].parse(input, pos);
      },
    } as Parser<any>;
  }

  return grammar as Grammar<T>;
}

Here is a screenshot demonstrating IDE support:

image

As you can see, this works with the circular references, which is possible with the magical ThisType in TS.

It's doing basically the same thing as defer for each member, so of course this works with circular references at run-time as well.

I didn't benchmark it against defer, and it might need some optimization, and the types could probably use a little work.

But what do you think, would you welcome a PR for this feature? 🙂

@norskeld norskeld self-assigned this Aug 12, 2023
@norskeld norskeld added C-feature-request Category: A feature request, i.e. not implemented A-misc Area: Issues that don't fint into specific category labels Aug 12, 2023
@mindplay-dk
Copy link
Contributor Author

I wanted to know if this approach was faster or slower - it is slower, and I did attempt some optimizations, but I'm not an expert, and the benchmark is really unreliable on my system for some reason... I'm seeing anywhere from 1% to 10% overhead compared with defer, so I really have no idea if any of the optimizations I tried were making things faster or slower - so I left it the way it is for now.

On this branch, you can also see what the JSON parser would look like in this format.

mindplay-dk/sigma@add-core-folder...mindplay-dk:sigma:grammar-helper

I quite like the encapsulation, the type-safety, and removing the potential for uninitialized parsers - but I'm not sure what the actual overhead is, and so I can't really say if I think it's worth while, or whether it could be optimized.

There might be some inherent overhead due to the use of this in the JSON parser implementation itself, not sure... 🤔

@mindplay-dk
Copy link
Contributor Author

Here's a simplified version:

https://github.com/mindplay-dk/sigma/blob/605780ab4ce6eb7d8bc335c62b7c2b7acdcd88c1/src/core/grammar.ts#L13-L29

export function grammar<T extends GrammarType>(init: GrammarInit<T>): Grammar<T> {
  const grammar = {} as { [key: string]: Parser<unknown> }

  for (const key in init) {
    grammar[key] = {
      parse() {
        throw new Error()
      }
    }
  }

  for (const key in init) {
    grammar[key].parse = init[key].apply(grammar).parse
  }

  return grammar as Grammar<T>
}

Just putting in "dummy" parsers for everything in the first loop, so that we get objects with a parse property that other objects can refer to, just to establish the circular references - then replace the parse property in a separate loop, "patching" the parsers.

Logically, this should be faster, because we're no longer doing anything dynamic - but it's quite possible that replacing a function like this breaks some sort of optimization in V8, I really can't say without a benchmark.

What's really suspicious, is if I swap the order of the benchmarks, so that grammar goes before defer rather than after, grammar consistently wins out by the same 10-15% margin - so there is something wrong with the benchmark, I think. 🤨

@mindplay-dk
Copy link
Contributor Author

For the record, opened an issue here to find out if this benchmark is still valid.

@mindplay-dk
Copy link
Contributor Author

So yeah, I have to conclude the benchmark package (used by benny) probably isn't reliable anymore.

caderek/benny#57 (comment)

Not sure how to address this.

Either way, having run the benchmarks in isolation, I'm now certain the grammar version of the JSON parser is in fact slightly faster than the defer version.

So I think it makes sense to add this feature?

If you agree, I'll try to get it wrapped up with tests and docs soon 🙂

@norskeld
Copy link
Owner

Yeah, benchmarking in Node/JS is super flaky, and benchmark.js seems to add headache in some cases. Not sure what we can do here (roll our own benchmarking library, perhaphs? 😁)

As for the feature itself, it does looks interesting and the implementation is kinda clever! Let's give it a go.

@norskeld norskeld added C-feature-accepted Category: A feature request that has been accepted and awaiting implementation and removed C-feature-request Category: A feature request, i.e. not implemented labels Aug 15, 2023
@norskeld norskeld assigned mindplay-dk and unassigned norskeld Aug 15, 2023
@mindplay-dk
Copy link
Contributor Author

Yeah, benchmarking in Node/JS is super flaky, and benchmark.js seems to add headache in some cases. Not sure what we can do here (roll our own benchmarking library, perhaphs? 😁)

Yeah, I was tempted already and start mucking about with some maths here:

https://github.com/mindplay-dk/sigma/blob/crazy-math/benchmarks/src/json/index.ts

There isn't really anything maths can do to fix this problem though - after all the testing I did over the last couple of days, I'm fairly confident there is some sort of V8/JS engine optimization that applies only to the first module that loads.

Not sure where I'm going with this - if I decide to do something about this, it would need to have process isolation somehow, which probably means it'll have limitations on what you can test (e.g. Workers don't expose the DOM) or it'll work on Node only. (forking or launching a dedicated process for each test.)

I might do another round at some point and scout for an existing benchmark library that works...

As for the feature itself, it does looks interesting and the implementation is kinda clever! Let's give it a go.

I will try to get this wrapped up soon. 🙂👍

@mindplay-dk
Copy link
Contributor Author

... aha! 🤔

@mindplay-dk
Copy link
Contributor Author

Nope, couldn't get that to work.

Llorx/iso-bench#1

We'll see if the author responds - it's a very new project, but it does sound really promising. 🙂

@Llorx
Copy link

Llorx commented Aug 16, 2023

Updated https://github.com/Llorx/iso-bench with an API easier to implement 🙂 Just commented on the issue there with your code @mindplay-dk working with the old API and also it working with the new API.

@mindplay-dk
Copy link
Contributor Author

Thanks @Llorx 🙂👍

@norskeld lets consider the benchmark issue as a separate issue from the grammar helper, yeah?

@norskeld
Copy link
Owner

Sure, okay.

@norskeld norskeld linked a pull request Aug 19, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-misc Area: Issues that don't fint into specific category C-feature-accepted Category: A feature request that has been accepted and awaiting implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants