Skip to content

πŸ“š grammar in, πŸ”  string out.

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT
Notifications You must be signed in to change notification settings

Devin-Yeung/bnfgen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

64 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“š Bnfgen

Build Status MIT Licensed

Bnfgen is a general purposed BNF grammar based random string generator βš™οΈ, with a powerful grammar extension to make it more πŸͺ‘ ergonomic to use.

Pitfall of BNF based generator

  • Generation of complex token is hard to maintain, e.g. numbers, variable
<letter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" 
           | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" 
           | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" 
           | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;

This is a sample of how to represent a letter in BNF, it is obvious that the maintainability of this syntax is quite low. To address this, we integrate the regular express into our grammar extension seamlessly. Under our extension, the above syntax can simply be written as:

<letter> ::= re("[a-zA-Z]");
  • Unpredictable generation result

The generation of recursive rules in BNF is hard to control

<decls> ::= <decl> | <decl> <decls> ;

Without any control, the generation of <decls> may always choose the second branch, which lead to an extremely long generation result and even can't stop in limited time. Thus, we introduce invoke limit and weighted branch to control the generation.

<S> ::= <A> {1, }  // should be invoked at least once
      | <B> { 5 }  // should be invoked exactly 5 times
      | <C> {1, 5} // should be invoked at least once and at most 5 times

Noted it is possible that generator has nothing to choose:

<S> ::= <X> | <X> <S> {100};
<X> ::= "foo" {1, 5} | "bar" {10};

But, don't worry, the semantic analysis will help you out. We will give you a warning at the analysis stage.

Beyond the generation

The design of Bnfgen is heavily influenced by the Rust Programming Language, particularly its outstanding error messages. Bnfgen conducts various semantic analyses on the BNF grammar, aiming to make its error messages as informative as possible.

  Γ— May be trapped in a dead loop
   ╭─[5:13]
 4 β”‚        <term> ::= "Terminal" ;
 5 β”‚        <A> ::= <B> ;
   Β·        ──────┬──────
   Β·              ╰── this rule may be trapped in a dead loop
 6 β”‚        <B> ::= <C> ;
   Β·        ──────┬──────
   Β·              ╰── this rule may be trapped in a dead loop
 7 β”‚        <C> ::= <A> ;
   Β·        ──────┬──────
   Β·              ╰── this rule may be trapped in a dead loop
 8 β”‚
   ╰────

Currently, we support the following semantic analysis:

  • Invalid invoke limit range detection
  • Undefined rule detection
  • Duplicated rule detection
  • Unreachable rule detection
  • Dead loop detection (which avoid the possible infinite loop in the generation)
  • Invoke limit not enough detection

We believe that an informative error message is the key to make the tool more ergonomic to use.

Acknowledgement

  • Born of this project is highly inspired by Daniil Baturin's work, which is also a BNF based random string generator, but in OCaml 🐫. The design of the grammar extension is also heavily influenced by his work.

  • I want to thank Andrew Gallant for his work on the regex crate in Rust Eco-system. The incorporation of regular language can't be done so easily without the help of this crate.

  • I'd also like to express my gratitude to Maciej Hirsz and Niko Matsakis, their excellent work on the logos and lalrpop crate makes the parsing of the grammar file a breeze πŸƒ.

License

Licensed under either of

at your option.

About

πŸ“š grammar in, πŸ”  string out.

Topics

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages