Rule templates? #30

Open
osa1 opened this issue Oct 29, 2021 · 1 comment
Labels
design feature New feature or request

osa1 commented Oct 29, 2021

Here are rules I'm using to lex Rust decimal, binary, octal, and hexadecimal numbers:

rule DecInt {
    $dec_digit,
    '_',

    $int_suffix | $ => |lexer| {
        let match_ = lexer.match_();
        lexer.switch_and_return(LexerRule::Init, Token::Lit(Lit::Int(match_)))
    },

    $whitespace => |lexer| {
        let match_ = lexer.match_();
        // TODO: Rust whitespace characters 1, 2, or 3 bytes long
        lexer.switch_and_return(
            LexerRule::Init,
            Token::Lit(Lit::Int(&match_[..match_.len() - match_.chars().last().unwrap().len_utf8()]))
        )
    },
}

rule BinInt {
    $bin_digit,
    '_',

    $int_suffix | $ => |lexer| {
        let match_ = lexer.match_();
        lexer.switch_and_return(LexerRule::Init, Token::Lit(Lit::Int(match_)))
    },

    $whitespace => |lexer| {
        let match_ = lexer.match_();
        // TODO: Rust whitespace characters 1, 2, or 3 bytes long
        lexer.switch_and_return(
            LexerRule::Init,
            Token::Lit(Lit::Int(&match_[..match_.len() - match_.chars().last().unwrap().len_utf8()]))
        )
    },
}

rule OctInt {
    $oct_digit,
    '_',

    $int_suffix | $ => |lexer| {
        let match_ = lexer.match_();
        lexer.switch_and_return(LexerRule::Init, Token::Lit(Lit::Int(match_)))
    },

    $whitespace => |lexer| {
        let match_ = lexer.match_();
        // TODO: Rust whitespace characters 1, 2, or 3 bytes long
        lexer.switch_and_return(
            LexerRule::Init,
            Token::Lit(Lit::Int(&match_[..match_.len() - match_.chars().last().unwrap().len_utf8()]))
        )
    },
}

rule HexInt {
    $hex_digit,
    '_',

    $int_suffix | $ => |lexer| {
        let match_ = lexer.match_();
        lexer.switch_and_return(LexerRule::Init, Token::Lit(Lit::Int(match_)))
    },

    $whitespace => |lexer| {
        let match_ = lexer.match_();
        // TODO: Rust whitespace characters 1, 2, or 3 bytes long
        lexer.switch_and_return(
            LexerRule::Init,
            Token::Lit(Lit::Int(&match_[..match_.len() - match_.chars().last().unwrap().len_utf8()]))
        )
    },
}
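The `$whitespace` arm in each rule drops the trailing whitespace character from the match by subtracting that character's UTF-8 length from the slice end. A minimal standalone sketch of that slicing step, in plain Rust outside the lexer; the helper name `drop_last_char` is hypothetical and not part of the lexer generator:

```rust
/// Hypothetical helper: returns `s` without its final character,
/// whatever that character's UTF-8 width (1 to 4 bytes).
/// Panics on an empty string, like the `unwrap()` in the rules above.
fn drop_last_char(s: &str) -> &str {
    // `len_utf8` gives the number of bytes the last char occupies.
    let last_len = s.chars().last().unwrap().len_utf8();
    // `s.len()` is a byte length, so this cut lands on a char boundary.
    &s[..s.len() - last_len]
}
```

Because the byte width is measured from the character itself, the same expression works for a 1-byte space, the 2-byte U+0085, and the 3-byte U+2028.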

These rules are identical except for the digit regex: the binary rule uses $bin_digit, the hex rule uses $hex_digit, and similarly for the others.

If we could implement "rule templates" that take regexes as arguments, we could have one template with a "digit" parameter and pass $hex_digit, $oct_digit, etc. to it, avoiding the duplication.
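To illustrate the idea outside the lexer generator, here is a rough analogy in plain Rust: one function parameterized by the digit class plays the role of the template, and each integer kind is an instantiation. This is not proposed rule syntax; `lex_int` and the `is_*_digit` helpers are hypothetical names, suffix handling is omitted, and any `0b`/`0o`/`0x` prefix is assumed to be consumed already.

```rust
/// Hypothetical stand-in for a "rule template": the digit class is an argument.
/// Splits `input` into the longest prefix made of digits and underscores,
/// and the remaining input.
fn lex_int(input: &str, is_digit: fn(char) -> bool) -> (&str, &str) {
    let end = input
        .char_indices()
        // First character that is neither a digit of this class nor '_'.
        .find(|&(_, c)| !(is_digit(c) || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    input.split_at(end)
}

// One predicate per digit class, mirroring $bin_digit, $oct_digit, etc.
fn is_bin_digit(c: char) -> bool { matches!(c, '0' | '1') }
fn is_oct_digit(c: char) -> bool { matches!(c, '0'..='7') }
fn is_dec_digit(c: char) -> bool { c.is_ascii_digit() }
fn is_hex_digit(c: char) -> bool { c.is_ascii_hexdigit() }
```

Calling `lex_int("ff_ff+1", is_hex_digit)` splits the input into `"ff_ff"` and `"+1"`; the shared body is written once and only the predicate changes per integer kind.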

osa1 added the feature (New feature or request) and design labels Oct 29, 2021

osa1 commented Oct 30, 2021

Note that the rules above are not correct. For example, this won't be lexed correctly: [1].
