How to match and parse some parts of the input #1495

ashish01 · 2024-12-04T22:30:49Z

The example I have works well with regular expressions, but I wanted to explore how to achieve the same functionality using Lark.

In most of the grammars I've written with Lark (or Yacc), the goal is for the entire input to match the grammar. This approach works perfectly for parsing structured content like programming languages. However, I'm interested in parsing a large chunk of text where only specific parts are relevant, and the rest can be ignored or discarded as invalid.
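For comparison, here is a minimal sketch of the regular-expression approach the post alludes to. The original regex is not shown, so the pattern, the sample text, and the choice to sum the products are assumptions made only for illustration; `MUL_RE` and `sum_of_products` are hypothetical names.

```python
import re

# Hypothetical regex version of the task: find every well-formed
# mul(X,Y) instruction and skip over everything else in the text.
MUL_RE = re.compile(r"mul\((\d+),(\d+)\)")


def sum_of_products(text: str) -> int:
    """Multiply each matched pair of integers and sum the results (assumed task)."""
    return sum(int(a) * int(b) for a, b in MUL_RE.findall(text))


if __name__ == "__main__":
    sample = "xmul(2,4)%&mul[3,7]!mul(5,5)+mul(32,64]then(mul(11,8)"
    print(sum_of_products(sample))  # 8 + 25 + 88 = 121
```

The regex never has to describe the junk between matches, which is exactly what a whole-input grammar struggles with.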

I attempted the following:

```python
grammar = r"""
    start: (mul_instruction | INVALID)*  // Parse the input as a series of instructions or invalid data
    mul_instruction: "mul(" INT "," INT ")" -> mul

    INT: /\d+/
    INVALID: /.+?/

    %import common.WS
    %ignore WS
"""
```

I expected the unmatched portions of the text to be captured as INVALID. Instead, the entire input was treated as INVALID, matched one character at a time.

How can I design this in Lark so that it correctly identifies valid parts while gracefully handling and discarding invalid sections? Thank you!
