The example I have works well with regular expressions, but I wanted to explore how to achieve the same functionality using Lark.
In most of the grammars I've written with Lark (or Yacc), the goal is for the entire input to match the grammar. This approach works perfectly for parsing structured content like programming languages. However, I'm interested in parsing a large chunk of text where only specific parts are relevant, and the rest can be ignored or discarded as invalid.
I attempted the following:
```python
grammar = r"""
start: (mul_instruction | INVALID)* // Parse the input as a series of instructions or invalid data
mul_instruction: "mul(" INT "," INT ")" -> mul
INT: /\d+/
INVALID: /.+?/
%import common.WS
%ignore WS
"""
```
My expectation was that unmatched portions of the text would be captured as INVALID. However, what actually happened was that the entire input was treated as INVALID, with characters being matched one at a time.
How can I design this in Lark so that it correctly identifies valid parts while gracefully handling and discarding invalid sections? Thank you!