Hi, I'm trying to parse C code to extract some information.
Here is the parser code:
The error is: error at (line 7, column 5): unexpected "/", expecting space or open bracket
Replies: 1 comment 6 replies
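The main thing to note is that all parsers have one of four result states: a parser either consumed input or didn't, and it either succeeded or failed.

Imagine this:

var p = either(str("Hello"), str("Hi"));

If you try to use the parser p with the string "Hi", it won't succeed, even though "Hi" is given as a valid option. That's because the first parser, str("Hello"), matches the 'H', gets to the 'e', and fails. This is a Consumed Failure. The parsec library won't automatically rewind the input, and so you need attempt:

var p = either(attempt(str("Hello")), str("Hi"));

This makes sure that any consumption is undone before the subsequent cases are tried. The same is true when using many and many1:

many(attempt(p));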
many1(attempt(p));

I notice you don't have any attempt calls in your parser.

Another good technique when parsing languages is to build a token parser:

public static Parser<A> token<A>(Parser<A> p) =>
    from r in p
    from _ in spaces // you need to build a spaces-and-comments parser to use here
    select r;

Then you can wrap any other parser with token, so that the whitespace following it is consumed. For example, you could build a symbol parser:

public static Parser<string> symbol(string sym) =>
    token(str(sym));

Building language parsers is quite hard to get right, especially when it comes to things like string literals (with escape codes), floating-point numbers, nested multi-line comments, etc., so there's built-in functionality for producing most of the parsers you will need:

using static LanguageExt.Parsec.Token;
var reservedNames = List("void", "char", "short", "int", "long", "float",
"double", "signed", "unsigned", "struct", "union", "const", "volatile",
"auto", "register", "extern", "typedef",
"goto", "continue", "break", "return",
"while", "do", "for",
"if", "switch", "case", "default");
// Use the built-in language definition for Java
// Modify the defaults using .With
var def = Language.JavaStyle.With(ReservedNames: reservedNames);
// Build common parsers from the language definition
lexer = makeTokenParser(def) ?? throw new InvalidOperationException();
static Parser<string> ident => lexer.Identifier;
static Parser<string> dot => lexer.Dot;
static Parser<string> comma => lexer.Comma;
static Parser<string> colon => lexer.Colon;
static Parser<string> op => lexer.Operator;
static Parser<string> stringLiteral => lexer.StringLiteral;
static Parser<char> charLiteral => lexer.CharLiteral;
static Parser<int> natLiteral => lexer.Natural;
static Parser<int> intLiteral => lexer.Decimal;
static Parser<Unit> whiteSpace => lexer.WhiteSpace; // This will parse comments too
static Parser<Either<int, double>> natOrFloatLiteral => lexer.NaturalOrFloat;
static Parser<double> floatLiteral => lexer.Float;
static Parser<string> reserved(string ident) => lexer.Reserved(ident);
static Parser<string> symbol(string sym) => lexer.Symbol(sym);
static Parser<A> token<A>(Parser<A> p) => lexer.Lexeme(p);
static Parser<A> parens<A>(Parser<A> p) => lexer.Parens(p);
static Parser<A> braces<A>(Parser<A> p) => lexer.Braces(p);
static Parser<A> brackets<A>(Parser<A> p) => lexer.Brackets(p);
static Parser<A> angles<A>(Parser<A> p) => lexer.Angles(p);
static Parser<Seq<A>> semiSep<A>(Parser<A> p) => lexer.SemiSep(p);
static Parser<Seq<A>> semiSep1<A>(Parser<A> p) => lexer.SemiSep1(p);
static Parser<Seq<A>> commaSep<A>(Parser<A> p) => lexer.CommaSep(p);
static Parser<Seq<A>> commaSep1<A>(Parser<A> p) => lexer.CommaSep1(p);
static Parser<Seq<A>> bracketsCommaSep<A>(Parser<A> p) => lexer.BracketsCommaSep(p);
static Parser<Seq<A>> bracketsCommaSep1<A>(Parser<A> p) => lexer.BracketsCommaSep1(p);
static Parser<Seq<A>> parensCommaSep<A>(Parser<A> p) => lexer.ParensCommaSep(p);
static Parser<Seq<A>> parensCommaSep1<A>(Parser<A> p) => lexer.ParensCommaSep1(p);
static Parser<Seq<A>> anglesCommaSep<A>(Parser<A> p) => lexer.AnglesCommaSep(p);
static Parser<Seq<A>> anglesCommaSep1<A>(Parser<A> p) => lexer.AnglesCommaSep1(p);
static Parser<Seq<A>> bracesCommaSep<A>(Parser<A> p) => lexer.BracesCommaSep(p);
static Parser<Seq<A>> bracesCommaSep1<A>(Parser<A> p) => lexer.BracesCommaSep1(p);
static Parser<Seq<A>> bracketsSemiSep<A>(Parser<A> p) => lexer.BracketsSemiSep(p);
static Parser<Seq<A>> bracketsSemiSep1<A>(Parser<A> p) => lexer.BracketsSemiSep1(p);
static Parser<Seq<A>> parensSemiSep<A>(Parser<A> p) => lexer.ParensSemiSep(p);
static Parser<Seq<A>> parensSemiSep1<A>(Parser<A> p) => lexer.ParensSemiSep1(p);
static Parser<Seq<A>> anglesSemiSep<A>(Parser<A> p) => lexer.AnglesSemiSep(p);
static Parser<Seq<A>> anglesSemiSep1<A>(Parser<A> p) => lexer.AnglesSemiSep1(p);
static Parser<Seq<A>> bracesSemiSep<A>(Parser<A> p) => lexer.BracesSemiSep(p);
static Parser<Seq<A>> bracesSemiSep1<A>(Parser<A> p) => lexer.BracesSemiSep1(p);

One final tip is to know where the end of the stream is, and make sure it's there. So for your parser:

from _1 in whiteSpace
from rs in InterfaceIdList
from _2 in eof
select rs;

That will strip the spaces and comments at the start, then parse the tokens (each token parser automatically strips the comments and spaces that follow it), and finally expect the end-of-stream. It should be there, or your parse has failed.

One other thing to look at is the BNF definition for C. That will give you a good insight into how to build your parsers (if you're looking to expand out to a more general C parser).
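
To make the Consumed Failure point and the end-of-stream tip concrete, here is a minimal sketch. It assumes the LanguageExt.Parsec package is referenced and that parse (from LanguageExt.Parsec.Prim) and the IsFaulted property on its result behave as I understand them; the variable names are made up for illustration and are not from your parser.

```csharp
using System;
using LanguageExt;
using LanguageExt.Parsec;
using static LanguageExt.Parsec.Prim;
using static LanguageExt.Parsec.Char;

// Without attempt: str("Hello") matches the 'H' of "Hi" and then fails
// on the second character, so either never gets to try str("Hi").
Parser<string> noRewind = either(str("Hello"), str("Hi"));

// With attempt: the failed first branch is rewound, so str("Hi")
// starts again from the beginning of the input.
Parser<string> withRewind = either(attempt(str("Hello")), str("Hi"));

// Require end-of-stream after the greeting, so trailing input is an error.
Parser<string> wholeInput =
    from g in withRewind
    from _ in eof
    select g;

var withoutAttempt = parse(noRewind, "Hi");
var withAttempt    = parse(withRewind, "Hi");
var trailingInput  = parse(wholeInput, "Hi there");

// Per the explanation above, the first parse should report a failure,
// the second should succeed, and the third should fail at " there".
Console.WriteLine($"without attempt failed: {withoutAttempt.IsFaulted}");
Console.WriteLine($"with attempt failed:    {withAttempt.IsFaulted}");
Console.WriteLine($"trailing input failed:  {trailingInput.IsFaulted}");
```

Wrapping only the first alternative is enough here, because the last branch of an either has nothing after it to fall back to.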