I'm writing a lexer and a parser for a custom language. I would like to propagate the line numbers of each token from the lexing stage (should it succeed) to the parsing stage, so that parsing errors can be more friendly.
The problem is, parsing tokens that are paired with a `SourcePos` seems to be significantly harder.
I can't tell `token` what's expected in terms of `MyTokenType`, because it wants a `MyToken` (the token type of my parser's input stream), which is the wrapped type with the line number. It doesn't make sense to include the line number in the expected token.
Even if I were to use `[MyTokenType]` (the type without the line number) as the input stream directly, other variants of the same issue crop up. For example: if I were to parse a sequence of `TInt Int` tokens, it doesn't make sense to say `TInt 123` is expected, because any `TInt` token with any value that the lexer produces is valid.
Is there an easy way out of this, or is there a better, more idiomatic alternative?
Lexer:
import Data.Char (isDigit)
import Data.Void (Void)
import Text.Megaparsec

-- Wrapped token type with position in the source file
data MyToken = MyToken
  { tokenPos :: SourcePos
  , tokenType :: MyTokenType
  } deriving (Eq, Ord, Show)

data MyTokenType = TInt Int deriving (Eq, Ord, Show)

type Lexer = Parsec Void String

-- Wrap with position
withPos :: Lexer MyTokenType -> Lexer MyToken
withPos p = MyToken <$> getSourcePos <*> p

-- Skip tabs and spaces between tokens
spaceConsumer :: Lexer ()
spaceConsumer = (skipMany . oneOf) ['\t', ' ']

lexer :: Lexer [MyToken]
lexer = do
  between spaceConsumer (spaceConsumer <* eof) (many (lInt <* spaceConsumer))
  where
    lInt = withPos $ TInt . read <$> takeWhile1P Nothing isDigit
Parser:
type Parser = Parsec Void [MyToken]

-- Primitive over `token` to parse a lexical item
token'
  :: (MyTokenType -> Maybe a)
  -- ^ Predicate
  -> MyTokenType
  -- ^ Expected
  -> Parser a
token' p t =
  token
    (p . tokenType)
    (S.singleton . Tokens . NE.singleton $ t {- type error, expected `Token` but got `TokenType` -})
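For tokens that carry a value, such as `TInt`, one possible way around the type error above is to register no expected item with `token` at all and to describe the token with a label via `<?>` instead. The following is a minimal sketch under that assumption; `tokenLabelled` and `pInt` are hypothetical names that do not appear in the original code.

```haskell
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec

data MyTokenType = TInt Int deriving (Eq, Ord, Show)

data MyToken = MyToken
  { tokenPos :: SourcePos
  , tokenType :: MyTokenType
  } deriving (Eq, Ord, Show)

type Parser = Parsec Void [MyToken]

-- Hypothetical helper: accept any token for which the predicate returns
-- `Just`. The expected set handed to `token` is empty, so no `MyToken`
-- (and no position) has to be built; the label describes the token instead.
tokenLabelled :: String -> (MyTokenType -> Maybe a) -> Parser a
tokenLabelled name p = token (p . tokenType) Set.empty <?> name

-- Any `TInt` is accepted, whatever value the lexer produced.
pInt :: Parser Int
pInt = tokenLabelled "integer" $ \t -> case t of
  TInt n -> Just n
```

With this shape, a failure would mention the label ("integer") rather than a specific value such as `TInt 123`.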
Thank you for maintaining the library, megaparsec is awesome :)
I found this gist, which shows that one can customize the `VisualStream` instance so that `MyToken` is displayed without the embedded line information, and customize the `TraversableStream` instance so that the reported line number reflects the token's actual position in the source file.
My original question of how to tell `token` what's expected still remains, though. Or does it not matter, because the position inside an expected token is never shown to the user, so we can simply create a dummy position?
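The dummy-position idea could look roughly like the sketch below. It assumes the position stored in an expected item is only ever used for display, and that display goes through a `VisualStream` instance that ignores positions. `TPlus`, `showTokenType`, `withDummyPos`, `exactToken`, and `errorText` are hypothetical names introduced for illustration; the `TraversableStream` part (mapping errors back to source lines) is left out.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import Data.Proxy (Proxy (..))
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec

-- `TPlus` is a hypothetical constructor, added only so that there is a
-- token without a payload to match exactly.
data MyTokenType = TInt Int | TPlus deriving (Eq, Ord, Show)

data MyToken = MyToken
  { tokenPos :: SourcePos
  , tokenType :: MyTokenType
  } deriving (Eq, Ord, Show)

type Parser = Parsec Void [MyToken]

-- Show tokens without their positions when an error is rendered.
instance VisualStream [MyToken] where
  showTokens Proxy = unwords . map (showTokenType . tokenType) . NE.toList

showTokenType :: MyTokenType -> String
showTokenType (TInt n) = show n
showTokenType TPlus = "+"

-- Wrap a bare token type with a placeholder position. The position is
-- never compared against the input; it only fills the `MyToken` field.
withDummyPos :: MyTokenType -> MyToken
withDummyPos = MyToken (initialPos "")

-- Match a token whose type equals `expected`, ignoring its position,
-- and register `expected` (with a dummy position) as the expected item.
exactToken :: MyTokenType -> Parser MyTokenType
exactToken expected =
  token test (Set.singleton (Tokens (withDummyPos expected :| [])))
  where
    test tok
      | tokenType tok == expected = Just expected
      | otherwise = Nothing

-- Render only the message text of a failed parse, without source lines.
errorText :: Parser a -> [MyToken] -> Maybe String
errorText p input = case runParser p "" input of
  Left bundle -> Just (concatMap parseErrorTextPretty (bundleErrors bundle))
  Right _ -> Nothing
```

Because matching is done by the predicate on `tokenType`, the dummy position never takes part in deciding whether a token is accepted; it would only end up inside the error's expected set, where `showTokens` drops it.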