Detect start of line #165

Closed
Krever opened this issue Feb 4, 2021 · 5 comments

@Krever

Krever commented Feb 4, 2021

I'm trying to parse a format where different parts are separated by a <start_of_line>### fragment, so I would like to be able to detect <start_of_line>.

IMHO the logic should be similar to P.start | <prev_char = '\n'>.

I'm not sure whether it matters, but I'm trying to parse the IntelliJ HTTP client file format, which explicitly requires supporting ### on the first line. For example:

    ###
    // A basic request
    http://example.com/a/

    ###

    // A second request using the GET method
    http://example.com:8080/api/html/get?id=123&value=content

Krever changed the title from "Detect start for line" to "Detect start of line" on Feb 4, 2021.
@johnynek
Collaborator

johnynek commented Feb 4, 2021

I would make a parser for a separator:

    val sep = (P.start | P.char('\n')).soft *> P.string("###")

Then I would define your whitespace to be something like:

    (!sep) *> P.charIn(" \t\r\n").rep

Now you can use repSep or something similar.
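A minimal sketch of the separator approach above, assuming cats-parse 0.3.x or later (where Parser must consume input and Parser0 may not). The names chunk and file, and trimming each section, are hypothetical choices for illustration; P.not(sep) is used in place of !sep.

```scala
import cats.parse.{Parser => P, Parser0}

object SeparatorSketch {
  // "###" at the very start of the input, or right after a '\n'.
  // .soft lets the parser back off if '\n' matches but "###" does not.
  // When it matches after a newline, sep consumes that '\n'.
  val sep: Parser0[Unit] = (P.start | P.char('\n')).soft *> P.string("###")

  // Hypothetical section body: one or more characters, stopping before
  // any position where sep would match.
  val chunk: P[String] = (P.not(sep).with1 *> P.anyChar).rep.string

  // Hypothetical file parser: a leading sep, then sep-separated chunks,
  // each trimmed of surrounding whitespace.
  val file: Parser0[List[String]] =
    sep *> chunk.repSep(sep).map(_.toList.map(_.trim))
}
```

Since sep only matches at the start of input or after a newline, a ### in the middle of a line stays inside the section text. The '\n' before each ### is consumed by sep, so it never ends up in a section.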

@Krever
Author

Krever commented Feb 6, 2021

Yup, I'm using something like this right now, but the problem with a separator defined this way is that it consumes the \n character. I thought not consuming it would be a cleaner solution, so that it works regardless of what comes before it.

@Krever
Author

Krever commented Feb 6, 2021

I implemented a few utilities for my use case, but I had to put part of it in cats-parse itself: I couldn't find a way to access the relevant internals from outside.

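    // Lives inside cats.parse.Parser, since it needs the internal State.
    // Reads the character at absolute offset n without consuming input.
    case class PeekAt(n: Int) extends Parser0[Char] {
      if (n < 0) throw new IllegalArgumentException(s"required offset >= 0, found $n")

      override def parseMut(state: State): Char = {
        if (n < state.str.length) {
          state.str.charAt(n)
        } else {
          state.error = Chain.one(Expectation.FailWith(n, "Offset out of range"))
          '\u0000'
        }
      }
    }

    // Succeeds with c if the character just before the current offset is c.
    // P.peekAt(n) is presumably a public constructor wrapping PeekAt(n).
    def prevChar(c: Char): Parser0[Char] =
      P.index
        .flatMap { idx =>
          if (idx == 0) P.failWith("Lookback at start of input") else P.peekAt(idx - 1)
        }
        .flatMap(cx => if (c == cx) P.pure(c) else P.failWith("Lookback failed"))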

@johnynek
Collaborator

johnynek commented Feb 6, 2021

I'm definitely open to having some mechanism to look back. The most general would be to access the whole String; a more constrained one would be to match the previous N characters with a given parser.

I lean toward exposing the String and then adding a convenience method that implements what you need (ideally without flatMap, which is generally more costly).
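Without library support, one workaround sketch is to build the parser for a specific input String and look back by index. This is not a cats-parse API: prevCharIs, atLineStart, and sep are hypothetical helpers, and the parsers they return are only valid for the exact string they were built with. It uses filter on P.index rather than flatMap.

```scala
import cats.parse.{Parser => P, Parser0}

object LookbackSketch {
  // Hypothetical helper: succeeds, consuming nothing, when the character
  // just before the current offset in `input` is c. Only correct when the
  // resulting parser is run on this same `input`.
  def prevCharIs(input: String, c: Char): Parser0[Unit] =
    P.index.filter(i => i > 0 && input.charAt(i - 1) == c).void

  // Start of line: start of input, or the previous character is '\n'.
  def atLineStart(input: String): Parser0[Unit] =
    P.start | prevCharIs(input, '\n')

  // "###" only at the start of a line; the '\n' before it is not consumed.
  def sep(input: String): P[Unit] =
    atLineStart(input).with1 *> P.string("###")
}
```

The trade-off is that the parser must be rebuilt per input, which is exactly the limitation a library-level lookback would remove.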

@johnynek
Collaborator

Note: we added Parser.caret in #301, which gives access to the line, column, and total offset.

You can use that to detect the start of a line with Parser.caret.filter(_.col == 0).
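A sketch of the Parser.caret approach, assuming Caret.col is 0-based (as the _.col == 0 check implies). The names lineStart and sep are illustrative, not part of cats-parse.

```scala
import cats.parse.{Parser => P, Parser0}

object CaretSketch {
  // Succeeds without consuming input when the current position is column 0,
  // i.e. at the start of the input or right after a '\n'.
  val lineStart: Parser0[Unit] = P.caret.filter(_.col == 0).void

  // "###" only at the start of a line; the '\n' before it is not consumed.
  val sep: P[Unit] = lineStart.with1 *> P.string("###")
}
```

Because lineStart consumes nothing, the newline stays with whatever parser ran before sep, so the separator works regardless of what precedes it.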
