-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multiline flag doesn't work #233
Comments
More comments from @garyb #232 (comment)
|
It would be nice to have the following parsers in -- | Zero-width parser succeeds at beginning of input or following `\n` or `\r\n`.
lineBegin :: forall m. ParserT String m Unit -- | Zero-width parser succeeds at `eof` or before `\n` or `\r\n`.
lineEnd :: forall m. ParsetT String m Unit |
My opinion on disallowing
And one more thought: when On the side of allowing it, I guess it's harmless enough since it's something you have to opt into - it won't affect me whether we allow and fix it or not, because I'm unlikely to use it either way. 😄 I'd probably go further and suggest we might need two regex functions, a "simple" one which is basically what we have now, where it returns the whole match, and we'd disallow |
Since all
|
I understand your point here, but for my background it's not true, so you can't claim it universally. 😄 |
@Hi-Angel how do you feel about my suggestion of the two different regex parsers? |
Out of curiosity, where do you usually use regular expressions and how? I struggle to come up with a prolonged workflow that would result in a different expectation. Text editors, terminal utilities, most programming languages, office suites… they all default to what JS calls a "multiline behavior".
Well, renaming This still leaves the problem that Hence, I'm not really clear what such change would solve. If you know that you don't have motivation or time to fix |
I think in that case it's still worth mentioning somewhere at the beginning that the default behavior of UPD: or maybe even better: adding an example with |
-- | Matches `\n` or `\r\n`.
newline :: forall m. ParserT String m String
newline = string "\n" <|> string "\r\n" This |
I mostly agree with @garyb, but I'd maybe go a bit further:
|
And while I sympathize with the fact that JS regular expression semantics can be unintuitive, I don't think it's really the job of purescript-parsing to educate users about those semantics, or try to paper over those semantics in any way. It's FFI. It is what it is, and the note about undefined behavior should suffice. If anything, I think this is good evidence that baking in regular expression support (in any library) is problematic, and should really go in something like a |
I think I just don't run into many situations where I need to match line boundaries. I think the last time I did was probably a year or two ago. 😄 And the behaviour doesn't surprise me in here because JavaScript was my first programming language over 20 years ago and I've worked with it on and off ever since. 😉
That's not my point at all, the fix is trivial (see my note above that James copied over from the PR). It's more about the expectations of what I think either way it's going to break stuff. If we disallow |
Most combinators in the library are replaceable with a regex. In fact, in Emacs most parsers (except tree-sitter which Idk if it's using regexes) are indirectly or directly using regexes. Emacs has its own combinators library with stuff like
That would've been fair if the library was just an FFI to a JS library, but it isn't. I don't see how FFI would've affected any of the design decisions. |
I'm afraid the reason is different. Granted, I'm not a web developer (wasn't at least), so correct me if I'm wrong, but parsing is an exceptionally rare thing for a frontend to do. Instead it's usually done on backend. Now, before starting my fullstack project at work I've searched if someone written serious fullstack app in PureScript and IIRC I only found a few pet project stories. So I kind of took risk attempting that. This point alone limits your auditory. Furthermore, after digging into And the And note that this basic question "how do I grep with Furthermore, I've got a lot of runtime problems using TL;DR of this wall is just that I doubt the library has enough users for an assumption "but nobody complained for 2.5 years" to work. |
The
My point was the inverse. Parser combinators subsume regular expressions. You can implement all regular expression semantics in purescript-parsing without using the existing |
Well, one basic parser that I haven't found in the library and had to implement with regexes (maybe I wasn't looking hard enough, but I just skimmed over all String-related parts and still don't see it) is a "word". A To implement it I'd need a "word-boundary" parser, which I also do not see in the library. |
To be clear, I don't mean to suggest that purescript-parsing exposes all primitive combinators to replace regular expression semantics. Then we would already have a non-RegExp-based regular expression library! I'm just saying that they can all be implemented since all parsers do is look at the string and consume something (or not). Again, I think it would be great to have combinators for all common regular expression tokens. |
You're right. And on the second thought, a "word" can be implemented with |
Originally posted by @garyb in #232 (comment)
Should we forbid
multiline
? Should we make themultiline
flag work properly?The text was updated successfully, but these errors were encountered: