-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document that for "usual" regex behavior multiline
is required
#232
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -202,8 +202,11 @@ match p = do | |
-- | error message. | ||
-- | | ||
-- | The returned parser will try to match the regular expression pattern once, | ||
-- | starting at the current parser position. On success, it will return | ||
-- | the matched substring. | ||
-- | starting at the current parser position. Note that this implies that an | ||
-- | expression starting as `^…` (i.e. with the beginning of line) will match the | ||
-- | current position even in absence of a preceding newline. | ||
-- | | ||
-- | On success, the parser will return the matched substring. | ||
-- | | ||
-- | If the RegExp `String` is constant then we can assume that compilation will | ||
-- | always succeed and `unsafeCrashWith` if it doesn’t. If we dynamically | ||
|
@@ -231,14 +234,24 @@ match p = do | |
-- | | ||
-- | #### Example | ||
-- | | ||
-- | This example shows how to compile and run the `xMany` parser which will | ||
-- | capture the regular expression pattern `x*`. | ||
-- | Compiling and running different regex parsers: | ||
-- | | ||
-- | ```purescript | ||
-- | case regex "x*" noFlags of | ||
-- | Left compileError -> unsafeCrashWith $ "xMany failed to compile: " <> compileError | ||
-- | Right xMany -> runParser "xxxZ" do | ||
-- | xMany | ||
-- | example re flags text = | ||
-- | case regex re flags of | ||
-- | Left compileError -> unsafeCrashWith $ "xMany failed to compile: " <> compileError | ||
-- | Right xMany -> runParser text do | ||
-- | xMany | ||
-- | | ||
-- | -- Capturing a string per `x*` regex. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For separate examples I think would should have separate markdown code blocks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't it be a lot of duplicate code for a reader to dig through? |
||
-- | exampleXMany = example "x*" noFlags "xxxZ" | ||
-- | | ||
-- | -- Capturing everything till end of line. | ||
-- | exampleCharsTillEol = example ".*$" multiline "line1\nline2" | ||
-- | | ||
-- | -- Capturing everything till end of string. Note the distinction with | ||
-- | -- `exampleCharsTillEol`. | ||
-- | exampleCharsTillEos = example ".*$" dotAll "line1\nline2" | ||
-- | ``` | ||
-- | | ||
-- | #### Flags | ||
|
@@ -249,9 +262,11 @@ match p = do | |
-- | regex "x*" (dotAll <> ignoreCase) | ||
-- | ``` | ||
-- | | ||
-- | The `dotAll`, `unicode`, and `ignoreCase` flags might make sense for | ||
-- | a `regex` parser. The other flags will | ||
-- | probably cause surprising behavior and you should avoid them. | ||
-- | The `dotAll`, `multiline`, `unicode`, and `ignoreCase` flags might make | ||
-- | sense for a `regex` parser. In fact, per JS RegExp semantics matching a | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that before we encourage people to use the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, that makes whole PR a moot point. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
to add to that: besides the explanation referred, there's also a point that explicitly matching newline characters is often frowned upon. This is because there exist at least 3 different styles of newlines. So the usual advice is not to match a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think personally I just view this parser as serving a different purpose than for what enabling It's already a compromised regex interface too - you can't really use it the way you normally would since the groups in the pattern are inaccessible. (I'm not saying I'm right in this opinion, just giving context for why I wouldn't miss There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
FWIW, the current documentation encourages to prefer
Fair, although till now I haven't even noticed lack of groups, because the "combinators style" implies that when there's a pattern one wants to retrieve, you'd just consume it separately. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do we even understand how it works though? I checked @garyb ’s case and confirmed that runParser "some\nvarious\nlines" do
m <- fromRight' unsafeCoerce $ regex "various$" multiline
r <- rest
pure $ Tuple m r gives the result (Right (Tuple "various" "rious\nlines")) which is surprising and wrong. So I think that the current advice in the docs is pretty good advice until we decide what kind of behavior the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for calling attention to this, though. It would be nice if the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's just plain broken with It is fixable if we want to support it though, we'd have to do something like perform a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made an issue #233 |
||
-- | single line boundary in a multiline string requires passing `multiline`. | ||
-- | | ||
-- | Other flags will probably cause surprising behavior and should be avoided. | ||
-- | | ||
-- | [*MDN Advanced searching with flags*](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#advanced_searching_with_flags) | ||
regex :: forall m. String -> RegexFlags -> Either String (ParserT String m String) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first example should show the simplest basic usage. Adding this
example
helper function is an extra indirection.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the code has always been there, I just moved it to a separate function. I can move the function after example if you want.