-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prefix of the part-end
delimiter to the part
#387
Add prefix of the part-end
delimiter to the part
#387
Conversation
Another question I'm asking myself is whether this is the right way to fix this issue. It seems that enforcing delimiters sit on their own line is sensible. Maybe we should start a conversation with the ocamlformat team to try to see how we could come to an agreement on this. If there is nothing to be done on their side we can revert to allow inline part, or at least end delimiters I guess. The reason I'd no be too kin to support it is that I think it can lead to pretty ambiguous cases, especially if in the example from the linked issue, the end delimiter was directly followed by a begin one. Inline begin delimiters can be tedious to deal with, especially regarding indentation. |
I agree that the current implementation is not great, but I think it is good as a starting point for discussion. I think we agree that the current way, where it silently drops the last line is bad - even a version where it would error out would be better because it wouldn't lead to potentially subtle errors where the inlined code is simply missing lines but it is only noticed way too late. Throwing an error in that case would be pretty simple and require fewer changes than what's in this branch already. As a matter of fact, we weren't aware of the issue until the bugreport either, so it shows how fragile the current baheviour is. Making OCamlformat not put the delimiters in-line and emitting an error message if people do that by hand could be a decent compromise, because as you say, the semantics of in-line delimiters are a bit complicated. If we do decide to go forward with allowing inline end-delimiters, I'll probably drop |
Oh no don't get me wrong, there's no particular problem with the implementation. I'm merely wondering what the best fix would be here! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
I have a few minor comments but hen we should be good. I think some follow up refactoring of Parts
could help improve the code overall but we can do this a bit later, in particular, we could do it once we drop the attributes based delimiters support!
lib/part.ml
Outdated
let msg = Printf.sprintf "File ended before part %s." p in | ||
Error (msg, nline) | ||
| File_end, None -> Ok [ make_part ~is_begin_end_part:true part_lines ] | ||
match input_line_err input with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I find the new implementation, with nested pattern matching, a bit harder to follow.
Maybe we could extract a separate, mutually recursive function to handle the Ok _
case.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit easier to follow because it really had two concerns:
- EOF handling
- Parsing
File_end
was not a result of parsing, it is just a signal to the parser to stop. But admittedly, the function is quite complex and nested (and it took me a while to understand), so you correctly point out that that it still conflates parsing and IO. It's a good point, let me try to improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote parse_parts
further:
- Instead of reading from the channel, I read into a list, thus the pattern match on the
End_of_file
marker became a pattern match on[]
, that is, the end of the input. This of course means it still has to deal with these two cases. - Then I realized that the function doesn't need to be recursive. What was making the function hard to read for me was the recursive calls with
>>|
to cons things to the list. I noted thatfold_left
can do the same thing:- Supply an
element
to work on and terminate when done - Deal with the state in the accumulator
Thus thus the function could be split into "deal with the inputs" viafold_left
and "deal with the tail" based on the result of the accumulator. Given our minimum version is 4.08,let*
helps to unwrap the arguments.
- Supply an
- Finally, given I changed from a version that lazily reads out of an
in_channel
to a version that eagerly reads the input into alist
, I thought I can put the laziness back in by switching fromList.fold_left
toSeq.fold_left
which was indeed a drop-in replacement.
I believe the resulting function is a bit easier to understand given each piece of code only deals with one aspect and the fold is a bit simpler to follow through than the recursion. Fortunately the tests are pretty comprehensive so I immediately got feedback when some part of the logic was implemented incorrectly.
I've been iterating on and off on this, thinking of smaller improvements that lead to other improvements, that again lead to other refactorings. As such the removal of the I've reworked the test a bit: the test cases of I think this is overall an improvement, because the tests tested very similar functionality, if the |
This is a test case for #374 to show that there is an issue with parts of code missing.
While there could be a case for modelling it this way, it makes the code tricker to understand and `File_end` is, after all, not a valid parsing result either way. Having the parsing separated makes the function a bit more nested but it becomes easier to understand.
Having them separately makes no sense since one is just a reformulation of the other.
(Error | ||
(`Msg | ||
"'part-end' delimiter does not accept a value. Please write '(* \ | ||
$MDX part-end *)' instead.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important but since this part of the API is now internal, wouldn't it better to test it with ppx_expect instead ? So we can not export this module anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's well possible, but it would introduce an additional dependency since we don't depend on ppx_expect
at the moment. While this could certainly be done in the future, by refactoring (all the relevant) tests, I wouldn't add it to this PR because that would be scope-creep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this is a clear improvement indeed. Maybe in the future it would be neat to support multiline comments such as
(* $MDX
part-end *)
I tend to prefer ocamllex over regexps so I'm a bit biased. Maybe not having to maintain the token flow line-by-line would make it easier to parse as well? But mdx is already designed this way in its current state so this can wait.
I would generally agree on ocamllex. However I kept the parsers as regexp for now since one of them is deprecated (and we should try to remove it for MDX 3, #396), at which point it should be easier to switch the other one to |
CHANGES: #### Added - Report all parsing errors in Markdown files (realworldocaml/mdx#389, @NathanReb) #### Changed - Preserve indentation in multiline OCaml blocks in .mli files (realworldocaml/mdx#395, @panglesd) #### Fixed - Fixed compatibility with Cmdliner 1.1.0 (realworldocaml/mdx#371, @Leonidas-from-XIV) - Report errors and exit codes of toplevel directives (realworldocaml/mdx#382, @talex5, @Leonidas-from-XIV) - Fix block locations in error reporting (realworldocaml/mdx#389, @NathanReb) - Include the content of the line that features the `part-end` MDX directive in the output, before that line would've been dropped (realworldocaml/mdx#374, realworldocaml/mdx#387, @Leonidas-from-XIV) - Handle EINTR signal on waitpid call by restarting the syscall. (realworldocaml/mdx#409, @tmcgilchrist) - Fix parsing of multiline toplevel phrases in .mli files (realworldocaml/mdx#394, realworldocaml/mdx#397, @Leonidas-from-XIV) #### Removed - Removed warning about missing semicolons added in MDX 1.11.0 and the automatic insertion of semicolons in the corrected files introduced in MDX 2.0.0. (realworldocaml/mdx#398, @Leonidas-from-XIV)
#374 reports that the last line gets cut off when using
(* $MDX part-end *)
. This is generally true becausePart.Parse_parts.parse_line
can only parse a line into aPart_end
or aNormal
, but a line with an end-marker is sort of both.This PR adds the prefix of the line to the
Part_end
constructor so if there is any code that is not whitespace, it will get added to the constructor from where the code can attach the missing line to thepart
.I've been thinking about the design of this solution because attaching the value to the
Part_end
constructor feels wrong sinceOcaml_delimiter
only deals with delimiters and not the payload of the block and this is changing it. I feel that maybe the parsing ought to be done differently and signal that there is something else on the line that still needs to be processed so it would get processed into something more like[Normal prefix; Part_end]
.As such, I welcome other suggestions how to solve this in a "nice" way, without breaking the separation between the parsing and the delimiters in ugly ways.
Fixes #374.