-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing locations #159
Comments
Historically, yojson was focused on speed and that's why the tree contains the data and nothing else. Getting a proper AST with locations and more is useful for some applications. Having this in yojson is up to the maintainers but it would be an addition that doesn't have much in common with the existing implementation. With some extra preprocessor hacking, we may be able to reuse the existing lexer which currently uses a hack to disable location tracking to improve performance. I'm not sure what others think. There's also the YAML path, which would consist in finding a YAML parser that returns a precise AST and has an option to restrict itself to JSON (modern YAML being a strict superset of JSON IIRC). I don't know much about the possibilities of ocaml-yaml but it may be worth looking into it. |
Thanks for the feedback @mjambon , indeed location tracking can decrease efficiently greatly, good point. I guess it makes sense to keep this issue open to see if a solution could be found, for now, for my own hacking purposes I derived a JSON parser from RWC that does indeed parse with locations, sharing it here in case could be useful to others: |
Hey, yes, having location can greatly help my cause as well. I will be happy to make a patch if the maintainers are receptive of it :) |
The problem is how to represent this in the type. Can't really change the AST without breaking every single user of Yojson, so a new AST would need to be created. In such case, it would be useful to revisit the structure of the AST as it is now anyway (see #158), to not have to do this in the future… this turns into a whole rabbit hole of issues so it's not particularly easy to tackle I would say without creating a lot of issues and questions as @mjambon mentions. |
Can't we reuse the existing lexer, Lines 2 to 23 in cf2f130
I don't know what all needs to be done after uncommenting those lines, and don't no whether conditional include or overriding is permitted in Ocaml so as to make this a switch. At least some sort of support, like to have line numbers while matched, would be cool |
You probably can, as the locations are in there so that's not the problem. The main issue is how to attach line numbers to the AST. You could use a similar approach as |
Having locations embedded directly within the AST would really be beneficial in scenarios where the JSON file originates directly from a user. In such cases, prioritizing parsing efficiency alone seems inefficient when compared to the enhanced utility that could be gained from parsing locations, allowing for more informative messages to be displayed to the user. What about creating new ASTs that store the locations directly in the nodes, more or less like that: type pos = Lexing.position * Lexing.position
type t = [
| `Null of pos
| `Int of int * pos
| `Assoc of (string * t * pos) list * pos
... ] Would it make sense? |
Such an AST would work (though I'd probably define the type as Given the amount of active maintainers in this project is very low and the demand is sporadic I have a hard time seeing this move forward anytime soon. |
I guess more than seeing the improvement as "small", it is more in the category of "critical for some users" vs "non important for others", for example if you need to parse user-provided JSON , locations are certainly a must if you need to provide feedback. Maybe indeed such parsing library would make sense as an independent project? I'm happy to fork the lib I wrote for coq-lsp and make it into a (Yo)json parser that is better suited for interactive use (error recovery, locations, etc...) |
Agreed, you do make a good point. It's probably better to think of it an inessential improvement for current users of Yojson that e.g. value API stability, interop and speed vs. an essential improvement for people who need the error reporting. The problem I see here is that extending the current parser to save locations would make parsing slower due to a lot more allocations and making that conditional would make the parser a good deal harder to maintain. Creating another parser solves this, but creates another issue of compliance - we probably don't want to have a parser with locations that parses data one way and a parser without locations that parses the same data another way. However, if someone would want to step up as a co-maintainer to make sure a parser with locations and an extended AST that is identical to, say |
Indeed! While I'm not expert on parsing, it seems to me that a parser geared towards user input is very different than a parser geared towards de-serialization. In particular locations are not enough, you also want to recover properly which makes the parser harder (tho easy to do with modern tools like the amazing menhir) So that would be a different project, maybe in a different repository. I also agree that testing for compliance would be important, but I think it should be easy to write a test driver that parses the test-cases twice and indeed check they are identical to the yojson reference parser. |
Dear yojson devs,
I'm reading some JSON data that I then analize, however, as to emit correct source locations for the analysis I'd need that the parsed JSON would contain locations on the original input file / buffer.
Is that something possible to do with yojson?
TIA!
The text was updated successfully, but these errors were encountered: