-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose byte range(s) for attributes? #113
Comments
The reason there are no attribute ranges is memory overhead. We basically have to add 8 bytes to each attribute, which would significantly increase memory usage. Same with attribute value ranges. I don't mind replacing an attribute position with a range. But adding name and value ranges would require additional 24 bytes. Same with more granular positions. |
Thanks for your response! Even just replacing an attribute's position with a range would be a big boost for me, so I'm very happy to hear that you don't mind doing that. It's been surprisingly challenging for me to find an XML library that's both relatively feature complete and also provides good range information (or even a way for me to cobble it together myself). I'm pretty sure that having an Regarding more granular ranges like separate attribute names and values, I definitely understand your concern about memory usage and realize that most users probably wouldn't want to make that trade-off. It's a shame for people like me that would be willing to pay the memory cost, though. What do you think about potentially using a feature flag to enable extra range tracking? I realize that would add some complexity to the crate, and maybe more than you'd be comfortable with, but I wonder if it might not be too bad? I have not dived into the code myself to look, but assuming that the underlying parser (
That's obviously not trivial, but it also doesn't seem especially complex at first glance either (though again, I acknowledge that I haven't checked out the code myself, and that XML always seems to be more complicated than you'd think). Any chance that it's at least worth consideration? Maybe the same idea could even be used to save memory by placing the range data that already exists behind a feature flag as well (assuming that it isn't required by crate internals); such a flag could default to enabled (for backward compatibility) but people who don't even care about the basic ranges could save memory just by turning the flag off. Anyway, thanks again for your reply, and I'm excited about the possibility of an |
Created PR #119 for adding I also have a branch for adding subranges for an attribute's qname and value behind a new feature flag, and would be happy for any feedback. Let me know if I missed any protocol or such. Despite being a relatively experienced dev, I'm pretty new to contributing on GitHub like this. E.g. I couldn't figure out how to link the PR to this issue, I guess only this repo's maintainers can do that? |
Thanks. Looks good to me. As for qname/value ranges, you have an interesting trick to reduce memory usage. I like it.
I don't think you can "link" it. You can mention an issue by id/number in a PR and GitHub will auto-reference it. That's about it. |
Good point that Also -- will your automated build/test workflow on GitHub need to be manually updated to build/test with the new feature enabled? If so, I'm guessing that I won't be able to do that for you as part of the PR? Thanks for your time and feedback, and accepting these PRs. I'm happy to be able to have this functionality in my app without needing my own fork. |
I'm evaluating XML libraries for use on a project, and one thing that I'm looking for is the ability to get the byte ranges where various XML items came from in the original string/document, for use in error reporting. I found
Node.range()
, which is exactly what I'm looking for. However, attributes aren'tNode
s so they don't benefit from this API. I foundAttribute.position()
, but it only provides the first byte rather than a range. Would you consider adding arange()
method toAttribute
? Or does something like that already exist and I missed it?Even better -- would you consider adding separate ranges for an attribute's name and value? It would be nice to be able to pinpoint one or the other in different kinds of error messages. E.g. an "invalid attribute" error could pinpoint the name, whereas an "invalid value" error could pinpoint the value. I don't personally care very much whether the name's range includes its prefix (if any) or not, but I realize that's a thing that would need to be considered.
Another library that I've looked at is your
xmlparser
, which is far and away the best I've seen in terms of pinpointing the ranges of various items, not just separate prefix/name/value ranges for attributes, but even separate name/prefix for elements, separate full/contents-only ranges for comments and CDATA, and so forth. Though as you point out in its documentation, that library is very low-level and therefore very limited in actual XML processing and validation capabilities. I noticed that you recently went so far as to deprecatexmlparser
in favor of simply merging it into thisroxmltree
library. For anyone that has been usingxmlparser
and the extra range information it provides, they might find it easier to transition to this library if it also made that extra information available. The main hang up for me is attribute ranges, but I would also applaud exposing subranges of various types ofNode
s as well.The text was updated successfully, but these errors were encountered: