Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove "final" modifier from CBORParser #129

Closed
jinzha opened this issue Mar 2, 2018 · 13 comments
Closed

Remove "final" modifier from CBORParser #129

jinzha opened this issue Mar 2, 2018 · 13 comments
Labels
Milestone

Comments

@jinzha
Copy link

jinzha commented Mar 2, 2018

Hi CBOR developers,

I would use the CBOR lib in our project, but want to extend it to add some additional functionality. It looks like CBORParser defines a bunch of protected members which implies it could be extended, but it is "final" class, so can not be extended.

Is it intended to not be extended? If not, can you please fix this by removing the final modifier?
BTW, the CBORGenerator is not final class.

@cowtowncoder
Copy link
Member

No, CBORParser is not meant to be extended, although there is no strict need to have it be final. I can remove that easily.

Out of curiosity, do you have specific use for sub-classing?

@cowtowncoder cowtowncoder added this to the 2.9.5 milestone Mar 2, 2018
@cowtowncoder cowtowncoder changed the title [CBOR] "final" modifer for CBORParser Remove "final" modifier from CBORParser Mar 2, 2018
cowtowncoder added a commit that referenced this issue Mar 2, 2018
@cowtowncoder
Copy link
Member

I remove final modifier from CBORParser -- I can not recall explicit reason to add it, and no other parser or generator implementation is left final (at an earlier point some were).
This is for 2.9 branch and master, so will be included in 2.9.5 release once that is made (and eventually 3.0).

@jinzha
Copy link
Author

jinzha commented Mar 2, 2018

Hi cowtowncoder,

Thanks for quickly fixing it!

In our case, we would add a method to make CBORParser read from the specified offset of InputStream, the underling InputStream in our project support to read from a specified offset. I tried add below method to support this. Does it make sense to you?

public class CBORParserEx extends CBORParser {
public void setOffset(int offset) {
final ByteInputStream in = getInputStream();
in.setOffset(offset);
_currInputProcessed = offset;
_inputPtr = _inputEnd = 0;
}
}

@jinzha
Copy link
Author

jinzha commented Mar 2, 2018 via email

@cowtowncoder
Copy link
Member

On modification: I would personally modify it by custom InputStream instead, and not try to do that on parser. Or, if you prefer, sub-classing CBORFactory and handling it here.
The only thing that would not achieve would be _currInputProcessed, and if that is important, the only way would be sub-classing.

As to 2.9.5, likely some time in March, maybe in 2 weeks or so. Usually there's at least 2 months between patch releases at this point.

@jinzha
Copy link
Author

jinzha commented Mar 4, 2018 via email

@cowtowncoder
Copy link
Member

Use of input buffer is pretty essential for reasonable performance -- overhead of single byte reads would be significant. So design does not allow skipping of buffering.
It is possible to "flush" content buffered but not (yet) read at end of decoding (using method releaseBuffered() on parser).

But I guess I am not sure which aspect of buffering would be problematic for you: location, or something else?

@jinzha
Copy link
Author

jinzha commented Mar 7, 2018 via email

@jinzha
Copy link
Author

jinzha commented Mar 7, 2018 via email

@cowtowncoder
Copy link
Member

In general intent is not to modify underlying InputStream, but to construct separate CBORParser instances for distinct input documents. This is safer and overhead for parser instances should be modest.

But your use case where random-access is required is quite different from kinds of use cases I have considered. I can see how parser, as is, would have significant overhead.

If you do know the length of document (or document fragment) to parse you can prevent InputStream from exposing more content than is needed: so, for example, if document is known to be 400 bytes long, only return up 400 bytes. That would avoid significant overhead of reading 8000 bytes, discarding 7600.

@jinzha
Copy link
Author

jinzha commented Mar 7, 2018 via email

@cowtowncoder
Copy link
Member

cowtowncoder commented Mar 7, 2018

@jinzha Yes, it sounds like this is the case unfortunately. Optimizing for such access is tricky -- although something like this is actually implemented in Jackson JSON parser for DataInput. It results in -40% performance decrease (which may be more than what would happen with InputStream; perhaps DataInput access pattern is more difficult for JVM to optimize), but does work with no look-ahead/read-ahead (since DataInput does not have means to detect end of input; reader must know how much is still needed).
Tests are in https://github.com/FasterXML/jackson-benchmarks.

Similarly, async parsers (for JSON, Smile; perhaps in future for CBOR) use different access pattern as input is fed, not read via blocking stream.

So it would be possible to implement no-caching parser; and make that pluggable via CBORFactory. I don't have time to do that myself, but if anyone wants to try that in future, I can help with integration.

For now your best bet may be to something else that exists, esp. if you have your own decoder.

@jinzha
Copy link
Author

jinzha commented Mar 8, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants