Seeking prevents the use of streams with parse_incr
#100
Comments
@kylebgorman I'm happy to discuss other solutions to this. The reason it seeks (once, on the first line) is to find the global.columns comment that is included in CoNLL-U Plus. That sets the definition of which columns the current file has. For your use-case, what would be a better way to handle that?
I wasn't familiar with that format so I just looked it up. Admitting I don't really have the full context and this might be wildly ignorant, I don't see how that requires backtracking. Without global columns the algorithm is something like:

    for line in source:
        if is_metadata(line):
            handle_metadata(line, metadata)
        elif is_token(line):
            handle_token(line, tokens)
        elif is_blank(line):
            yield TokenList(...)

I'd just add another clause (with highest priority) to handle the case where something that looks like metadata but is in fact
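The single-pass idea in the comment above could be sketched roughly as follows. This is only an illustration, not conllu's actual implementation: `parse_stream`, `DEFAULT_COLUMNS`, the `(metadata, tokens)` return shape, and the choice to lowercase column names are all assumptions made for the example. The one point it tries to show is that a `# global.columns = ...` line can be handled as its own highest-priority clause, so the parser never has to seek.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Standard CoNLL-U column names, used when no global.columns line is seen.
DEFAULT_COLUMNS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]

Sentence = Tuple[Dict[str, str], List[Dict[str, str]]]


def parse_stream(source: Iterable[str]) -> Iterator[Sentence]:
    """Hypothetical single-pass parser: yields (metadata, tokens) per sentence."""
    columns = DEFAULT_COLUMNS
    metadata: Dict[str, str] = {}
    tokens: List[Dict[str, str]] = []
    for raw in source:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns"):
            # Highest-priority clause: a column declaration, not sentence metadata.
            _, _, value = line.partition("=")
            columns = [name.lower() for name in value.split()]
        elif line.startswith("#"):
            # Ordinary "# key = value" sentence metadata.
            key, _, value = line[1:].partition("=")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            # Token line: map tab-separated fields onto the current columns.
            tokens.append(dict(zip(columns, line.split("\t"))))
        elif tokens:
            # Blank line closes the current sentence.
            yield metadata, tokens
            metadata, tokens = {}, []
    if tokens:
        # Flush a final sentence that is not followed by a blank line.
        yield metadata, tokens
```

Because the function only iterates over `source`, it should work equally on an open file, `sys.stdin`, or any other iterable of lines.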
@kylebgorman I think I figured it out. Please try the latest version of conllu. It's seek-free: https://pypi.org/project/conllu/6.0.0/ (did a major version bump because I removed a function in the public API)
That works very well, thanks!
One of the contexts in which I could easily imagine using parse_incr is on stdin or a process substitution-style file descriptor. E.g.:
What these two types of input have in common is that they are both streaming and Python will crash if you attempt to seek on them, which parse_incr does indirectly here. I don't feel like I have the full context, but I think it's because it reads the sentence before it reads the metadata for whatever reason.
This bit us in here. We will probably just move off of conllu and use our own custom solution which doesn't rewind, but my colleague thought it worth reporting to you here in case it's avoidable.
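To illustrate the failure mode described above, here is a small sketch that uses an OS pipe as a stand-in for stdin or process substitution. The helpers `pipe_reader` and `can_rewind` are hypothetical names introduced for this example, and the behaviour shown is what I would expect on a POSIX system; it does not call conllu at all.

```python
import io
import os


def pipe_reader(text: str):
    """Return a text stream backed by an OS pipe (hypothetical helper)."""
    read_fd, write_fd = os.pipe()
    # Small payloads fit in the pipe buffer, so writing first does not block.
    with os.fdopen(write_fd, "w") as writer:
        writer.write(text)
    return os.fdopen(read_fd, "r")


def can_rewind(stream) -> bool:
    """Return True if stream.seek(0) succeeds (hypothetical helper)."""
    try:
        stream.seek(0)
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation subclasses both OSError and ValueError.
        return False
    return True


if __name__ == "__main__":
    stream = pipe_reader("# text = Hi\n1\tHi\n\n")
    print(stream.seekable())                          # pipe-backed stream
    print(can_rewind(stream))                         # pipe-backed stream
    print(can_rewind(io.StringIO("# text = Hi\n")))   # in-memory stream
    stream.close()
```

Any parser that rewinds its input, even once, inherits this limitation, whereas one that only iterates over lines does not.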