not really losing, but not showing it properly indeed.
the proper tokenization (using sep as separator when available and a space as default) could be implemented, but then I'm not sure if any other corpus will have sep attributes to make it worthwhile… how is tokenization described by other tokenizers? could touch.py produce something akin to sep? would it be useful to do so?
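A minimal sketch of the reconstruction described above, in Python. It rests on assumptions that are not from the corpus tooling: tokens are represented as dicts with a `form` key and an optional `sep` key (a hypothetical representation), `sep` is taken to be the string that follows the token, and the last token's separator is dropped. The function name `detokenize` is also hypothetical.

```python
def detokenize(tokens, default_sep=" "):
    """Rebuild the surface text from a list of tokens.

    Each token is a dict with a "form" key and an optional "sep" key
    (hypothetical representation). A missing "sep" falls back to
    default_sep, mirroring the sep=" " default discussed in this thread.
    """
    pieces = []
    for i, tok in enumerate(tokens):
        pieces.append(tok["form"])
        # Assumption: sep follows the token, so it is only emitted
        # between tokens, never after the last one.
        if i < len(tokens) - 1:
            pieces.append(tok.get("sep", default_sep))
    return "".join(pieces)


tokens = [
    {"form": "a"},                 # no sep: a space is assumed
    {"form": "well", "sep": "-"},  # explicit separator
    {"form": "known"},
    {"form": "fact", "sep": ""},   # empty sep: no space before "."
    {"form": "."},
]
print(detokenize(tokens))
```

With this shape, a corpus without any `sep` attributes still works, since every token simply falls back to the default space.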
We do have `sep` to reproduce the original text. The question is: are we losing the original text, and is that the right thing to do?

Remember that the default `sep` is a space, so when a token doesn't have `sep`, `sep=" "` is assumed. See the confusing explanation in https://github.com/own-pt/glosstag/blob/princeton/dtd/glosstag.dtd#L158-L161 for the glosstag corpus!