
Pos tagging for imperative sentence is inconsistent #139

Open
moskaliukua opened this issue Aug 7, 2024 · 2 comments
Comments

@moskaliukua

moskaliukua commented Aug 7, 2024

Hi,
I ran into a corner case with POS tagging for imperative sentences like:
Suppose I tell you that it is true.
If I run this sentence on its own, it works as expected:

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP(model);
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT

If I run it after a text that contains another sentence first,
the POS of "Suppose" changes to PROPN:

nlp.readDoc('I watch TV every day.').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
I 0 I I X 2 0 word i / PRON
watch 1 wa tch xxxx 1 0 word watch / VERB
TV 1 TV TV XX 2 0 word tv / NOUN
every 1 ev ery xxxx 1 0 word every / DET
day 1 da day xxx 1 0 word day / NOUN
. 0 . . . 0 0 punctuat . / PUNCT

total number of tokens: 6

token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / PROPN
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT

The problem occurs only with certain sentences or specific words; I haven't figured out the pattern yet. For example:

nlp.readDoc('I like playing football').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

produces the correct result:
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB

Could this be related to caching? Also, is there an easy way to disable the cache, or to make the library parse each sentence in isolation without reloading the model?

Package versions:
"wink-eng-lite-web-model": "^1.8.0",
"wink-nlp": "^2.3.0",

@rachnachakraborty
Member

Hi @moskaliukua

We appreciate the time and effort you took to document the POS-tagging inconsistency.

We were able to replicate the issue. This needs a deeper dive at our end.

We shall revert on this shortly.

Thank you once again.

Best,
Rachna

@moskaliukua
Author

Hi, I have an update regarding the issue. I found where it is located:
it's in the wink-eng-lite-web-model repo, https://github.com/winkjs/wink-eng-lite-web-model/blob/0cfed33874bb7675621d58db53ddb8f37db3c1ef/src/feature.js#L192
It is related to the isFirstToken variable: any capitalized word that is not the first token of the text gets set to PROPN, regardless of whether it is in the same sentence or the next one.
So a text like "TV. Suppose I tell you that it is true." is enough to reproduce the error. For now I have just changed the logic to return the original POS.
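To make the behavior concrete, here is a minimal, hypothetical reconstruction of the heuristic described above. The function names and the exact condition are assumptions for illustration only; the real logic lives in src/feature.js of wink-eng-lite-web-model (linked above).

```javascript
// Hypothetical sketch of the capitalization heuristic, NOT the actual
// model code. A capitalized token (shape starting with "X") that is not
// the very first token of the text gets forced to PROPN — even when it
// actually begins a new sentence, as with "Suppose" after "TV.".
function buggyPos(originalPos, shape, isFirstToken) {
  if (!isFirstToken && /^X/.test(shape)) return 'PROPN';
  return originalPos;
}

// The change described above: keep the tagger's original POS instead of
// overriding it based on token position alone.
function patchedPos(originalPos) {
  return originalPos;
}
```

With this sketch, buggyPos('VERB', 'Xxxxx', false) yields 'PROPN' (the reported mis-tag of "Suppose"), while patchedPos('VERB') keeps 'VERB'. A more complete fix would presumably consult sentence boundaries rather than only the first token of the whole text.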
