rroot relation in model predictions #30
Hmm, this is strange. rroot is indeed used as a dummy dependency relation for the dummy root token; it should never be used for any other token and should never be printed. This is quite hard to debug if it happens so infrequently :/ It probably won't help, but can you show me a sample of CoNLL-U output where this happens?
Here is some output for Basque:
From what I can tell, across all training epochs there are only about 4 sentences in the Basque dev set where rroot has been predicted, and it is predicted at most twice per epoch, so there is some variation between epochs. And here is Hindi:
Here there appear to be more instances: in some epochs, rroot is predicted as many as 17 times.
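To get per-epoch counts like these, a small script along the following lines could scan each dev-prediction file. It is a minimal sketch, not part of UUParser: the file names in the usage comment are hypothetical, and it assumes standard 10-column CoNLL-U output where DEPREL is the 8th column.

```python
import sys


def count_deprel(conllu_text, label="rroot"):
    """Return (tokens, sentences) where the DEPREL column equals `label`."""
    token_hits = 0
    sentences_with_hit = 0
    sentence_has_hit = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends a sentence in CoNLL-U.
            if sentence_has_hit:
                sentences_with_hit += 1
            sentence_has_hit = False
            continue
        if line.startswith("#"):
            continue  # comment lines such as sent_id or text
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
            continue
        if cols[7] == label:
            token_hits += 1
            sentence_has_hit = True
    if sentence_has_hit:  # the file may not end with a blank line
        sentences_with_hit += 1
    return token_hits, sentences_with_hit


if __name__ == "__main__":
    # Hypothetical usage: python count_rroot.py dev_epoch1.conllu dev_epoch2.conllu ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            tokens, sents = count_deprel(f.read())
        print(f"{path}\t{tokens} tokens\t{sents} sentences")
```

Counting affected sentences separately from affected tokens makes it easy to tell "a few sentences, repeatedly" apart from "many different sentences".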
Thanks! These two sentences are non-projective. My suspicion is that it might be due to max_swap in Predict in uuparser/arc_hybrid.py, which should not actually be necessary; I used it in the early debugging days but never went back to change it. Could you try setting max_swap to inf or to len(sentence)*len(sentence)? It is set in this line: uuparser/uuparser/arc_hybrid.py, line 287 (commit c0d8a82).
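Whether a sentence is non-projective can be checked directly from its heads: an arc is projective if its head dominates every token lying strictly between the head and the dependent. With a static oracle, only non-projective trees require swap transitions, and in the worst case the number of swaps grows quadratically with sentence length, which is presumably why len(sentence)*len(sentence) is a safe bound. The following is a minimal sketch, not UUParser code; it assumes `heads` lists each token's head index (1-based, 0 for the root) and that the input is a well-formed tree, since a cycle would make the ancestor walk loop forever.

```python
def is_projective(heads):
    """heads[i - 1] is the head of token i (1-based); 0 denotes the root."""
    n = len(heads)

    def dominates(h, k):
        # Walk up from k towards the root; h dominates k if we meet it.
        while k != 0:
            if k == h:
                return True
            k = heads[k - 1]
        return h == 0  # the root dominates every token

    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = min(head, dep), max(head, dep)
        # Every token strictly inside the arc span must descend from the head.
        for k in range(lo + 1, hi):
            if not dominates(head, k):
                return False
    return True
```

For example, heads [3, 4, 0, 3] give the crossing arcs 3→1 and 4→2, so that tree is non-projective, while [2, 0, 2] is projective.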
I tried both versions:
OK, thanks! I still think it must have something to do with non-projectivity and the use of swap, but at this point I have no idea what specifically. I will take a look, but it probably won't be this week, sorry :/
I have been training parsers for multiple languages and observed a small number of instances where the parser predicts rroot instead of root on the dev set.
At first I thought this could be due to typos in the training data, but I could not find any instances in any of the UD treebanks (version 2.8). Instead, I found that rroot is introduced as part of a dummy root node in read_conll in utils.py.
I suppose this is not really a typo in the code, but a dummy value that is meant to be overwritten by the parser, and in most cases it is.
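The invariant discussed in this thread (the placeholder relation belongs only to the dummy root, which is never printed) can be enforced with a check at output time, so a leaked rroot fails loudly instead of silently ending up in the predictions. This is a minimal sketch, not UUParser's actual read_conll or writer: the Token class, the `*root*` form, and both functions are hypothetical.

```python
from dataclasses import dataclass

DUMMY_RELATION = "rroot"  # placeholder relation carried by the dummy root node


@dataclass
class Token:
    index: int          # CoNLL-U ID; 0 is reserved for the dummy root
    form: str
    head: int = -1
    deprel: str = "_"


def with_dummy_root(tokens):
    """Prepend a hypothetical dummy root node carrying the placeholder relation."""
    root = Token(index=0, form="*root*", head=-1, deprel=DUMMY_RELATION)
    return [root] + list(tokens)


def to_output_rows(tokens):
    """Return (ID, FORM, HEAD, DEPREL) rows for real tokens only.

    Raises ValueError if a real token carries the placeholder relation,
    which should never happen.
    """
    rows = []
    for tok in tokens:
        if tok.index == 0:
            continue  # the dummy root must never be printed
        if tok.deprel == DUMMY_RELATION:
            raise ValueError(
                f"token {tok.index} ({tok.form!r}) has placeholder relation {DUMMY_RELATION!r}"
            )
        rows.append((tok.index, tok.form, tok.head, tok.deprel))
    return rows
```

Raising rather than silently relabelling keeps the symptom visible while the underlying cause is still unknown.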
The options I set were:
--dynet-mem 6000 --epochs 50 --k=2 --pos-emb-size 0 --char-emb-size 100 --disable-rlmost
I observed it in some of the dev predictions starting at epoch 22 for Basque-BDT (random seed 2), and in some of the predictions starting at the first epoch for Hindi-HDTB (random seed 5).