This repository has been archived by the owner on Apr 23, 2024. It is now read-only.
Could you please provide more information about the issue? I've tested yttm BPE-dropout in a Python REPL and obtained different subword tokenizations across runs.
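For reference, here is a minimal sketch of that kind of REPL check through the Python API (it assumes a BPE model has already been trained; `model/path` is a placeholder):

```python
import youtokentome as yttm

# Load a previously trained BPE model (placeholder path).
bpe = yttm.BPE(model="model/path")

# With dropout_prob > 0, repeated encodes of the same sentence
# are expected to sample different subword segmentations.
for _ in range(5):
    print(bpe.encode(["i do observe such behavior"],
                     output_type=yttm.OutputType.SUBWORD,
                     dropout_prob=0.3))
```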
Sure. I'm getting the same output every time when using `yttm encode`:
```
$ for i in 1 2 3 4 5; do
>   echo "i do observe such behavior" | yttm encode --model model/path --output_type subword --dropout_prob 0.3
> done
n_threads: 4
▁ i ▁do ▁ob s erve ▁s uc h ▁behavior
bytes processed: 26
n_threads: 4
▁ i ▁do ▁ob s erve ▁s uc h ▁behavior
bytes processed: 26
n_threads: 4
▁ i ▁do ▁ob s erve ▁s uc h ▁behavior
bytes processed: 26
n_threads: 4
▁ i ▁do ▁ob s erve ▁s uc h ▁behavior
bytes processed: 26
n_threads: 4
▁ i ▁do ▁ob s erve ▁s uc h ▁behavior
bytes processed: 26
```
My version is 1.0.6.
In YouTokenToMe, BPE-dropout always produces the same segmentation for the same input. That contradicts the idea described in the paper.
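For context, BPE-dropout (Provilkov et al.) makes segmentation stochastic: at each merge step, every candidate merge is dropped with probability p, so encoding the same input repeatedly should yield different subword sequences. A simplified toy sketch of that mechanism follows; the merge table and helper are illustrative only, not the yttm internals:

```python
import random

def bpe_dropout_encode(word, merges, p=0.3, rng=random):
    """Toy BPE-dropout: apply merges greedily by rank, dropping each
    candidate merge with probability p at every step."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while True:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [(ranks[(a, b)], i)
                      for i, (a, b) in enumerate(zip(tokens, tokens[1:]))
                      if (a, b) in ranks and rng.random() >= p]
        if not candidates:
            return tokens
        _, i = min(candidates)  # best-ranked surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]

# Repeated calls should print different segmentations of the same word.
merges = [("b", "e"), ("h", "a"), ("ha", "v"), ("i", "o"), ("be", "hav")]
for _ in range(5):
    print(bpe_dropout_encode("behavior", merges))
```

With p = 0 this reduces to deterministic BPE; the report above is that yttm's output never varies, as if the dropout probability were not applied.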