For some future ideas I have, the pretrained vocabulary will need approximate counts. One way to get them is to fit the vocabulary to a Zipfian distribution, but people have noticed that natural language vocabularies tend to be best modeled with three Zipfian distributions: one for frequent words, one for medium-frequency words, and one for rare words. So I have the model fit the training file's vocabulary to an interpolation of three Zipfian distributions (see the sketch below).
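For concreteness, here is a minimal NumPy sketch of what "interpolation of three Zipfian distributions" could look like. The function names, parameterization, and shapes are assumptions for illustration, not the current code in the repo:

```python
import numpy as np

def zipf_mixture_probs(ranks, exponents, weight_logits):
    """ranks: (V,) 1-based ranks; exponents: (3,) Zipf exponents s_k;
    weight_logits: (3,) unconstrained logits turned into weights by a softmax."""
    weights = np.exp(weight_logits - weight_logits.max())
    weights /= weights.sum()                          # softmax -> mixture weights
    # Each component is a Zipf pmf truncated to the observed vocabulary.
    powers = ranks[:, None] ** -exponents[None, :]    # (V, 3)
    component_pmfs = powers / powers.sum(axis=0)      # normalize each column
    return component_pmfs @ weights                   # (V,) mixture pmf

def neg_log_likelihood(counts, exponents, weight_logits):
    """Negative log-likelihood of observed word counts under the mixture."""
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    probs = zipf_mixture_probs(ranks, exponents, weight_logits)
    return -np.sum(counts * np.log(probs))
```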
But the fitting process is slow, not currently useful, and probably confusing for people expecting a parser and not a Zipfian regressor. So it should either be removed and only called by the experimental classes once they've been built, or have its optimization accelerated and done in NumPy so that it's not noticeable to laypeople.
One way to speed it up might be to alternate between using Newton's method to optimize the Zipfian exponents and using Newton's method, Adam, or gradient descent to optimize the softmax weights, rather than using Adam to optimize the whole thing at once. A sketch of that alternating scheme follows.
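Here is a rough sketch of that alternating loop, reusing `neg_log_likelihood` from the snippet above. It uses finite-difference Newton steps on each exponent and plain gradient steps on the logits; the hyperparameters and initial guesses are placeholders, and Adam could replace the gradient step:

```python
def fit_alternating(counts, n_rounds=50, lr=0.1, eps=1e-4):
    exponents = np.array([0.5, 1.0, 2.0])    # rough initial guesses
    logits = np.zeros(3)

    def nll(exps, lgts):
        return neg_log_likelihood(counts, exps, lgts)

    for _ in range(n_rounds):
        # 1D Newton step per Zipf exponent, mixture weights held fixed.
        for k in range(3):
            def f(x, k=k):
                e = exponents.copy(); e[k] = x
                return nll(e, logits)
            x = exponents[k]
            g = (f(x + eps) - f(x - eps)) / (2 * eps)            # first derivative
            h = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2  # second derivative
            if h > 0:                          # only step when curvature is usable
                exponents[k] = x - g / h
        # Gradient step on the softmax logits, exponents held fixed.
        grad = np.zeros(3)
        for k in range(3):
            d = np.zeros(3); d[k] = eps
            grad[k] = (nll(exponents, logits + d) - nll(exponents, logits - d)) / (2 * eps)
        logits -= lr * grad
    return exponents, logits
```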