For some future ideas I have, the pretrained vocabulary will need approximate counts. One way to get them is to fit the vocabulary to a Zipfian distribution, but people have noticed that natural language vocabularies tend to be best modeled with three Zipfian distributions: one for frequent words, one for medium-frequency words, and one for rare words. So I have the model fit the training file's vocabulary to an interpolation of three Zipfian distributions (see the sketch below).
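For concreteness, here is a minimal NumPy sketch of what "interpolation of three Zipfian distributions" could look like. The function names, parameterization, and shapes are assumptions for illustration, not the current code in the repo:

```python
import numpy as np

def zipf_mixture_probs(ranks, exponents, weight_logits):
    """ranks: (V,) 1-based ranks; exponents: (3,) Zipf exponents s_k;
    weight_logits: (3,) unconstrained logits turned into weights by a softmax."""
    weights = np.exp(weight_logits - weight_logits.max())
    weights /= weights.sum()                          # softmax -> mixture weights
    # Each component is a Zipf pmf truncated to the observed vocabulary.
    powers = ranks[:, None] ** -exponents[None, :]    # (V, 3)
    component_pmfs = powers / powers.sum(axis=0)      # normalize each column
    return component_pmfs @ weights                   # (V,) mixture pmf

def neg_log_likelihood(counts, exponents, weight_logits):
    """Negative log-likelihood of observed word counts under the mixture."""
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    probs = zipf_mixture_probs(ranks, exponents, weight_logits)
    return -np.sum(counts * np.log(probs))
```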
But the fitting process is slow, not currently useful, and probably confusing for people expecting a parser and not a Zipfian regressor. So it should either be removed and only called by the experimental classes once they've been built, or have its optimization accelerated and done in NumPy so that it's not noticeable to laypeople.
One way to speed it up might be to alternate between using Newton's method to optimize the Zipfian exponents and using Newton's method, Adam, or gradient descent to optimize the softmax weights, rather than using Adam to optimize the whole thing at once. A sketch of that alternating scheme follows.
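Here is a rough sketch of that alternating loop, reusing `neg_log_likelihood` from the snippet above. It uses finite-difference Newton steps on each exponent and plain gradient steps on the logits; the hyperparameters and initial guesses are placeholders, and Adam could replace the gradient step:

```python
def fit_alternating(counts, n_rounds=50, lr=0.1, eps=1e-4):
    exponents = np.array([0.5, 1.0, 2.0])    # rough initial guesses
    logits = np.zeros(3)

    def nll(exps, lgts):
        return neg_log_likelihood(counts, exps, lgts)

    for _ in range(n_rounds):
        # 1D Newton step per Zipf exponent, mixture weights held fixed.
        for k in range(3):
            def f(x, k=k):
                e = exponents.copy(); e[k] = x
                return nll(e, logits)
            x = exponents[k]
            g = (f(x + eps) - f(x - eps)) / (2 * eps)            # first derivative
            h = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2  # second derivative
            if h > 0:                          # only step when curvature is usable
                exponents[k] = x - g / h
        # Gradient step on the softmax logits, exponents held fixed.
        grad = np.zeros(3)
        for k in range(3):
            d = np.zeros(3); d[k] = eps
            grad[k] = (nll(exponents, logits + d) - nll(exponents, logits - d)) / (2 * eps)
        logits -= lr * grad
    return exponents, logits
```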