Got out of memory error while working with large file #17

Open
SomnathKadam opened this issue May 24, 2016 · 1 comment
SomnathKadam commented May 24, 2016

Hi Team,

When I run Wapiti CRF training on 36k training samples with the following command, it returns:

"out of memory error, train model with L-BFGS."

wapiti train -p ../template_7feats -1 5 --nthread 5 ../train_feats.txt 36kmodel_wapiti

Thanks,
Somnath A. Kadam
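
Wapiti itself has a few options that usually shrink the L-BFGS footprint and may be worth trying first. A hedged sketch against the command above (the concrete values are illustrative guesses, not tested settings for this dataset):

# --nthread 1 : fewer worker threads; each thread keeps its own working buffers
# --histsz 3  : smaller L-BFGS history (default is 5), so fewer stored vector pairs
# --sparse    : enable sparse forward/backward computation
$ wapiti train -p ../template_7feats -1 5 --nthread 1 --histsz 3 --sparse \
      ../train_feats.txt 36kmodel_wapiti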

dsindex commented Jul 23, 2018

@SomnathKadam

I hit the same issue when using 'bigram' features on large training data: memory usage exploded, up to 100 GB. This does not happen with 'unigram'-only features (see the back-of-the-envelope sketch after the pattern below for why).

  • crf.pattern

b

u:wrd LL=%X[-2,0]
u:tag LL=%X[-2,1]

u:wrd L=%X[-1,0]
u:tag L=%X[-1,1]

*:wrd X=%X[0,0]
*:tag X=%X[0,1]

u:wrd R=%X[1,0]
u:tag R=%X[1,1]

u:wrd RR=%X[2,0]
u:tag RR=%X[2,1]
  • train
$ wapiti train -t 16 -c -p crf.pattern train.txt crf.model
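
For intuition on why this blows up: with these patterns, a u: feature stores one weight per (observation string, label), while a b or * feature stores one weight per (observation string, label pair), i.e. |Y|² weights instead of |Y|. A back-of-the-envelope sketch in shell; Y and O below are made-up assumptions, not numbers measured from this dataset:

$ Y=50        # assumed number of output labels
$ O=2000000   # assumed distinct observation strings produced by the patterns
$ echo "unigram weights: $((O * Y))"
unigram weights: 100000000
$ echo "bigram weights: $((O * Y * Y))"
bigram weights: 5000000000

At 8 bytes per double weight, that is already ~40 GB of bigram weights before any per-thread working copies, which is in the same range as the 100 GB blow-up above.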

However, when I modified crf.pattern to keep only the bare 'b' transition (no bigram observation patterns), memory usage was fine:

  • crf.pattern
#unigram
u:wrd LL=%X[-2,0]
u:tag LL=%X[-2,1]

u:wrd L=%X[-1,0]
u:tag L=%X[-1,1]

u:wrd X=%X[0,0]
u:tag X=%X[0,1]

u:wrd R=%X[1,0]
u:tag R=%X[1,1]

u:wrd RR=%X[2,0]
u:tag RR=%X[2,1]

#bigram
b
  • train
$ wapiti train -t 16 -c -p crf.pattern train.txt crf.model
....
  [   3] obj=1897392.82 act=989401   err=45.81%/99.34% time=4645.94s/11109.04s
  [   4] obj=1864936.55 act=1397073  err=45.81%/99.34% time=5211.76s/16320.80s
  [   5] obj=1862659.23 act=978958   err=45.81%/99.34% time=3486.53s/19807.33s
* Compacting the model
    - Scan the model
    - Compact it
        1278 observations removed
      886932 features removed
* Save the model
* Done

But as you can see, training stopped after only 5 iterations, with the error rate stuck at 45.81%. (Without the bigram observation features, training runs for 60 iterations and the error rate drops to 1.8%.)
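
If the run is being cut off by the convergence test rather than crashing, the stopping criterion can be loosened so L-BFGS keeps iterating. A sketch using Wapiti's standard stopping options (the values are guesses, not the settings from the issue mentioned below):

$ wapiti train -t 16 -c -p crf.pattern \
      --maxiter 100 --stopwin 10 --stopeps 0.00001 \
      train.txt crf.model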

I found a similar issue reported elsewhere; the settings suggested there solved the problem.
