
what is the role of 'maxlen' parameter? #55

Open
amirj opened this issue May 28, 2016 · 3 comments

Comments

amirj commented May 28, 2016

'maxlen' is one of the parameters in 'train_nmt.py', set to 50 by default.
I get the following message during the training process: "Minibatch with zero sample under length 100"
Investigating the source code shows that this message appears when a minibatch contains no sample whose source and target lengths are both below 'maxlen'.
On the other hand, in 'data_iterator.py' training samples are already skipped when the source and target lengths are greater than 'maxlen'.

  1. Why does this contradiction exist? Samples are passed through in 'data_iterator.py' and then filtered again in 'prepare_data' (see the sketch after this list).
  2. If I set maxlen to a large value (1000, for example), the update time increases significantly. Could you explain why?
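
For reference, a rough, self-contained sketch of the two filtering points being described (hypothetical function names, not the exact dl4mt-tutorial code):

def iterator_filter(pairs, maxlen):
    # data_iterator.py / TextIterator: pairs longer than maxlen are skipped
    # while the minibatch is being assembled.
    return [(src, trg) for src, trg in pairs
            if len(src) <= maxlen and len(trg) <= maxlen]

def prepare_data_filter(seqs_x, seqs_y, maxlen=None):
    # nmt.py / prepare_data: filters by length a second time; if nothing
    # survives, the training loop prints
    # "Minibatch with zero sample under length <maxlen>" and skips the update.
    if maxlen is None:
        return seqs_x, seqs_y
    kept = [(x, y) for x, y in zip(seqs_x, seqs_y)
            if len(x) < maxlen and len(y) < maxlen]
    if not kept:
        return None, None
    xs, ys = zip(*kept)
    return list(xs), list(ys)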

orhanf (Collaborator) commented Jul 9, 2016

Thank you for pointing this out,

  1. We have this functionality because prepare_data is used in two different places, with different behavior (although I agree that there is some redundancy in the filtering).
    • In the training loop here, where we also pass maxlen so that overly long sequences are dropped, to save computation time.
    • In pred_probs, which computes the validation-set log-likelihood, here, where we do not specify maxlen so that all samples in the validation set are considered.
  2. Basically, the model spends a lot of time on the forward and backward passes for longer sequences, and since we do not use truncated BPTT (see the truncate_gradient parameter of the scan function here), we store all the activations from the forward pass to be reused in the backward pass, which affects both computation time and memory usage. For longer sequences (like 1000 in your case) you might need to play with the truncate_gradient parameter of scan; see the sketch after this list.
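
A minimal, hypothetical sketch (a toy recurrence standing in for the GRU layers, not the dl4mt-tutorial code) of where scan's truncate_gradient fits: with the default value of -1, backpropagation is unrolled through every timestep, so all forward-pass activations must be kept; a positive value limits BPTT to that many steps.

import numpy
import theano
import theano.tensor as T

floatX = theano.config.floatX

x_seq = T.matrix('x_seq')   # (n_timesteps, dim) input sequence
h0 = T.vector('h0')         # initial hidden state, shape (dim,)
W = theano.shared(numpy.random.randn(5, 5).astype(floatX), name='W')

def step(x_t, h_tm1, W):
    # toy recurrent transition standing in for the GRU step
    return T.tanh(T.dot(h_tm1, W) + x_t)

h_seq, _ = theano.scan(step,
                       sequences=x_seq,
                       outputs_info=h0,
                       non_sequences=W,
                       truncate_gradient=50)  # only backprop through the last 50 steps

cost = h_seq[-1].sum()
grad_W = T.grad(cost, W)    # gradient graph is cut off by truncate_gradient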

hanskrupakar commented Oct 19, 2016

I have the same problem, but in my case the message shows a maxlen smaller than what I want (I want 100 words but I'm only getting 15 as maxlen). I don't want my training to be restricted to sentences of length 15.

def train(dim_word=100,  # word vector dimensionality
          dim=1000,  # the number of LSTM units
          encoder='gru',
          decoder='gru_cond',
          patience=10,  # early stopping patience
          max_epochs=5000,
          finish_after=100000000000000000000000,  # finish after this many updates
          dispFreq=100,
          decay_c=0.,  # L2 regularization penalty
          alpha_c=0.,  # alignment regularization
          clip_c=-1.,  # gradient clipping threshold
          lrate=0.01,  # learning rate
          n_words_src=65000,  # source vocabulary size
          n_words=50000,  # target vocabulary size
          maxlen=100,  # maximum length of the description
          optimizer='rmsprop',
hans@hans-Lenovo-IdeaPad-Y500:~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/dl4mt-tutorial-master/session3$ ./train.sh 
Using gpu device 0: GeForce GT 650M (CNMeM is disabled, cuDNN 4007)
{'use-dropout': [True], 'dim': [1000], 'optimizer': ['rmsprop'], 'dim_word': [150], 'reload': [False], 'clip-c': [1.0], 'n-words': [50000], 'model': ['/home/hans/git/dl4mt-tutorial/session3/model.npz'], 'learning-rate': [0.0001], 'decay-c': [0.99]}
Loading data
Building model
Building sampler
Building f_init... Done
Building f_next.. Done
Building f_log_probs... Done
Building f_cost... Done
Computing gradient... Done
Building optimizers... Done
Optimization
...................................
...................................
...................................
Epoch  0 Update  65 Cost  17509.4765625 UD  0.767469167709
Epoch  0 Update  66 Cost  17504.859375 UD  0.822523832321
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Epoch  0 Update  67 Cost  17467.9296875 UD  0.752150058746
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Epoch  0 Update  68 Cost  17452.5976562 UD  0.831667900085
Epoch  0 Update  69 Cost  17394.2402344 UD  0.73230099678
Epoch  0 Update  70 Cost  17384.1113281 UD  0.830217123032
Minibatch with zero sample under length  15
Epoch  0 Update  71 Cost  17374.1601562 UD  0.820451974869
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Minibatch with zero sample under length  15
Epoch  0 Update  72 Cost  17322.9296875 UD  0.877825975418
Epoch  0 Update  73 Cost  17319.2441406 UD  0.862649917603
Epoch  0 Update  74 Cost  17258.2480469 UD  0.820302963257
Minibatch with zero sample under length  15
Epoch  0 Update  75 Cost  17266.3398438 UD  0.854918003082
Minibatch with zero sample under length  15

So please help me figure out what I should do to ensure that the model is trained on sentences of up to 100 words in length.
Also, can you point out where the actual value 15 comes from?

orhanf (Collaborator) commented Oct 20, 2016

Hi @hanskrupakar, by default the maxlen parameter is set to 50, as you can check here; please compare it with your fork. This value is passed to the data iterator, and TextIterator filters the sequences accordingly.

In your case, please check the average sequence length of your dataset; if your sequences are short on average, you may need to adjust maxlen further, or you could even introduce another hyper-parameter like minlen.
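
For reference, a hypothetical sketch (assuming the session3 layout, where train_nmt.py calls train() from nmt.py with the signature quoted earlier in this thread) of passing a larger maxlen explicitly:

from nmt import train  # assumed import, as in train_nmt.py

if __name__ == '__main__':
    train(dim_word=150,       # word embedding dimensionality
          dim=1000,           # number of GRU units
          maxlen=100,         # keep source/target sentences of up to 100 tokens
          n_words_src=65000,  # source vocabulary size
          n_words=50000,      # target vocabulary size
          optimizer='rmsprop')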
