Using reinforce_loss #251
Comments
To "produce a tensor with shape [bs, sl]" from
-- Or refer to examples/seqgan to write by your own |
Thank you @ZhitingHu. So my questions are: …
The code looks good. A reference code (which is basically the same as what you wrote) is here: #147 (comment). 2 - It's not really necessary, because you'd do the masking with …
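A minimal sketch of that masking idea (my own, assuming the mask simply zeroes out per-token losses beyond each sequence length, e.g. with tf.sequence_mask; not necessarily the snippet that was referenced):

```python
import tensorflow as tf

# Assumed shapes: per-token loss [batch_size, max_time], true lengths [batch_size].
token_loss = tf.placeholder(tf.float32, [None, None], name="token_loss")
lengths = tf.placeholder(tf.int32, [None], name="lengths")

# Zero out positions beyond each sequence's length, then average per sequence.
mask = tf.sequence_mask(lengths, maxlen=tf.shape(token_loss)[1], dtype=tf.float32)
per_seq_loss = tf.reduce_sum(token_loss * mask, axis=1) / tf.cast(lengths, tf.float32)
loss = tf.reduce_mean(per_seq_loss)
```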
@ZhitingHu Actually, I am getting an OOM error when I add this RL loss (the way I showed earlier) to the MLE loss. Error log: …
I couldn't see why from this. What's in the …? If optimization (e.g., the train_op) is included, that may be the cause.
Actually, the code works fine if I have only loss_mle; it even works when I have multiple loss_mle terms (which means the …).
Could you try running without the train_op in the fetches and post the error?
@ZhitingHu Here is the error when I remove train_op from the fetches: …
Also, I got the exact same error as above when I tried using …. I am sorry for the inconvenience, but I have no idea what's happening or what I am doing wrong with rl_loss.
Removing the train_op rules out the backward pass. Based on the error msg after removing it, this looks like a separate bug rather than an OOM.
@ZhitingHu I was able to fix that bug, and now removing the train_op makes the error go away; the OOM only appears when the train_op is included.
hmm... The OOM is caused by the optimization (backward pass). Gradients of … take up the extra memory. You may use …. Another effective way to reduce memory consumption is to use a smaller ….
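One generic way to cut per-step memory while keeping the effective batch size, a sketch of my own and not necessarily the option meant above, is to accumulate gradients over smaller sub-batches before applying them:

```python
import tensorflow as tf

# Illustrative stand-in for the real model/loss (loss_mle + rl_loss).
x = tf.placeholder(tf.float32, [None, 8], name="x")
w = tf.get_variable("w", [8, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

num_sub_batches = 4  # run 4 small forward/backward passes per parameter update
optimizer = tf.train.AdamOptimizer(1e-4)
tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)

# One non-trainable accumulator per variable.
accum = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
         for v in tvars]
zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])
accum_op = tf.group(*[a.assign_add(g) for a, g in zip(accum, grads)])
apply_op = optimizer.apply_gradients(
    [(a / float(num_sub_batches), v) for a, v in zip(accum, tvars)])

# Usage per update step (split() is a hypothetical helper):
#   sess.run(zero_op)
#   for sub_batch in split(batch, num_sub_batches):
#       sess.run(accum_op, {x: sub_batch})
#   sess.run(apply_op)
```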
@ZhitingHu I really appreciate your help. How do I make sure that during …? This is where I sample two outputs for RL, this is where I compute rl_loss_fine, and this is where I compute the reward.
@ZhitingHu I changed the …
I just wanted to check whether a negative loss (which can happen when the reward of the greedy output (r_base) is greater than the reward of the sampled output (r_sample)) or a very small loss (most of the time the difference between the two rewards is very small, and multiplying it by log_prob results in small values) can cause problems in the backward pass.
Texar automatically reuses variables; no need to add things like …. FYI, here is an example code of using Texar for self-critic learning, where …
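A generic sketch of the self-critic loss itself (my own, not the example code referred to above; it assumes r_sample and r_base are per-example scalar rewards and seq_log_prob is the summed log-probability of the sampled sequence, normally produced by the decoder so that gradients flow through it):

```python
import tensorflow as tf

# Placeholders stand in for values computed elsewhere (rewards are usually
# computed outside the graph, e.g. BLEU; seq_log_prob comes from the decoder).
r_sample = tf.placeholder(tf.float32, [None], name="r_sample")        # reward of sampled output
r_base = tf.placeholder(tf.float32, [None], name="r_base")            # reward of greedy baseline
seq_log_prob = tf.placeholder(tf.float32, [None], name="seq_log_prob")

# Treat the advantage as a constant w.r.t. the model parameters.
advantage = tf.stop_gradient(r_sample - r_base)

# A negative advantage (r_base > r_sample) only lowers the probability of
# below-baseline samples, and a small advantage just gives small gradients;
# neither breaks the backward pass by itself.
rl_loss = -tf.reduce_mean(advantage * seq_log_prob)
```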
Thank you so much @ZhitingHu. This was really helpful; I was able to figure out what I was doing wrong, and now the OOM error is gone.
Glad to hear that! :) Could you briefly explain the cause of the OOM, for future reference? Thanks.
Thanks. However, I should have fixed sample_id (EOS stripped and padded to the same size) and then used this as input to …
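A rough sketch of that fixing step (my own; sample_id, eos_id, max_len and pad_id are assumed names), i.e. cutting each row at its first EOS and padding to a fixed length:

```python
import tensorflow as tf

def strip_eos_and_pad(sample_id, eos_id, max_len, pad_id=0):
    """Replace tokens from the first EOS onward with pad_id and pad/cut to max_len.

    sample_id: int32 [batch_size, time]; returns ([batch_size, max_len], lengths).
    """
    is_eos = tf.cast(tf.equal(sample_id, eos_id), tf.int32)
    time = tf.shape(sample_id)[1]
    first_eos = tf.argmax(is_eos, axis=1, output_type=tf.int32)
    has_eos = tf.reduce_max(is_eos, axis=1)
    # Length up to (excluding) the first EOS; full length if there is no EOS.
    lengths = tf.where(tf.equal(has_eos, 1), first_eos,
                       tf.fill(tf.shape(first_eos), time))
    mask = tf.sequence_mask(lengths, maxlen=time, dtype=sample_id.dtype)
    stripped = sample_id * mask + pad_id * (1 - mask)
    # Cut or pad the time dimension to exactly max_len.
    stripped = stripped[:, :max_len]
    pad_amount = tf.maximum(0, max_len - tf.shape(stripped)[1])
    stripped = tf.pad(stripped, [[0, 0], [0, pad_amount]],
                      constant_values=pad_id)
    return stripped, lengths
```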
I know it's somewhat unrelated to the implementation details, but when doing periodic evaluation on a dev set during training, is it more common to compute the reward on the greedy output or on the most probable beam-search output (of a specific width)? Or are both approaches common? Thanks.
Hi,
I was trying to write a function for computing the REINFORCE loss (as below) when I realized you already have this here.
In this regard, how can I use the TransformerDecoder with the 'infer_sample' decoding strategy as the sample_fn? In your reinforce_loss it is mentioned that the sample_fn should return [ids, probabilities, sequence_length]; however, the TransformerDecoder returns logits instead of probabilities. Can you guide me on how to use TransformerDecoder with your reinforce_loss function? It should be a lot cleaner than my approach.
The way I was doing it was to call TransformerDecoder with the 'infer_sample' decoding strategy and then take the log_softmax, like the following: …
However, I am stuck on some steps, like gathering the log_probabilities according to the sample_ids. I could do it with numpy but not with tensorflow.
Also, I am not sure if this is the right approach to compute the loss, so if you could either guide me on how to use reinforce_loss with TransformerDecoder sampling, or help me with my own script, that would be highly appreciated.
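A minimal sketch of the gather step in plain TensorFlow (my own, not Texar's reinforce_loss): the per-token log-probability of each sampled id can be read off the logits with sparse softmax cross-entropy, which equals the gathered log-softmax with the sign flipped, and then reward-weighted in REINFORCE style:

```python
import tensorflow as tf

# Assumed inputs: decoder logits [batch, time, vocab], sampled ids [batch, time],
# sequence lengths [batch], and one scalar reward per sample [batch].
logits = tf.placeholder(tf.float32, [None, None, None], name="logits")
sample_id = tf.placeholder(tf.int32, [None, None], name="sample_id")
lengths = tf.placeholder(tf.int32, [None], name="lengths")
reward = tf.placeholder(tf.float32, [None], name="reward")

# Equivalent to gathering tf.nn.log_softmax(logits)[b, t, sample_id[b, t]]:
token_log_prob = -tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sample_id, logits=logits)                                # [batch, time]

# Mask out positions beyond each sample's length, then sum over time.
mask = tf.sequence_mask(lengths, maxlen=tf.shape(sample_id)[1], dtype=tf.float32)
seq_log_prob = tf.reduce_sum(token_log_prob * mask, axis=1)          # [batch]

# REINFORCE-style loss: reward-weighted negative log-likelihood of the samples.
rl_loss = -tf.reduce_mean(reward * seq_log_prob)
```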