Description
Add Link
https://pytorch.org/tutorials/beginner/translation_transformer.html
Describe the bug
Running the tutorial on language translation with transformers leads to NaNs when training on the first batch iteration of the first epoch, and even when evaluating an untrained model on some input sequences.
I found this issue simply by copy-pasting the tutorial to my local machine and starting the training process. The issue seems to stem from the target mask. Replacing the line
mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
with
mask = mask.float().masked_fill(mask == 0, float('-1e9')).masked_fill(mask == 1, float(0.0))
allows the model to train for a few batches in the first epoch without NaN outputs. A possibly related problem has been pointed out in pytorch/pytorch#41508 (comment)
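For reference, here is a minimal sketch (not from the tutorial itself) of one way a `float('-inf')` mask can produce NaNs: if an attention row ends up fully masked, for example when the causal mask combines with a padding mask, the softmax computes 0/0:

```python
import torch
import torch.nn.functional as F

# A fully masked attention row: every score is -inf,
# so softmax evaluates exp(-inf) / sum(exp(-inf)) = 0/0 = NaN.
scores = torch.full((1, 4), float('-inf'))
print(F.softmax(scores, dim=-1))  # tensor([[nan, nan, nan, nan]])

# A large negative finite value keeps softmax well defined:
# the fully masked row degrades to a uniform distribution instead of NaN.
scores = torch.full((1, 4), -1e9)
print(F.softmax(scores, dim=-1))  # tensor([[0.2500, 0.2500, 0.2500, 0.2500]])
```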
However, even with this "fix", the model's losses increase with training, and eventually they become NaN as well.
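For completeness, here is the full helper with the workaround applied. This is a sketch based on the tutorial's `generate_square_subsequent_mask`; `DEVICE` is pinned to CPU here to keep the example self-contained, whereas the tutorial selects the device dynamically:

```python
import torch

DEVICE = torch.device('cpu')  # placeholder; the tutorial picks CUDA when available

def generate_square_subsequent_mask(sz):
    # Lower-triangular boolean mask: position i may attend only to positions <= i.
    mask = (torch.triu(torch.ones((sz, sz), device=DEVICE)) == 1).transpose(0, 1)
    # Workaround: a large negative finite value instead of float('-inf'),
    # so fully masked softmax rows do not evaluate to 0/0 = NaN.
    mask = mask.float().masked_fill(mask == 0, float('-1e9')).masked_fill(mask == 1, float(0.0))
    return mask
```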
Describe your environment
Running on macOS with PyTorch 2.2.2 and Python 3.9.7.
cc @pytorch/team-text-core @Nayef211