
Updating "Classifying Names with a Character-Level RNN" #2954

Open: wants to merge 20 commits into base: main

Conversation

@mgs28 commented Jun 24, 2024

Fixes #1166

Description

Updating Sean's excellent RNN classification tutorial, which is now eight years old and missing some newer PyTorch functionality.

  • The default DataLoader cannot be used to select batch sizes because "stack expects each tensor to be of equal size" and the names are of different lengths. The code is updated to use mini-batches without DataLoader functionality (see the sketch after this list).
  • Introducing PyTorch's Dataset class, we show how to split the data into train and test datasets, which changes the training explanation.
  • Rewrote pieces of the tutorial to use three classes to improve re-use (Data, DataSet, and RNN).
  • Added a little more explanation of how RNNs score multi-character strings and of their 2D matrices of tensors.
  • Changed evaluation from random training examples to the entire test set.
  • Removed some of the command-line explanations since notebooks are used more often.
  • Tried to preserve as much of the original text, functions, and style as possible.
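
As a rough illustration of the Dataset and train/test-split approach described above (the class and variable names here are illustrative, not the PR's exact code; lineToTensor is the tutorial's existing helper):

import torch
from torch.utils.data import Dataset, random_split

class NamesDataset(Dataset):
    # wraps (name, language-index) pairs; tensors are built per item because
    # variable-length name tensors cannot be stacked by the default DataLoader collate
    def __init__(self, names, labels):
        self.names = names
        self.labels = labels

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        return lineToTensor(self.names[idx]), torch.tensor(self.labels[idx])

# dataset = NamesDataset(all_names, all_labels)
# n_train = int(0.85 * len(dataset))
# train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])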

Checklist

  • The issue that is being fixed is referenced in the description (see above, "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request

cc @albanD

pytorch-bot bot commented Jun 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2954

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d9d81a1 with merge base 7490332:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor) commented:

Hi @mgs28!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@mgs28 closed this Jun 24, 2024
@mgs28 reopened this Jun 24, 2024
@mgs28 marked this pull request as ready for review June 24, 2024 19:08

@mgs28 (Author) commented Jun 26, 2024

@svekars - it looks like you are very active in this repo; any chance you could help me with this? Thanks!

@mgs28 (Author) commented Jun 28, 2024

Added functionality to process the training data in mini-batches, as the original issue requested. However, I had to use numpy + random to create batch indices from a given dataset.

Also, simplified training so it was a closer match to https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
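
A minimal sketch of that mini-batching approach (function and variable names here are assumptions, not the PR's exact code):

import random
import numpy as np

def make_batches(dataset, batch_size):
    # shuffle all indices, then split them into roughly equal-sized batches
    indices = list(range(len(dataset)))
    random.shuffle(indices)
    return np.array_split(indices, max(1, len(indices) // batch_size))

# usage: iterate one sample at a time inside each batch, since the
# variable-length name tensors cannot be stacked into a single tensor
# for batch in make_batches(train_set, batch_size=64):
#     for idx in batch:
#         line_tensor, label = train_set[int(idx)]
#         ...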

@mgs28 (Author) commented Jul 8, 2024

@svekars - would you please help me with this tutorial update pull request or point me to someone who could?

@svekars (Contributor) commented Jul 8, 2024

cc: @spro

@mgs28 (Author) commented Jul 10, 2024

Sorry about the spelling errors! I ran pyspelling and re-ran make html for the tutorial. This should pass those CI steps now.

I also filed an issue (#2969) to come back and update CONTRIBUTING.md to include some of these checks.

Thanks @spro @svekars !

@mgs28 (Author) commented Jul 11, 2024

@spro and @svekars - I significantly cut the training time, although it currently runs faster on my CPU than on my GPU: about 72 seconds on my local CPU. I also added default-device selection so the tutorial picks up CUDA on your build machines, which will hopefully make it faster there.
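
A minimal sketch of that default-device selection (assuming a recent PyTorch with torch.set_default_device; the PR's exact code may differ):

import torch

# prefer CUDA when available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.set_default_device(device)
print(f"Using device: {device}")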

Thanks!

@@ -3,6 +3,7 @@
NLP From Scratch: Classifying Names with a Character-Level RNN
**************************************************************
**Author**: `Sean Robertson <https://github.com/spro>`_
**Updated**: `Matthew Schultz <https://github.com/mgs28>`_
Contributor: We typically don't add "Updated". Please remove this.

Author: Done!

Author: @svekars - anything else I can do to make this better? Thanks!

Author: @svekars - hello, are there other items I should address here? I appreciate your help with this!

@svekars added the core label (Tutorials of any level of difficulty related to the core pytorch functionality) on Sep 4, 2024

@jbschlosser (Contributor) left a comment:

Thanks for the PR! At a high level, I have the following comments / concerns:

  • I think it's a good idea to update the tutorial to utilize PyTorch's Dataset and DataLoader abstractions.
    • I'm not sold on the need for the NameData class. It adds what I consider unnecessary complexity / code. It's perfectly simple to do any conversions between names <-> tensors as simple standalone functions utilized by the dataset.
  • I agree that proper train / test / validation splits should be done in the tutorial, so that's a nice addition.
  • I'm okay with defining a custom RNN module for illustration purposes, although in practice we'd encourage the use of nn.RNN; I'm not sure it existed when the tutorial was originally written. One thing the official torch.nn module provides is better performance, due to e.g. cuDNN-accelerated kernels. If we don't use it within the tutorial, I think it should at least be mentioned and recommended for the performance benefits. All that said, I have some comments on the RNN module defined in the tutorial:
    • It's a bit confusing to redefine it multiple times in the tutorial, adding stuff to it each time. I'd recommend a single definition.
    • The learn() API does not belong on RNN; I suggest leaving the training logic in a standalone train() function (see the sketch after this list). This way, it's more PyTorch-idiomatic and easier for users to switch to a third-party training API (e.g. ignite, PyTorch Lightning, etc.).
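
A minimal sketch of the standalone train() shape being suggested here (names, loss, and optimizer choices are illustrative, not the PR's final code):

import torch
import torch.nn as nn

def train(rnn, training_data, n_epochs=10, learning_rate=0.005):
    # keep the training loop outside the model so the model stays a plain nn.Module
    criterion = nn.NLLLoss()  # pairs with a LogSoftmax model output
    optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)
    rnn.train()
    for epoch in range(n_epochs):
        total_loss = 0.0
        for line_tensor, label_tensor in training_data:
            optimizer.zero_grad()
            output = rnn(line_tensor)
            loss = criterion(output, label_tensor)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch}: avg loss {total_loss / len(training_data):.4f}")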

# ASCII).
#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
Contributor:

Suggested change
# which will have an __init__ function to specify the input fields and some helper functions. Our first
# which will have an ``__init__`` function to specify the input fields and some helper functions. Our first

#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
# helper function will be __str__ to convert objects to strings for easy printing
Contributor:

Suggested change
# helper function will be __str__ to convert objects to strings for easy printing
# helper function will be ``__str__`` to convert objects to strings for easy printing

# ``all_categories`` (just a list of languages) and ``n_categories`` for
# later reference.
#########################
#Now we can use that class to create a singe piece of data.
Contributor:

Suggested change
#Now we can use that class to create a singe piece of data.
#Now we can use that class to create a single piece of data.

@@ -181,21 +255,22 @@ def lineToTensor(line):
#
# This RNN module implements a "vanilla RNN" an is just 3 linear layers
# which operate on an input and hidden state, with a ``LogSoftmax`` layer
# after the output.
# after the output.s
Contributor:

Suggested change
# after the output.s
# after the output.

…ng all RNN definition into one, moving RNN.learn() to separate train()

@mgs28 (Author) commented Sep 13, 2024

@jbschlosser - thank you for the lovely suggestions. If possible, I'd like to split my response into two parts:

First, the edits to my existing content.

  1. Excellent point on NameData. I removed it and used helper functions.
  2. I'm glad you like the Dataset addition - that was my prompt for doing this.
  3. Thanks for letting me know the multiple definitions were confusing. I consolidated it into a single, simple definition, particularly since training was split out of the class.
  4. I added training as a separate function. I will go learn more about those third party trainers since I haven't used them.

Secondly, I would really like to use nn.RNN if possible. There are very few tutorials that mention it, and everyone seems to base their RNN builds on this tutorial. However, to solve this task I think I need a network with layer sizes like [57, 128, 18], and it looks like a default Elman network is limited to something like [57, 18].

Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something?

Thanks!

@mgs28 (Author) commented Sep 13, 2024

To make it simpler, I assume extending the nn.RNN class might look like the following (which runs about 40% faster):

import torch.nn as nn

class MyRNN(nn.RNN):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyRNN, self).__init__(input_size, hidden_size)

        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        # Pass the input through the RNN layers
        rnn_out, hidden = super(MyRNN, self).forward(line_tensor)
        output = self.h2o(hidden[0])
        output = self.softmax(output)

        return output

@jbschlosser (Contributor) commented Sep 16, 2024

Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something?

Rather than inherit, we generally encourage composition. In this case, something like:

# better name needed :)
class MyRNN(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.rnn = nn.RNN(...)
        self.h2o = nn.Linear(...)

    ...
    def forward(self, x):
        _, hidden = self.rnn(x)
        output = self.h2o(hidden[0])
        return F.log_softmax(output, dim=1)
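
A filled-in, self-contained version of that composition sketch (class name and layer sizes are illustrative, not necessarily the PR's final code) might look like:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, line_tensor):
        # line_tensor: (seq_len, 1, input_size) one-hot encoded characters
        _, hidden = self.rnn(line_tensor)
        output = self.h2o(hidden[0])
        return F.log_softmax(output, dim=1)

# e.g. 57 one-hot character features, 128 hidden units, 18 language classes
# model = CharRNN(57, 128, 18)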

@mgs28 (Author) commented Sep 17, 2024

Thanks @jbschlosser! I used nn.RNN via composition and changed some of the surrounding text. That forced the addition of a few terms to the repo dictionary. I appreciate you teaching me something again, and hopefully the tutorial is better for it.

Labels: cla signed, core (Tutorials of any level of difficulty related to the core pytorch functionality)
Projects: None yet
Development: Successfully merging this pull request may close these issues: "Char RNN classification with batch size"
4 participants