
Links to original tf code - fyi #1

Open · GenTxt opened this issue Jun 27, 2020 · 8 comments

@GenTxt commented Jun 27, 2020

After reading the DeepMind blog post I was looking forward to downloading the model, but no luck. Looking forward to your implementation.

You may already be aware of this post and link, but if not, this is the author's original TF implementation. Hope it helps.

Copy of comment to original model request:

huggingface/transformers#4688

I'm interested in the model weights too, but they're currently not available. The author does mention releasing the TF code here:

https://news.ycombinator.com/item?id=22290227

Requires TF 1.15+ and deepmind/sonnet v1.36. Link to the Python script here:

https://github.com/deepmind/sonnet/blob/cd5b5fa48e15e4d020f744968f5209949ebe750f/sonnet/python/modules/nets/transformer.py#L915

I have tried running it as-is, but it doesn't appear to have options for training on custom data as per the paper and the available datasets.

@lucidrains (Owner)

@GenTxt it should be fairly straightforward to implement! i'll get it done, and leave it to someone else with more resources to train and share a model

i'll be adding a bunch of features I learned from building other types of transformers to further enhance it as well

@lucidrains (Owner)

@GenTxt This is almost ready! Do you plan on training this on any text corpus? Perhaps pg19?

@lucidrains (Owner)

@GenTxt huggingface/datasets#306. Once this is merged, it should be easy to start training.

@GenTxt (Author) commented Jul 8, 2020 via email

@lucidrains (Owner)

  • Where is the reference to 100,000 epochs or iterations in train.py?

it's actually iterations, not epochs

  • Does it save the model/weights at the end of the 100,000 iterations?

nope, but that can be easily added!
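something along these lines would do it: a minimal sketch in plain PyTorch (the function names and path are illustrative, none of this is in train.py yet):

```python
import torch

def save_checkpoint(model, optim, iteration, path='./model.pt'):
    # bundle everything needed to resume training into a single file
    torch.save({
        'iteration': iteration,
        'model': model.state_dict(),
        'optim': optim.state_dict(),
    }, path)

def load_checkpoint(model, optim, path='./model.pt'):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt['model'])
    optim.load_state_dict(ckpt['optim'])
    return ckpt['iteration']  # resume the training loop from here
```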

  • How to use a simple text file, e.g. corpus.txt (one sentence per line), instead of enwik8.gz?

that will take some work to set up; specifically, you will have to write your own Dataset class. let me think about how to abstract this away though!
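in the meantime, here is a rough sketch of what such a Dataset class could look like: byte-level over a plain text file, mirroring how the enwik8 loader samples fixed-length segments (the class name and details are just illustrative):

```python
import torch
from torch.utils.data import Dataset

class TextFileDataset(Dataset):
    # byte-level dataset over a plain text file (e.g. corpus.txt),
    # sampling fixed-length segments like the enwik8 setup does
    def __init__(self, path, seq_len):
        with open(path, 'rb') as f:
            self.data = torch.tensor(list(f.read()), dtype=torch.long)
        self.seq_len = seq_len

    def __len__(self):
        # -1 so every segment still has a next-byte target available
        return (self.data.size(0) - 1) // self.seq_len

    def __getitem__(self, idx):
        start = idx * self.seq_len
        # seq_len + 1 bytes: the inputs plus the shifted targets
        return self.data[start : start + self.seq_len + 1]
```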

  • Can train.py be modified into a separate generation script for the saved model above?

ahh yeah, kind of, let me think about how to abstract this away as well. I'm thinking of something like my stylegan2-pytorch repository: a command-line tool that lets you train, resume training, and generate easily
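until then, a generic top-k sampling loop could seed such a script. it assumes the model maps a (batch, seq) LongTensor of byte ids to (batch, seq, vocab) logits; the compressive transformer forward also returns memories and an auxiliary loss, so the call would need adapting to unpack those:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, start_tokens, gen_len, top_k=40, temperature=1.0):
    # start_tokens: (1, t) LongTensor of byte ids to prime the model with
    model.eval()
    out = start_tokens
    for _ in range(gen_len):
        logits = model(out)[:, -1, :] / temperature  # logits for next byte
        top_vals, top_idx = logits.topk(top_k)       # restrict to top-k
        probs = F.softmax(top_vals, dim=-1)
        choice = torch.multinomial(probs, 1)         # sample within top-k
        out = torch.cat((out, top_idx.gather(-1, choice)), dim=-1)
    return out
```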

  • How to modify it to use a multi-line input text file as start tokens?

yea, that will take some coding
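the encoding side of it is small, though. reading a multi-line file into byte-level start tokens (matching the byte vocabulary used for enwik8) is just:

```python
import torch

def prime_from_file(path):
    # read a multi-line text file and encode it as byte-level start tokens
    with open(path, 'rb') as f:
        tokens = list(f.read())
    return torch.tensor(tokens, dtype=torch.long).unsqueeze(0)  # shape (1, t)
```

paired with the sampling sketch above, generating from a prompt file would be `sample(model, prime_from_file('prompt.txt'), gen_len=512)`.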

  • Would like to see a pg19 model but don't have the $ resources to train.

i'll setup training for PG19 soon-ish, and perhaps there will be some generous, curious person out there who will train it for us lol

@DarrenAbramson

I have some experience pre-training BERT-style models on custom subsets of PG, and access to lots of academic GPU time. Not to mention generous and curious :D

@lucidrains would you like to collaborate on pre-training a Compressive Transformer?

@RajeshDM commented Sep 1, 2020

@lucidrains Hey Phil,

Great work getting the implementation out in such a short amount of time.

I have been trying to replicate the results of the paper and ran into a few issues.

I could not find code for calculating the final BPC score anywhere in the current repository. Is that something you plan to add in the near future, or would you be open to a contribution from me?
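My understanding is that when the evaluation loss is the mean cross-entropy in nats per character, BPC is just a change of base; roughly what I would contribute:

```python
import math

def bits_per_character(mean_nats_per_char):
    # F.cross_entropy returns nats; dividing by ln(2) converts to bits,
    # i.e. the average cross-entropy re-expressed in bits per character
    return mean_nats_per_char / math.log(2)
```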

There are also other, smaller improvements which I believe would make the repository better. Please let me know what you think of them:

  1. I did not find model-saving code anywhere - adding that would be great.

  2. Making use of multiple GPUs for faster training - as of now I think only one GPU is being used, and extending it to multiple GPUs would speed up training (a minimal sketch follows this list).
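For point 2, the lowest-effort version would be wrapping the model in nn.DataParallel (DistributedDataParallel is faster but needs a launch script). A minimal sketch, where `model` stands for the language model built in train.py; note the compressive memories returned each step would also need per-replica handling:

```python
import torch
from torch import nn

def to_multi_gpu(model: nn.Module) -> nn.Module:
    # wrap the model so each forward pass splits the batch across all GPUs
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model.cuda()
```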

@GenTxt (Author) commented May 11, 2021

Google has finally released the two PG-19 models from the paper, including code, here:

https://github.com/google-research/google-research/tree/master/routing_transformer

https://storage.googleapis.com/rt-checkpoint/pg19_local.zip

https://storage.googleapis.com/rt-checkpoint/checkpoint.zip

Requires conversion to pytorch_model.bin and supporting files; a rough starting point for the conversion is sketched below.
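As far as I know there is no official conversion script. One starting point is to dump the TF checkpoint variables to torch tensors and then rename them to match the PyTorch module (the checkpoint path below is a placeholder, and the name mapping itself is the model-specific part left out here):

```python
import tensorflow as tf
import torch

# placeholder path - adjust to the actual files inside pg19_local.zip
reader = tf.train.load_checkpoint('./pg19_local/model.ckpt')

state = {
    name: torch.from_numpy(reader.get_tensor(name))
    for name in reader.get_variable_to_shape_map()
}
# `state` now holds raw TF weights keyed by TF variable names; mapping them
# onto the PyTorch parameter names is the remaining model-specific work
torch.save(state, 'pytorch_model.bin')
```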
