
Links to original tf code - fyi #1

Open · GenTxt opened this issue Jun 27, 2020 · 8 comments

@GenTxt commented Jun 27, 2020

After reading the DeepMind blog post I was looking forward to downloading the model, but no luck. Looking forward to your implementation.

You may already be aware of this post and link, but if not, this is the author's original TF implementation. Hope it helps.

Copy of comment to original model request:

huggingface/transformers#4688

I'm interested in the model weights too, but they're currently not available. The author does mention releasing the TF code here:

https://news.ycombinator.com/item?id=22290227

Requires TF 1.15+ and deepmind/sonnet v1.36. Link to the Python script here:

https://github.com/deepmind/sonnet/blob/cd5b5fa48e15e4d020f744968f5209949ebe750f/sonnet/python/modules/nets/transformer.py#L915

I have tried running it as-is, but it doesn't appear to have options for training on custom data as per the paper and the available datasets.

@lucidrains (Owner)

@GenTxt it should be fairly straightforward to implement! i'll get it done, and leave it to someone else with more resources to train and share a model

i'll be adding a bunch of features I learned from building other types of transformers to further enhance it as well

@lucidrains (Owner)

@GenTxt This is almost ready! Do you plan on training this on any text corpus? Perhaps pg19?

@lucidrains (Owner)

@GenTxt huggingface/datasets#306. Once this is merged, it should be easy to start training.

@GenTxt (Author) commented Jul 8, 2020 via email

@lucidrains (Owner)

  • Where is the reference to 100,000 epochs or iterations in train.py?

it's actually iterations, not epochs

  • Does it save the model/weights at the end of the 100,000 iterations?

nope, but that can be easily added!
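something along these lines would do it: a minimal sketch in plain PyTorch (the function names and path are illustrative, none of this is in train.py yet):

```python
import torch

def save_checkpoint(model, optim, iteration, path='./model.pt'):
    # bundle everything needed to resume training into a single file
    torch.save({
        'iteration': iteration,
        'model': model.state_dict(),
        'optim': optim.state_dict(),
    }, path)

def load_checkpoint(model, optim, path='./model.pt'):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt['model'])
    optim.load_state_dict(ckpt['optim'])
    return ckpt['iteration']  # resume the training loop from here
```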

  • How to use a simple text file, e.g. corpus.txt (one sentence per line), instead of enwik8.gz?

that will take some work to set up; specifically, you will have to write your own Dataset class. let me think about how to abstract this away though!
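in the meantime, here is a rough sketch of what such a Dataset class could look like: byte-level over a plain text file, mirroring how the enwik8 loader samples fixed-length segments (the class name and details are just illustrative):

```python
import torch
from torch.utils.data import Dataset

class TextFileDataset(Dataset):
    # byte-level dataset over a plain text file (e.g. corpus.txt),
    # sampling fixed-length segments like the enwik8 setup does
    def __init__(self, path, seq_len):
        with open(path, 'rb') as f:
            self.data = torch.tensor(list(f.read()), dtype=torch.long)
        self.seq_len = seq_len

    def __len__(self):
        # -1 so every segment still has a next-byte target available
        return (self.data.size(0) - 1) // self.seq_len

    def __getitem__(self, idx):
        start = idx * self.seq_len
        # seq_len + 1 bytes: the inputs plus the shifted targets
        return self.data[start : start + self.seq_len + 1]
```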

  • Can train.py be modified into a separate generation script for the saved model above?

ahh yeah, kind of, let me think about how to abstract this away as well. I'm thinking of something like my stylegan2-pytorch repository: a command-line tool that lets you train, resume training, and generate easily
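until then, a generic top-k sampling loop could seed such a script. it assumes the model maps a (batch, seq) LongTensor of byte ids to (batch, seq, vocab) logits; the compressive transformer forward also returns memories and an auxiliary loss, so the call would need adapting to unpack those:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, start_tokens, gen_len, top_k=40, temperature=1.0):
    # start_tokens: (1, t) LongTensor of byte ids to prime the model with
    model.eval()
    out = start_tokens
    for _ in range(gen_len):
        logits = model(out)[:, -1, :] / temperature  # logits for next byte
        top_vals, top_idx = logits.topk(top_k)       # restrict to top-k
        probs = F.softmax(top_vals, dim=-1)
        choice = torch.multinomial(probs, 1)         # sample within top-k
        out = torch.cat((out, top_idx.gather(-1, choice)), dim=-1)
    return out
```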

  • How to modify it to use a multi-line input text file as start tokens?

yea, that will take some coding
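the encoding side of it is small, though. reading a multi-line file into byte-level start tokens (matching the byte vocabulary used for enwik8) is just:

```python
import torch

def prime_from_file(path):
    # read a multi-line text file and encode it as byte-level start tokens
    with open(path, 'rb') as f:
        tokens = list(f.read())
    return torch.tensor(tokens, dtype=torch.long).unsqueeze(0)  # shape (1, t)
```

paired with the sampling sketch above, generating from a prompt file would be `sample(model, prime_from_file('prompt.txt'), gen_len=512)`.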

  • Would like to see a pg19 model but don't have the $ resources to train.

i'll setup training for PG19 soon-ish, and perhaps there will be some generous, curious person out there who will train it for us lol

@DarrenAbramson

I have some experience pre-training BERT-style models on custom subsets of PG, and access to lots of academic GPU time. Not to mention generous and curious :D

@lucidrains would you like to collaborate on pre-training a Compressive Transformer?

@RajeshDM commented Sep 1, 2020

@lucidrains Hey Phil,

Great work getting the implementation out in such a short amount of time.

I have been trying to replicate the results of the paper and ran into a few issues.

I could not find code for calculating the final BPC score anywhere in the current repository. Is that something you plan to add in the near future, or would you be open to a contribution from me?
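My understanding is that when the evaluation loss is the mean cross-entropy in nats per character, BPC is just a change of base; roughly what I would contribute:

```python
import math

def bits_per_character(mean_nats_per_char):
    # F.cross_entropy returns nats; dividing by ln(2) converts to bits,
    # i.e. the average cross-entropy re-expressed in bits per character
    return mean_nats_per_char / math.log(2)
```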

There are also other, smaller improvements which I believe would make the repository better. Please let me know what you think of them:

  1. I did not find model-saving code anywhere - adding that would be great.

  2. Making use of multiple GPUs for faster training - as of now I think only one GPU is being used, and extending it to multiple GPUs would speed up training (a minimal sketch follows this list).
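For point 2, the lowest-effort version would be wrapping the model in nn.DataParallel (DistributedDataParallel is faster but needs a launch script). A minimal sketch, where `model` stands for the language model built in train.py; note the compressive memories returned each step would also need per-replica handling:

```python
import torch
from torch import nn

def to_multi_gpu(model: nn.Module) -> nn.Module:
    # wrap the model so each forward pass splits the batch across all GPUs
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model.cuda()
```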

@GenTxt (Author) commented May 11, 2021

Google has finally released the two PG-19 models from the paper, including code, here:

https://github.com/google-research/google-research/tree/master/routing_transformer

https://storage.googleapis.com/rt-checkpoint/pg19_local.zip

https://storage.googleapis.com/rt-checkpoint/checkpoint.zip

Requires conversion to pytorch_model.bin and supporting files; a rough starting point for the conversion is sketched below.
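As far as I know there is no official conversion script. One starting point is to dump the TF checkpoint variables to torch tensors and then rename them to match the PyTorch module (the checkpoint path below is a placeholder, and the name mapping itself is the model-specific part left out here):

```python
import tensorflow as tf
import torch

# placeholder path - adjust to the actual files inside pg19_local.zip
reader = tf.train.load_checkpoint('./pg19_local/model.ckpt')

state = {
    name: torch.from_numpy(reader.get_tensor(name))
    for name in reader.get_variable_to_shape_map()
}
# `state` now holds raw TF weights keyed by TF variable names; mapping them
# onto the PyTorch parameter names is the remaining model-specific work
torch.save(state, 'pytorch_model.bin')
```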
