Links to original tf code - fyi #1
@GenTxt it should be fairly straightforward to implement! I'll get it done, and leave it to someone else with more resources to train and share a model. I'll also be adding a bunch of features I learned from building other types of transformers to further enhance it.
@GenTxt This is almost ready! Do you plan on training this on any text corpus? Perhaps PG-19?
@GenTxt huggingface/datasets#306 Once this is merged, it should be easy to start training.
Hi Phil:
Thanks for the updates. Currently running the enwik8 train.py on my home machine, and the terminal output looks good.
Have a few questions:
- Where is the reference to 100000 epochs or iterations in train.py? e.g.:
training loss: 2.4765 | aux_loss: 0.9664
training: 0%| | 70/100000 [05:45<129:44:50, 4.67s/it]
training loss: 2.4784 | aux_loss: 0.0000
training loss: 2.4343 | aux_loss: 0.0000
- Does it save the model/weights at the end of the 100000 iterations?
- How to use a simple text file, e.g. corpus.txt (one sentence per line), instead of enwik8.gz?
- Can train.py be modified into a separate generation script for the saved model above?
- How to modify it to use a multi-line input text file as the start tokens?
prime = torch.ones(1, 1).cuda() # assume 1 is start token OR input.txt
Not a coder, but I can make basic modifications to scripts. Would like to see a PG-19 model but don't have the $ resources to train.
Thanks
…On Sat, Jul 4, 2020 at 2:26 PM Phil Wang wrote:
@GenTxt works great on enwik8 now, you should totally try this! Are you a coder? Or do you need this simplified even more?
It's actually iterations, not epochs.
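For reference, the 100000 should come from a constant defined in train.py; a minimal sketch of the loop shape (the constant name is assumed, not taken from the repo):

```python
import tqdm

NUM_BATCHES = int(1e5)  # the 100000 shown in the progress bar

for i in tqdm.tqdm(range(NUM_BATCHES), desc='training'):
    pass  # one gradient step per pass -- iterations, not epochs
```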
Nope, but that can easily be added!
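For anyone who wants to add it in the meantime, a minimal sketch (the model, optimizer, and filename are assumptions, not names from the repo):

```python
import torch

def save_checkpoint(model, optim, step, path='model.pt'):
    # `model` and `optim` are whatever train.py builds; the path is arbitrary
    torch.save({
        'step': step,
        'model': model.state_dict(),
        'optim': optim.state_dict(),
    }, path)

# e.g. inside the loop:  if i % 1000 == 0: save_checkpoint(model, optim, i)
# and call it once more after the loop finishes for the final weights
```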
That will take some work to set up; specifically, you will have to write your own Dataset class. Let me think about how to abstract this away, though!
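In the meantime, a rough sketch of what such a Dataset class could look like, working at the byte level so newlines need no special handling (class and variable names are my own, not from the repo):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class TextFileDataset(Dataset):
    # Byte-level dataset over a plain text file such as corpus.txt;
    # a newline is just another byte, so "one sentence per line" works as-is.
    def __init__(self, path, seq_len):
        with open(path, 'rb') as f:
            raw = np.frombuffer(f.read(), dtype=np.uint8)
        self.data = torch.from_numpy(raw.copy()).long()
        self.seq_len = seq_len

    def __len__(self):
        return self.data.size(0) - self.seq_len - 1

    def __getitem__(self, idx):
        # one extra token so input and target can be offset by one,
        # mirroring how the enwik8 script samples sequences
        return self.data[idx: idx + self.seq_len + 1]
```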
Ahh, yeah, kind of; let me think about how to abstract this away as well. I'm thinking of something like my stylegan2-pytorch repository: a command-line tool that lets you train, resume training, and generate easily.
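Until that exists, a hedged sketch of what a standalone generation script could do (the function and checkpoint names are assumptions; `generate` is assumed to be the sampling method of the repo's autoregressive wrapper, so check train.py for the exact class and method names):

```python
import torch

def generate_from_checkpoint(build_model, path='model.pt', length=512):
    # build_model: a zero-arg function that reconstructs the model exactly
    # as train.py does (same hyperparameters), since only weights are saved
    model = build_model()
    model.load_state_dict(torch.load(path)['model'])
    model.cuda().eval()
    prime = torch.ones(1, 1).long().cuda()  # assume 1 is the start token
    with torch.no_grad():
        return model.generate(prime, length)  # sample `length` new tokens
```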
Yeah, that will take some coding.
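A minimal sketch of one way to do it at the byte level (the helper name is hypothetical):

```python
import torch

def prime_from_file(path='input.txt'):
    # Byte-level priming: every byte (newlines included) becomes a token id,
    # matching the byte-level vocabulary used for enwik8
    with open(path, 'rb') as f:
        ids = torch.tensor(list(f.read()), dtype=torch.long)
    return ids.unsqueeze(0).cuda()  # shape (1, seq_len)

# prime = prime_from_file('input.txt')
# sample = model.generate(prime, 512)
```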
I'll set up training for PG-19 soon-ish, and perhaps there will be some generous, curious person out there who will train it for us lol
I have some experience pre-training BERT-style models on custom subsets of PG, and access to lots of academic GPU time. Not to mention generous and curious :D @lucidrains would you like to collaborate on pre-training a Compressive Transformer?
@lucidrains Hey Phil, great work getting the implementation out in such a short amount of time. I was trying to replicate the results of the paper and ran into a few issues. I was trying to find how to calculate the final BPC score but could not find it in the current repository. Is that something you plan to add in the near future, or are you open to a contribution from my side? There are also some smaller improvements that I believe would make the repository better. Please let me know what you think of them.
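For reference, BPC falls straight out of the cross-entropy loss, since cross-entropy is measured in nats per token and BPC is in bits; a minimal sketch (the function name is my own):

```python
import math
import torch.nn.functional as F

def bits_per_character(logits, targets):
    # logits: (batch, seq, vocab); targets: (batch, seq) of byte/char ids.
    # cross_entropy returns nats per token; divide by ln(2) to get bits.
    nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction='mean')
    return (nll / math.log(2)).item()
```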
Google finally released the two PG-19 models from the paper, including code, here:
https://github.com/google-research/google-research/tree/master/routing_transformer
https://storage.googleapis.com/rt-checkpoint/pg19_local.zip
https://storage.googleapis.com/rt-checkpoint/checkpoint.zip
Requires conversion to pytorch_model.bin and supporting files.
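A rough sketch of the first conversion step, assuming a TF1-style checkpoint prefix (the prefix below is an assumption; mapping TF variable names onto the PyTorch module's parameter names is the real work and is not shown):

```python
import numpy as np
import tensorflow as tf  # 1.15+, since the checkpoints are TF1-style
import torch

# Dump every variable in the released checkpoint into a PyTorch state dict,
# so the TF variable names can then be renamed to match the PyTorch model.
CKPT = 'pg19_local/model.ckpt'  # assumed prefix inside the unzipped archive
reader = tf.train.load_checkpoint(CKPT)
state = {name: torch.from_numpy(np.asarray(reader.get_tensor(name)))
         for name, _ in tf.train.list_variables(CKPT)}
torch.save(state, 'pytorch_model.bin')
```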
After reading the DeepMind blog post I was looking forward to downloading the model, but no luck. Looking forward to your implementation.
You may already be aware of this post and link, but if not, this is the author's original TF implementation. Hope it helps.
Copy of comment to original model request:
huggingface/transformers#4688
Interested in the model weights too, but they are currently not available. The author does mention releasing the TF code here:
https://news.ycombinator.com/item?id=22290227
Requires TF 1.15+ and deepmind/sonnet v1.36. Link to the Python script here:
https://github.com/deepmind/sonnet/blob/cd5b5fa48e15e4d020f744968f5209949ebe750f/sonnet/python/modules/nets/transformer.py#L915
Have tried running it as-is, but it doesn't appear to have options for training on custom data as per the paper and the available datasets.