Releases: minimaxir/gpt-2-simple
v0.8.1: TensorFlow 2 support
Thanks to https://github.com/YaleDHLab via #275, gpt-2-simple now supports TensorFlow 2 by default, and the minimum TensorFlow version is now 2.5.1! The Colab Notebook has also been updated to no longer use TensorFlow 1.X.
Note: Development on gpt-2-simple has mostly been superseded by aitextgen, which has similar AI text generation capabilities with more efficient training time and resource usage. If you do not require using TensorFlow, I recommend using aitextgen instead. Checkpoints trained using gpt-2-simple can be loaded using aitextgen as well.
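For existing users, the high-level workflow is unchanged under TensorFlow 2. A minimal sketch of the usual finetune-and-generate flow, assuming a plain-text training file named shakespeare.txt as a placeholder dataset:

```python
import gpt_2_simple as gpt2

# Download the 124M pretrained model (only needed once).
gpt2.download_gpt2(model_name="124M")

# Finetune on a plain-text file, then generate; this now runs on TensorFlow 2.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "shakespeare.txt", model_name="124M", steps=200)
gpt2.generate(sess)
```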
Fix model URL
Remove finetuning asserts
Some have successfully finetuned 774M/1558M, so the assert has been removed.
Multi-GPU support + TF 2.0 assert
Handle 774M (large)
- 774M is explicitly blocked from being finetuned and will trigger an assert if attempted. If a way to finetune it without being super-painful is added, the ability to finetune it will be restored.
- Allow generating text from the default pretrained models by passing `model_name` to `gpt2.load_gpt2()` and `gpt2.generate()` (this will work with 774M; see the example after this list).
- Add `sgd` as an `optimizer` parameter to `finetune` (default: `adam`).
- Support for changed model names, w/ changes more prominent in the README.
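A minimal sketch of generating from the pretrained 774M model without any finetuning; the prefix string is just sample text:

```python
import gpt_2_simple as gpt2

# Download the pretrained 774M weights (only needed once).
gpt2.download_gpt2(model_name="774M")

# Pass model_name to both load_gpt2() and generate() to sample from the
# pretrained model directly, with no finetuned checkpoint involved.
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, model_name="774M")
gpt2.generate(sess, model_name="774M", prefix="The meaning of life is")
```

The new optimizer option is passed the same way as any other `finetune` parameter, e.g. `gpt2.finetune(sess, "shakespeare.txt", optimizer="sgd")` in a fresh session.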
Polish before TF 2.0
Remove assertion
The assertion was triggering false positives, so it has been removed.
Prevent OOB + Cap Gen Length
Minor fix to prevent an issue hit with gpt-2-cloud-run.
A goal of this release was to allow a graph reset without resetting the parameters; that did not seem to work, so that feature is being held back for now.
Fixed prefix + miscellaneous bug fixes
Merged PRs (including a fix for the prefix issue); see the commits for more info.
A bunch of highly-requested features
Adapted a few functions from Neil Shepperd's fork:
- Nucleus Sampling (`top_p`) when generating text, which produces surprisingly different results (setting `top_p=0.9` works well). Supersedes `top_k` when used. (#51)
- An `encode_dataset()` function to preencode and compress a large dataset before loading it for finetuning. (#19, #54; both additions are shown in the sketch below)
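A brief sketch of both additions, assuming an existing finetuned run named run1 and a placeholder dataset shakespeare.txt (the `out_path` argument name is an assumption; check the function signature):

```python
import gpt_2_simple as gpt2

# Nucleus (top_p) sampling; when set, it supersedes top_k.
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="run1")
gpt2.generate(sess, run_name="run1", top_p=0.9)

# Preencode and compress a large dataset into an .npz file before finetuning.
# (out_path is assumed here; the default output name may differ.)
gpt2.encode_dataset("shakespeare.txt", out_path="shakespeare.npz")
```

The resulting `.npz` file can then be passed to `finetune()` in place of the raw text file.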
Improvements to continuing model training:
- `overwrite` argument for `finetune`: with `restore_from="latest"`, this continues model training without creating a duplicate copy of the model, and is therefore good for transfer learning using multiple datasets (#20; see the sketch below).
- You can continue to `finetune` a model without having the original GPT-2 model present.
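A minimal sketch of continuing training on a second dataset, assuming an existing run named run1 and a placeholder file new_corpus.txt:

```python
import gpt_2_simple as gpt2

# Resume from the latest checkpoint of an existing run and overwrite it
# in place rather than creating a duplicate copy of the model on disk.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              "new_corpus.txt",
              run_name="run1",
              restore_from="latest",
              overwrite=True,
              steps=200)
```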
Improvements with I/O involving Colaboratory:
- Checkpoint folders are now packaged into a `.tar` file when copying to Google Drive, and when copying from Google Drive, the `.tar` file is automatically unpackaged into the correct checkpoint format. (You can pass `copy_folder=True` to the `copy_checkpoint` function to revert to the old behavior.) (#37: thanks @woctezuma!)
- `copy_checkpoint_to_gdrive` and `copy_checkpoint_from_gdrive` now take a `run_name` argument instead of a `checkpoint_folder` argument (see the sketch below).
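A minimal sketch of the Colaboratory checkpoint round-trip, assuming a finetuned run named run1 and a notebook with Google Drive access:

```python
import gpt_2_simple as gpt2

# Mount Google Drive inside the Colaboratory notebook.
gpt2.mount_gdrive()

# Copy the checkpoint/run1 folder to Drive as a .tar archive.
gpt2.copy_checkpoint_to_gdrive(run_name="run1")

# Later, e.g. in a fresh Colab session: restore the checkpoint from Drive;
# the .tar file is unpackaged back into checkpoint/run1 automatically.
gpt2.copy_checkpoint_from_gdrive(run_name="run1")
```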
Miscellaneous
- Added CLI arguments for `top_k`, `top_p`, and `overwrite`.
- Cleaned up redundant function parameters. (#39)