Releases: minimaxir/textgenrnn
TF 2.1 support
Synthesis + Generate Progress bar
Two major features:
Synthesis (beta)
Generate text using two (or more!) trained models simultaneously. See this notebook for a demo.
The results are messier than usual, so a lower temperature is recommended. It should work on both char-level and word-level models, or a mix of both. (However, I do not recommend mixing line-delimited and full-text models!)
Please file issues if there are errors!
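As a rough illustration of what generating from several models at once can look like, the sketch below rotates between toy "models" one token at a time. The round-robin strategy, the `synthesize_round_robin` helper, and the two toy models are assumptions made for illustration only; this is not textgenrnn's implementation or API.

```python
from typing import Callable, Sequence

# A "model" here is any callable that maps the text so far to the next token.
# These are hypothetical stand-ins, not textgenrnn objects.
Model = Callable[[str], str]


def synthesize_round_robin(models: Sequence[Model], prefix: str, max_tokens: int) -> str:
    """Build text by asking each model in turn for the next token."""
    if not models:
        raise ValueError("at least one model is required")
    text = prefix
    for step in range(max_tokens):
        model = models[step % len(models)]  # rotate through the models
        text += model(text)
    return text


def upper_model(text: str) -> str:
    """Toy model that always emits 'A'."""
    return "A"


def lower_model(text: str) -> str:
    """Toy model that always emits 'b'."""
    return "b"


print(synthesize_round_robin([upper_model, lower_model], prefix=">", max_tokens=4))
```

Because each model only ever sees text that the other models helped write, the combined output tends to be noisier than single-model output, which fits the advice to lower the temperature.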
Generate Progress Bar
Thanks to tqdm, all `generate` functions show a progress bar! You can override this by passing `progress=False` to the function.
Additionally, the default generate temperature is now `[1.0, 0.5, 0.2, 0.2]`!
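To make the effect of a temperature list concrete, here is a minimal sketch of how temperature rescales a probability distribution and how a list of temperatures can be repeated across tokens. It assumes one temperature is consumed per generated token; `apply_temperature` and `temperature_schedule` are hypothetical helper names, not part of textgenrnn.

```python
import math
from itertools import cycle, islice
from typing import List, Sequence


def apply_temperature(probs: Sequence[float], temperature: float) -> List[float]:
    """Rescale a probability distribution; a lower temperature sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space, divide by the temperature, then renormalise.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def temperature_schedule(temperatures: Sequence[float], n_tokens: int) -> List[float]:
    """Repeat the temperature list so that each token gets one value (assumed behaviour)."""
    return list(islice(cycle(temperatures), n_tokens))


schedule = temperature_schedule([1.0, 0.5, 0.2, 0.2], 6)
print(schedule)

# The most likely token gains probability mass as the temperature drops.
for t in (1.0, 0.5, 0.2):
    print(t, apply_temperature([0.7, 0.2, 0.1], t))
```

Under this assumption, the occasional high-temperature step lets an unlikely token through, while the low-temperature steps that follow keep the text coherent.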
Emergency Case Fix
v1.4.1 tensorflow dependency note #76
Interactive Mode + Bug Fixes
Features
- Interactive mode, which lets you control which text is added. (#52, thanks @juanets!)
- Allow backends other than TensorFlow (#44, thanks @torokati44!)
- Allow periodic weights saving (#37, thanks @IrekRybark!)
- Multi-GPU support (beta: see #62)
Fixes
- Handle `prefix` in word-level models correctly.
1.3.2
Temperature Cycling + Fixes
- Added ability to cycle temperatures during training (see this notebook for more information).
- Added utf-8 encoding for vocab export.
- Added `train_new_model` as an alias for `train_on_texts(new_model=True)`.
- Fixed a bug where specifying `dropout` could cause issues.
First implementation of encoding text
- Added `encode_text_vectors` to encode text using the trained network.
- Added `similarity` to quickly calculate cosine similarity and return the most similar texts.
See this notebook for details.
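For readers unfamiliar with the metric, the following is a minimal sketch of ranking texts by cosine similarity between vectors. The `cosine_similarity` and `most_similar` helpers and the toy two-dimensional vectors are illustrative assumptions; they do not reproduce the library's `similarity` signature or its encoded vectors.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    texts: Sequence[str],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_n (text, score) pairs, highest similarity first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in zip(texts, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors standing in for encoded texts.
texts = ["a", "b", "c"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(most_similar([1.0, 0.0], vectors, texts, top_n=2))
```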
Minor Fixes
- Make `is_csv` work for real downstream.
- Description tweaks
CSV Utility
- Added `validation` to disable validation training for speed.
- Added `is_csv`: Use with `train_from_file` if the source file is a one-column CSV (e.g. an export from BigQuery or Google Sheets) for proper quote/newline escaping.
- README tweaks
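To show why CSV-aware parsing matters for such exports, here is a small sketch using Python's standard `csv` module. The `read_one_column_csv` helper and the assumption that the file starts with a header row are illustrative; this is not the library's own loading code.

```python
import csv
import io
from typing import List, TextIO


def read_one_column_csv(handle: TextIO, has_header: bool = True) -> List[str]:
    """Read the first column of a CSV, honouring quoted newlines and escaped quotes."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    return [row[0] for row in reader if row]


# One text contains a newline and another contains escaped double quotes.
sample = 'text\n"first line\nsecond line"\n"she said ""hi"""\nplain\n'
texts = read_one_column_csv(io.StringIO(sample))
print(texts)
```

Splitting the same file naively on newlines would break the first text into two separate entries and leave the escaped quotes in place.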
Handle Overfitting
- Renamed `prop_keep` to `train_size`; the remaining data is now used for validation.
- Added `dropout`, which randomly excludes input tokens each epoch.
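The two ideas above can be sketched in a few lines of plain Python: hold out a share of the data for validation, and randomly exclude input tokens. The helper names `split_train_validation` and `drop_tokens`, the shuffling step, and the fixed seed are assumptions for illustration and may differ from what the library does internally.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(
    items: Sequence, train_size: float, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle items, keep a train_size share for training and the rest for validation."""
    if not 0.0 < train_size <= 1.0:
        raise ValueError("train_size must be in (0, 1]")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


def drop_tokens(tokens: Sequence[str], dropout: float, rng: random.Random) -> List[str]:
    """Exclude each token independently with probability `dropout`."""
    return [tok for tok in tokens if rng.random() >= dropout]


train, validation = split_train_validation(range(10), train_size=0.8)
print(len(train), len(validation))

rng = random.Random(42)
print(drop_tokens(list("hello world"), dropout=0.2, rng=rng))
```

Calling `drop_tokens` again each epoch yields a different subset of tokens, so the model sees a slightly different version of every text, which is the property that helps against overfitting.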