A (bad) text transformer for normalizing Russian text

maximxlss/text_normalization

text_normalization

Training replication procedure

  1. Clone this repo: git clone https://github.com/maximxlss/text_normalization
  2. cd text_normalization
  3. Install requirements: pip install -r requirements.txt
  4. Install PyTorch
  5. Download ru_train.csv from this Kaggle challenge
  6. Run python preprocess.py (takes time)
  7. Run python train_tokenizer.py (also takes time)
  8. Tweak settings in train.py
  9. Run python train.py
  10. Note: I manually reset the scheduler (see train.py) during training, so keep that in mind when comparing your results. Details of the training process are available in the metrics.
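
The three script steps above (6, 7 and 9) can be chained so that a failure in one stops the rest. The following is only a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes that ru_train.csv sits in the repository root and that the scripts take no command-line arguments.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from steps 6, 7 and 9, in the order they must run.
PIPELINE = ["preprocess.py", "train_tokenizer.py", "train.py"]


def build_commands(python=sys.executable):
    """Return the argv list for each pipeline stage."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(data_file="ru_train.csv", dry_run=False):
    """Run each stage in order, stopping at the first failure."""
    # Assumption: the scripts look for ru_train.csv in the current directory.
    if not dry_run and not Path(data_file).is_file():
        raise FileNotFoundError(f"{data_file} not found; download it first (step 5)")
    commands = build_commands()
    for argv in commands:
        print("running:", " ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if a stage exits non-zero.
            subprocess.run(argv, check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` from the repository root only prints the commands, which is a cheap way to check the order before starting the slow preprocessing and tokenizer steps. Settings in train.py (step 8) still need to be edited by hand before a real run.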
