diff --git a/README _cn.md b/README _cn.md
new file mode 100644
index 00000000..d0e21680
--- /dev/null
+++ b/README _cn.md
@@ -0,0 +1,252 @@
# Character Recognition Algorithm: PARSeq
Language Extension: Chinese Training and Application


[**Original project**](https://github.com/baudm/parseq) [**Paper**](https://arxiv.org/pdf/2207.06966.pdf)

[Environment Setup](#environment-setup) | [Data Preparation](#data-preparation) | [Training](#training) | [Evaluation](#evaluation) | [Deployment](#frequently-asked-questions)

Scene Text Recognition (STR) models use language context to be more robust against noisy or corrupted images. Recent approaches, such as ABINet, use a standalone or external Language Model (LM) for prediction refinement. In this work, we show that an external LM, which requires dedicated compute capacity to be allocated in advance, is inefficient for STR because of its poor performance-to-cost characteristics. We propose a more efficient approach using a Permuted Autoregressive Sequence (PARSeq) model. See our [poster](https://drive.google.com/file/d/19luOT_RMqmafLMhKQQHBnHNXV7fOCRfw/view) and [presentation](https://drive.google.com/file/d/11VoZW4QC5tbMwVIjKB44447uTiuCJAAD/view) for a brief overview.

![PARSeq](.github/gh-teaser.png)

**NOTE:** See the original project for more information.

## Environment Setup
See [Installation](#installation) below.

## Data Preparation
1. Prepare the dataset according to the following file tree:
```text
data
├── gt.txt
└── test
    ├── word_1.png
    ├── word_2.png
    ├── word_3.png
    └── ...
```
`gt.txt` is the label file of the dataset. Each line has the format `{image path}\t{label}\n`, e.g.
```text
test/word_1.png	这里
test/word_2.png	那里
test/word_3.png	嘟嘟嘟
...
```
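The labels in `gt.txt` also determine the character set the model must support (see `configs/charset/`, including the `simplified_chinese.yaml` stub added by this change). Before moving on to step 2, note that this change also adds an empty stub `tools/create_charset_by_dataset.py`; a minimal sketch of what such a script could look like is shown below. The script body is an assumption based on its filename, not upstream code:
```python
# Hypothetical sketch for tools/create_charset_by_dataset.py (an empty stub in
# this change): collect every character that appears in the labels of a gt.txt
# file, e.g. to fill in configs/charset/simplified_chinese.yaml.
import sys


def collect_charset(gt_path: str) -> str:
    chars = set()
    with open(gt_path, encoding='utf-8') as f:
        for line in f:
            # Each line is '{image path}\t{label}'.
            _, _, label = line.rstrip('\n').partition('\t')
            chars.update(label)
    return ''.join(sorted(chars))


if __name__ == '__main__':
    gt_file = sys.argv[1] if len(sys.argv) > 1 else 'data/gt.txt'
    print(collect_charset(gt_file))
```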

2. Run the script to generate an LMDB-style dataset:
```bash
pip3 install fire
python3 create_lmdb_dataset.py --inputPath data/ --gtFile data/gt.txt --outputPath result/
...
```
### Demo
An [interactive Gradio demo](https://huggingface.co/spaces/baudm/PARSeq-OCR) hosted at Hugging Face is available. The pretrained weights released here are used for the demo.

### Installation
Requires Python >= 3.8 and PyTorch >= 1.10 (up to 1.13). The default requirements files will install the latest versions of the dependencies (as of June 1, 2023).
```bash
# Use specific platform build. Other PyTorch 1.13 options: cu116, cu117, rocm5.2
platform=cpu
# Generate requirements files for specified PyTorch platform
make torch-${platform}
# Install the project and core + train + test dependencies. Subsets: [train,test,bench,tune]
pip install -r requirements/core.${platform}.txt -e .[train,test]
```
#### Updating dependency version pins
```bash
pip install pip-tools
make clean-reqs reqs  # Regenerate all the requirements files
```
### Datasets
Download the [datasets](Datasets.md) from the following links:
1. [LMDB archives](https://drive.google.com/drive/folders/1NYuoi7dfJVgo-zUJogh8UQZgIMpLviOE) for MJSynth, SynthText, IIIT5k, SVT, SVTP, IC13, IC15, CUTE80, ArT, RCTW17, ReCTS, LSVT, MLT19, COCO-Text, and Uber-Text.
2. [LMDB archives](https://drive.google.com/drive/folders/1D9z_YJVa6f-O0juni-yG5jcwnhvYw-qC) for TextOCR and OpenVINO.

### Pretrained Models via Torch Hub
Available models are: `abinet`, `crnn`, `trba`, `vitstr`, `parseq_tiny`, and `parseq`.
```python
import torch
from PIL import Image
from strhub.data.module import SceneTextDataModule

# Load model and image transforms
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
img_transform = SceneTextDataModule.get_transform(parseq.hparams.img_size)

img = Image.open('/path/to/image.png').convert('RGB')
# Preprocess. Model expects a batch of images with shape: (B, C, H, W)
img = img_transform(img).unsqueeze(0)

logits = parseq(img)
logits.shape  # torch.Size([1, 26, 95]), 94 characters + [EOS] symbol

# Greedy decoding
pred = logits.softmax(-1)
label, confidence = parseq.tokenizer.decode(pred)
print('Decoded label = {}'.format(label[0]))
```

## Frequently Asked Questions
- How do I train on a new language? See Issues [#5](https://github.com/baudm/parseq/issues/5) and [#9](https://github.com/baudm/parseq/issues/9).
- Can you export to TorchScript or ONNX? Yes, see Issue [#12](https://github.com/baudm/parseq/issues/12#issuecomment-1267842315).
- How do I test on my own dataset? See Issue [#27](https://github.com/baudm/parseq/issues/27).
- How do I finetune and/or create a custom dataset? See Issue [#7](https://github.com/baudm/parseq/issues/7).
- What is `val_NED`? See Issue [#10](https://github.com/baudm/parseq/issues/10).

## Training
The training script can train any supported model. You can override any configuration using the command line. Please refer to the [Hydra](https://hydra.cc) docs for more info about the syntax. Use `./train.py --help` to see the default configuration.
Sample commands for different training configurations:


### Finetune using pretrained weights
```bash
./train.py pretrained=parseq-tiny  # Not all experiments have pretrained weights
```

### Train a model variant/preconfigured experiment
The base model configurations are in `configs/model/`, while variations are stored in `configs/experiment/`.
```bash
./train.py +experiment=parseq-tiny  # Some examples: abinet-sv, trbc
```

### Specify the character set for training
```bash
./train.py charset=94_full  # Other options: 36_lowercase or 62_mixed-case. See configs/charset/
```

### Specify the training dataset
```bash
./train.py dataset=real  # Other option: synth. See configs/dataset/
```

### Change general model training parameters
```bash
./train.py model.img_size=[32, 128] model.max_label_length=25 model.batch_size=384
```

### Change data-related training parameters
```bash
./train.py data.root_dir=data data.num_workers=2 data.augment=true
```

### Change `pytorch_lightning.Trainer` parameters
```bash
./train.py trainer.max_epochs=20 trainer.accelerator=gpu trainer.devices=2
```
Note that you can pass any [Trainer parameter](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html);
you just need to prefix it with `+` if it is not originally specified in `configs/main.yaml`.

### Resume training from checkpoint (experimental)
```bash
./train.py +experiment=<experiment> ckpt_path=outputs/<experiment>/<timestamp>/checkpoints/<checkpoint>.ckpt
```
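
### Putting it together for Chinese training
A hypothetical invocation combining the overrides above for the Chinese use case of this extension. It assumes the `simplified_chinese` charset config (added here as an empty stub) has been filled in, and that the LMDB data under `result/` follows the directory layout expected by `configs/dataset/`:
```bash
# Assumed invocation; charset=simplified_chinese refers to the (currently empty)
# configs/charset/simplified_chinese.yaml stub added in this change.
./train.py charset=simplified_chinese dataset=real data.root_dir=result trainer.devices=1
```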


## Evaluation
The test script, `test.py`, can be used to evaluate any model trained with this project. For more info, see `./test.py --help`.

PARSeq runtime parameters can be passed using the format `param:type=value`. For example, PARSeq NAR decoding can be invoked via `./test.py parseq.ckpt refine_iters:int=2 decode_ar:bool=false`.
Sample commands for reproducing results:


### Lowercase alphanumeric comparison on benchmark datasets (Table 6)
```bash
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt  # or use the released weights: ./test.py pretrained=parseq
```
**Sample output:**
| Dataset | # samples | Accuracy | 1 - NED | Confidence | Label Length |
|:---------:|----------:|---------:|--------:|-----------:|-------------:|
| IIIT5k | 3000 | 99.00 | 99.79 | 97.09 | 5.09 |
| SVT | 647 | 97.84 | 99.54 | 95.87 | 5.86 |
| IC13_1015 | 1015 | 98.13 | 99.43 | 97.19 | 5.31 |
| IC15_2077 | 2077 | 89.22 | 96.43 | 91.91 | 5.33 |
| SVTP | 645 | 96.90 | 99.36 | 94.37 | 5.86 |
| CUTE80 | 288 | 98.61 | 99.80 | 96.43 | 5.53 |
| **Combined** | **7672** | **95.95** | **98.78** | **95.34** | **5.33** |

### Benchmark using different evaluation character sets (Table 4)
```bash
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt                        # lowercase alphanumeric (36-character set)
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased                # mixed-case alphanumeric (62-character set)
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased --punctuation  # mixed-case alphanumeric + punctuation (94-character set)
```

### Lowercase alphanumeric comparison on more challenging datasets (Table 5)
```bash
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --new
```

### Benchmark Model Compute Requirements (Figure 5)
```bash
./bench.py model=parseq model.decode_ar=false model.refine_iters=3

model(x)
  Median: 14.87 ms
  IQR:    0.33 ms (14.78 to 15.12)
  7 measurements, 10 runs per measurement, 1 thread
| module                | #parameters | #flops | #activations |
|:----------------------|:------------|:-------|:-------------|
| model                 | 23.833M     | 3.255G | 8.214M       |
| encoder               | 21.381M     | 2.88G  | 7.127M       |
| decoder               | 2.368M      | 0.371G | 1.078M       |
| head                  | 36.575K     | 3.794M | 9.88K        |
| text_embed.embedding  | 37.248K     | 0      | 0            |
```

### Latency Measurements vs Output Label Length (Appendix I)
```bash
./bench.py model=parseq model.decode_ar=false model.refine_iters=3 +range=true
```

### Orientation robustness benchmark (Appendix J)
```bash
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased --punctuation  # no rotation
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased --punctuation --rotation 90
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased --punctuation --rotation 180
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased --punctuation --rotation 270
```

### Using trained models to read text from images (Appendix L)
```bash
./read.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --images demo_images/*  # Or use ./read.py pretrained=parseq
Additional keyword arguments: {}
demo_images/art-01107.jpg: CHEWBACCA
demo_images/coco-1166773.jpg: Chevrol
demo_images/cute-184.jpg: SALMON
demo_images/ic13_word_256.png: Verbandsteffe
demo_images/ic15_word_26.png: Kaopa
demo_images/uber-27491.jpg: 3rdAve

# use NAR decoding + 2 refinement iterations for PARSeq
./read.py pretrained=parseq refine_iters:int=2 decode_ar:bool=false --images demo_images/*
```
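
For programmatic use instead of the `read.py` CLI, here is a minimal sketch along the lines of the Torch Hub example above. It assumes `strhub.models.utils.load_from_checkpoint` behaves as in the upstream project, and the checkpoint and image paths are hypothetical placeholders:
```python
# Minimal sketch: read text from one image with a locally trained checkpoint.
# load_from_checkpoint is assumed to work as in upstream; paths are hypothetical.
import torch
from PIL import Image

from strhub.data.module import SceneTextDataModule
from strhub.models.utils import load_from_checkpoint

model = load_from_checkpoint('outputs/parseq/<timestamp>/checkpoints/last.ckpt').eval()
img_transform = SceneTextDataModule.get_transform(model.hparams.img_size)

img = img_transform(Image.open('demo_images/example.png').convert('RGB')).unsqueeze(0)
with torch.inference_mode():
    pred = model(img).softmax(-1)
label, confidence = model.tokenizer.decode(pred)
print(label[0])
```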


## Tuning

We use [Ray Tune](https://www.ray.io/ray-tune) for automated parameter tuning of the learning rate. See `./tune.py --help`. Extend `tune.py` to support tuning of other hyperparameters.
```bash
./tune.py tune.num_samples=20  # find optimum LR for PARSeq's default config using 20 trials
./tune.py +experiment=tune_abinet-lm  # find the optimum learning rate for ABINet's language model
```

## Citation
```bibtex
@InProceedings{bautista2022parseq,
  title={Scene Text Recognition with Permuted Autoregressive Sequence Models},
  author={Bautista, Darwin and Atienza, Rowel},
  booktitle={European Conference on Computer Vision},
  pages={178--196},
  month={10},
  year={2022},
  publisher={Springer Nature Switzerland},
  address={Cham},
  doi={10.1007/978-3-031-19815-1_11},
  url={https://doi.org/10.1007/978-3-031-19815-1_11}
}
```
diff --git a/configs/charset/simplified_chinese.yaml b/configs/charset/simplified_chinese.yaml
new file mode 100644
index 00000000..e69de29b
diff --git a/tools/create_charset_by_dataset.py b/tools/create_charset_by_dataset.py
new file mode 100644
index 00000000..e69de29b